ConTEXTual Net: A Multimodal Vision-Language Model for Segmentation of Pneumothorax | Mar 2, 2023 | Descriptive; Image Captioning
Bi-LORA: A Vision-Language Approach for Synthetic Image Detection | Apr 2, 2024 | Binary Classification; Image Captioning
FS-COCO: Towards Understanding of Freehand Sketches of Common Objects in Context | Mar 4, 2022 | Decoder; Image Captioning
Kosmos-2: Grounding Multimodal Large Language Models to the World | Jun 26, 2023 | Image Captioning; In-Context Learning
It is Okay to Not Be Okay: Overcoming Emotional Bias in Affective Image Captioning by Contrastive Data Collection | Apr 15, 2022 | Image Captioning
Can We Talk Models Into Seeing the World Differently? | Mar 14, 2024 | Image Captioning; Image Classification
Injecting Semantic Concepts into End-to-End Image Captioning | Dec 9, 2021 | Caption Generation; Image Captioning
LaB-RAG: Label Boosted Retrieval Augmented Generation for Radiology Report Generation | Nov 25, 2024 | Image Captioning; RAG
Improving Image Captioning with Better Use of Captions | Jun 21, 2020 | Caption Generation; Image Captioning
Improving Image Captioning by Leveraging Intra- and Inter-layer Global Representation in Transformer Network | Dec 13, 2020 | Caption Generation; Decoder
In Defense of Grid Features for Visual Question Answering | Jan 10, 2020 | Image Captioning; Question Answering
Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models | May 24, 2023 | document understanding; Image Captioning
Contrastive Vision-Language Alignment Makes Efficient Instruction Learner | Nov 29, 2023 | Contrastive Learning; Image Captioning
ImageNet3D: Towards General-Purpose Object-Level 3D Understanding | Jun 13, 2024 | Image Captioning; Linear Probing; Object-Level 3D Awareness
InfMLLM: A Unified Framework for Visual-Language Tasks | Nov 12, 2023 | GPU; Image Captioning
LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation? | Apr 16, 2024 | Image Captioning; Image Generation
Let there be a clock on the beach: Reducing Object Hallucination in Image Captioning | Oct 4, 2021 | Hallucination; Image Captioning
Concadia: Towards Image-Based Text Generation with a Purpose | Apr 16, 2021 | Image Captioning; Image to text
Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives | Mar 18, 2025 | Image Captioning
Image Captioning In the Transformer Age | Apr 15, 2022 | Decoder; Image Captioning
I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision | Nov 17, 2022 | Image Captioning; Question Answering
IC3: Image Captioning by Committee Consensus | Feb 2, 2023 | Image Captioning
Lever LM: Configuring In-Context Sequence to Lever Large Vision Language Models | Dec 15, 2023 | Image Captioning; In-Context Learning
Image Captioning through Image Transformer | Apr 29, 2020 | Image Captioning; object-detection
Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models | May 15, 2023 | 3D Object Detection; Image Captioning
CoCa: Contrastive Captioners are Image-Text Foundation Models | May 4, 2022 | Action Classification; Decoder
COCO-Stuff: Thing and Stuff Classes in Context | Dec 12, 2016 | Image Captioning; Semantic Segmentation
G-VEval: A Versatile Metric for Evaluating Image and Video Captions Using GPT-4o | Dec 18, 2024 | Image Captioning; Video Captioning
How (not) to Train your Generative Model: Scheduled Sampling, Likelihood, Adversary? | Nov 16, 2015 | Image Captioning
Human-like Controllable Image Captioning with Verb-specific Semantic Roles | Mar 22, 2021 | Caption Generation; controllable image captioning
A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models | Oct 16, 2021 | Image Captioning; Language Modeling
Compact Bidirectional Transformer for Image Captioning | Jan 6, 2022 | Decoder; Image Captioning
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts | Feb 17, 2021 | Caption Generation; Diversity
IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning | Sep 26, 2024 | Image Captioning; Retrieval
Hard Non-Monotonic Attention for Character-Level Transduction | Aug 29, 2018 | Hard Attention; Image Captioning
Comprehensive Image Captioning via Scene Graph Decomposition | Jul 23, 2020 | Diversity; Image Captioning
CaMEL: Mean Teacher Learning for Image Captioning | Feb 21, 2022 | Image Captioning; Knowledge Distillation
Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift | Dec 15, 2022 | Benchmarking; Image Captioning
ACORT: A Compact Object Relation Transformer for Parameter Efficient Image Captioning | Feb 11, 2022 | Image Captioning; Relation
Can Audio Captions Be Evaluated with Image Caption Metrics? | Oct 10, 2021 | AudioCaps; Audio captioning
BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues | Jul 29, 2024 | Image Captioning
Are scene graphs good enough to improve Image Captioning? | Sep 25, 2020 | Decoder; Graph Attention
Consensus-Aware Visual-Semantic Embedding for Image-Text Matching | Jul 17, 2020 | Image Captioning; Image-text matching
Connecting What to Say With Where to Look by Modeling Human Attention Traces | May 12, 2021 | Caption Generation; Image Captioning
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone | Jun 15, 2022 | Described Object Detection; Image Captioning
InfoMetIC: An Informative Metric for Reference-free Image Caption Evaluation | May 10, 2023 | Benchmarking; Image Captioning
CAPIVARA: Cost-Efficient Approach for Improving Multilingual CLIP Performance on Low-Resource Languages | Oct 20, 2023 | Diversity; GPU
Investigating Prompting Techniques for Zero- and Few-Shot Visual Question Answering | Jun 16, 2023 | Image Captioning; Question Answering
Graph Optimal Transport for Cross-Domain Alignment | Jun 26, 2020 | Graph Matching; Image Captioning
Brain Captioning: Decoding human brain activity into images and text | May 19, 2023 | Brain Decoding; Depth Estimation