Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning (Dec 15, 2023): Factual Inconsistency Detection in Chart Captioning, Image Captioning
Lever LM: Configuring In-Context Sequence to Lever Large Vision Language Models (Dec 15, 2023): Image Captioning, In-Context Learning
VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation (Dec 14, 2023): Image Captioning, Image Generation
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions (Dec 14, 2023): Image Captioning
Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator (Dec 11, 2023): Image Captioning, Question Answering
Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos (Dec 7, 2023): Diagnostic, Image Captioning
Mitigating Open-Vocabulary Caption Hallucinations (Dec 6, 2023): Diversity, Hallucination
Bootstrapping Interactive Image-Text Alignment for Remote Sensing Image Captioning (Dec 2, 2023): Causal Language Modeling, Contrastive Learning
Contrastive Vision-Language Alignment Makes Efficient Instruction Learner (Nov 29, 2023): Contrastive Learning, Image Captioning
Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models (Nov 28, 2023): Image Captioning, Image-text matching
Zero-shot audio captioning with audio-language model guidance and audio context keywords (Nov 14, 2023): Audio captioning, Descriptive
InfMLLM: A Unified Framework for Visual-Language Tasks (Nov 12, 2023): GPU, Image Captioning
NeuSyRE: Neuro-Symbolic Visual Understanding and Reasoning Framework based on Scene Graph Enrichment (Nov 5, 2023): Caption Generation, Common Sense Reasoning
Sam-Guided Enhanced Fine-Grained Encoding with Mixed Semantic Learning for Medical Image Captioning (Nov 2, 2023): Diagnostic, Image Captioning
Language Guided Visual Question Answering: Elevate Your Multimodal Language Model Using Knowledge-Enriched Prompts (Oct 31, 2023): Image Captioning, Language Modeling
Myriad: Large Multimodal Model by Applying Vision Experts for Industrial Anomaly Detection (Oct 29, 2023): Anomaly Detection, Image Captioning
CAPIVARA: Cost-Efficient Approach for Improving Multilingual CLIP Performance on Low-Resource Languages (Oct 20, 2023): Diversity, GPU
Visual Grounding Helps Learn Word Meanings in Low-Data Regimes (Oct 20, 2023): Image Captioning, Language Acquisition
Sieve: Multimodal Dataset Pruning Using Image Captioning Models (Oct 3, 2023): Diversity, Image Captioning
Beyond Generation: Harnessing Text to Image Models for Object Detection and Segmentation (Sep 12, 2023): Image Captioning, Image Generation
Exchanging-based Multimodal Fusion with Transformer (Sep 5, 2023): Image Captioning, Image Generation
CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation (Aug 29, 2023): Image Captioning, Machine Translation
MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning (Aug 25, 2023): Image Captioning, Video Captioning
VIGC: Visual Instruction Generation and Correction (Aug 24, 2023): Hallucination, Image Captioning
With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning (Aug 23, 2023): Decoder, Image Captioning
CgT-GAN: CLIP-guided Text GAN for Image Captioning (Aug 23, 2023): Image Captioning
VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control (Aug 18, 2023): Image Captioning, Text Generation
Pro-Cap: Leveraging a Frozen Vision-Language Model for Hateful Meme Detection (Aug 16, 2023): Image Captioning, Language Modeling
GIT-Mol: A Multi-modal Large Language Model for Molecular Science with Graph, Image, and Text (Aug 14, 2023): Drug Discovery, Image Captioning
Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model (Aug 2, 2023): Hallucination, Image Captioning
Transferable Decoding with Visual Entities for Zero-Shot Image Captioning (Jul 31, 2023): Caption Generation, Hallucination
RSGPT: A Remote Sensing Vision Language Model and Benchmark (Jul 28, 2023): Image Captioning, Language Modeling
Image Captions are Natural Prompts for Text-to-Image Models (Jul 17, 2023): Image Captioning, Image Generation
Linear Alignment of Vision-language Models for Image Captioning (Jul 10, 2023): Image Captioning, Language Modelling
Fooling Contrastive Language-Image Pre-trained Models with CLIPMasterPrints (Jul 7, 2023): Image Captioning, Image Retrieval
Palm: Predicting Actions through Language Models @ Ego4D Long-Term Action Anticipation Challenge 2023 (Jun 28, 2023): Action Anticipation, Image Captioning
What Makes ImageNet Look Unlike LAION (Jun 27, 2023): counterfactual, Image Captioning
Kosmos-2: Grounding Multimodal Large Language Models to the World (Jun 26, 2023): Image Captioning, In-Context Learning
Investigating Prompting Techniques for Zero- and Few-Shot Visual Question Answering (Jun 16, 2023): Image Captioning, Question Answering
Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards (Jun 7, 2023): Diversity, Image Captioning
Understanding and Mitigating Copying in Diffusion Models (May 31, 2023): Image Captioning, Memorization
Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models (May 29, 2023): Image Captioning, Image Classification
FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions (May 28, 2023): Attribute, Image Captioning
CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers (May 27, 2023): Image Captioning, Image Retrieval
FACTUAL: A Benchmark for Faithful and Consistent Textual Scene Graph Parsing (May 27, 2023): Graph Similarity, Human Judgment Correlation
Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models (May 24, 2023): document understanding, Image Captioning
Exploring Diverse In-Context Configurations for Image Captioning (May 24, 2023): Image Captioning, In-Context Learning
Text encoders bottleneck compositionality in contrastive vision-language models (May 24, 2023): Attribute, Image Captioning
MemeCap: A Dataset for Captioning and Interpreting Memes (May 23, 2023): Image Captioning, Meme Captioning
What Makes for Good Visual Tokenizers for Large Language Models? (May 20, 2023): Image Captioning, Object Counting