Scalable 3D Captioning with Pretrained Models (Jun 12, 2023). Tags: Descriptive, Image Captioning
ClipCap: CLIP Prefix for Image Captioning (Nov 18, 2021). Tags: Image Captioning, Language Modeling
Contextual Object Detection with Multimodal Large Language Models (May 29, 2023). Tags: Cloze Test, Decoder
ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions (Mar 12, 2023). Tags: Image Captioning, Question Answering
Text-Only Training for Image Captioning using Noise-Injected CLIP (Nov 1, 2022). Tags: Decoder, Image Captioning
PoseScript: Linking 3D Human Poses and Natural Language (Oct 21, 2022). Tags: Cross-Modal Retrieval, Image Captioning
Can Language Beat Numerical Regression? Language-Based Multimodal Trajectory Prediction (Mar 27, 2024). Tags: Image Captioning, Language Modeling
Yo'LLaVA: Your Personalized Language and Vision Assistant (Jun 13, 2024). Tags: Image Captioning, Question Answering
Comprehending and Ordering Semantics for Image Captioning (Jun 14, 2022). Tags: Cross-Modal Retrieval, Image Captioning
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching (Apr 4, 2024). Tags: Attribute, Image Captioning
CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts (May 9, 2024). Tags: Image Captioning, Instruction Following
Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding (Oct 7, 2022). Tags: Chart Question Answering, Diversity
OmniCaptioner: One Captioner to Rule Them All (Apr 9, 2025). Tags: All, Image Captioning
OmniSearchSage: Multi-Task Multi-Entity Embeddings for Pinterest Search (Apr 25, 2024). Tags: Entity Embeddings, Image Captioning
Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP (Oct 9, 2022). Tags: Image Captioning, Open Vocabulary Semantic Segmentation
RAP: Retrieval-Augmented Personalization for Multimodal Large Language Models (Oct 17, 2024). Tags: Image Captioning, Question Answering
Beyond Text: Frozen Large Language Models in Visual Signal Comprehension (Mar 12, 2024). Tags: Deblurring, Decoder
LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding (Jun 29, 2023). Tags: 16k, Image Captioning
LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models (Jun 15, 2023). Tags: Hallucination, Image Captioning
Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts (Feb 24, 2025). Tags: Benchmarking, Fact Verification
JourneyDB: A Benchmark for Generative Image Understanding (Jul 3, 2023). Tags: Image Captioning, Image Comprehension
Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model (Mar 6, 2025). Tags: General Knowledge, Image Captioning
Benchmarking and Improving Detail Image Caption (May 29, 2024). Tags: Benchmarking, Image Captioning
VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis (Mar 29, 2024). Tags: Hallucination, Image Captioning
Learning Vision from Models Rivals Learning Vision from Data (Dec 28, 2023). Tags: Contrastive Learning, Image Captioning
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (Nov 28, 2023). Tags: Image Captioning, Question Answering
Language Models Can See: Plugging Visual Controls in Text Generation (May 5, 2022). Tags: Image Captioning, Image-text matching
MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts (Oct 3, 2023). Tags: Chatbot, Image Captioning
Frontiers in Intelligent Colonoscopy (Oct 22, 2024). Tags: Image Captioning
GIT: A Generative Image-to-text Transformer for Vision and Language (May 27, 2022). Tags: Decoder, Image Captioning
From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models (Oct 13, 2023). Tags: Hallucination, Image Captioning
From Redundancy to Relevance: Information Flow in LVLMs Across Reasoning Tasks (Jun 4, 2024). Tags: Image Captioning, Language Modelling
GLaMM: Pixel Grounding Large Multimodal Model (Nov 6, 2023). Tags: Conversational Question Answering, Image Captioning
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks (Apr 13, 2020). Tags: Cross-Modal Retrieval, Image Captioning
EvalMuse-40K: A Reliable and Fine-Grained Benchmark with Comprehensive Human Annotations for Text-to-Image Generation Model Evaluation (Dec 24, 2024). Tags: Image Captioning, Image Generation
Dragonfly: Multi-Resolution Zoom-In Encoding Enhances Vision-Language Models (Jun 3, 2024). Tags: Image Captioning, Language Modelling
BiomedGPT: A Generalist Vision-Language Foundation Model for Diverse Biomedical Tasks (May 26, 2023). Tags: Image Captioning, Medical Visual Question Answering
Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions (Aug 8, 2023). Tags: Caption Generation, Image Captioning
Fine-grained Image Captioning with CLIP Reward (May 26, 2022). Tags: Caption Generation, Descriptive
MeaCap: Memory-Augmented Zero-shot Image Captioning (Mar 6, 2024). Tags: Caption Generation, Image Captioning
TIPS: Text-Image Pretraining with Spatial Awareness (Oct 21, 2024). Tags: Depth Estimation, Image Captioning
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features (May 13, 2019). Tags: Domain Generalization, Image Captioning
Instruction-guided Multi-Granularity Segmentation and Captioning with Large Multimodal Model (Sep 20, 2024). Tags: Image Captioning, Panoptic Segmentation
Aesthetically Relevant Image Captioning (Nov 25, 2022). Tags: Image Captioning, Sentence
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling (Nov 23, 2021). Tags: Image Captioning, Image Description
DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training (Mar 6, 2023). Tags: Decoder, Image Captioning
Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models (May 24, 2023). Tags: document understanding, Image Captioning
COSMic: A Coherence-Aware Generation Metric for Image Descriptions (Sep 11, 2021). Tags: Caption Generation, Image Captioning
Crisscrossed Captions: Extended Intramodal and Intermodal Semantic Similarity Judgments for MS-COCO (Apr 30, 2020). Tags: Image Captioning, Representation Learning
Convolutional Image Captioning (Nov 24, 2017). Tags: Image Captioning, Text Generation