Weakly Supervised Video Scene Graph Generation via Natural Language Supervision Feb 21, 2025 Graph Generation Image Captioning
GAIA: A Global, Multi-modal, Multi-scale Vision-Language Dataset for Remote Sensing Image Analysis Feb 13, 2025 Cross-Modal Retrieval Image Captioning
Robust-LLaVA: On the Effectiveness of Large-Scale Robust Image Encoders for Multi-modal Large Language Models Feb 3, 2025 Adversarial Robustness Image Captioning
PAINT: Paying Attention to INformed Tokens to Mitigate Hallucination in Large Vision-Language Model Jan 21, 2025 Hallucination Image Captioning
LAVCap: LLM-based Audio-Visual Captioning using Optimal Transport Jan 16, 2025 AudioCaps Audio captioning
RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment Jan 13, 2025 Concept Alignment Image Captioning
Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs? Jan 5, 2025 Image Captioning Image to text
Diffusion Bridge: Leveraging Diffusion Model to Reduce the Modality Gap Between Text and Vision for Zero-Shot Image Captioning Jan 1, 2025 cross-modal alignment Denoising
Typhoon 2: A Family of Open Text and Multimodal Thai Large Language Models Dec 18, 2024 document understanding Image Captioning
G-VEval: A Versatile Metric for Evaluating Image and Video Captions Using GPT-4o Dec 18, 2024 Image Captioning Video Captioning
MedMax: Mixed-Modal Instruction Tuning for Training Biomedical Assistants Dec 17, 2024 Image Captioning Question Answering
Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning Dec 11, 2024 Attribute Benchmarking
LaB-RAG: Label Boosted Retrieval Augmented Generation for Radiology Report Generation Nov 25, 2024 Image Captioning RAG
FG-CXR: A Radiologist-Aligned Gaze Dataset for Enhancing Interpretability in Chest X-Ray Report Generation Nov 23, 2024 Anatomy Image Captioning
LMM-driven Semantic Image-Text Coding for Ultra Low-bitrate Learned Image Compression Nov 20, 2024 Image Captioning Image Compression
Nearest Neighbor Normalization Improves Multimodal Retrieval Oct 31, 2024 Cross-Modal Retrieval Image Captioning
ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning Oct 23, 2024 Image Captioning Instruction Following
IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning Sep 26, 2024 Image Captioning Retrieval
Instruction-guided Multi-Granularity Segmentation and Captioning with Large Multimodal Model Sep 20, 2024 Image Captioning Panoptic Segmentation
YesBut: A High-Quality Annotated Multimodal Dataset for evaluating Satire Comprehension capability of Vision-Language Models Sep 20, 2024 Benchmarking Image Captioning
LIME: Less Is More for MLLM Evaluation Sep 10, 2024 Image Captioning Question Answering
MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models Aug 30, 2024 Image Captioning Language Modeling
See or Guess: Counterfactually Regularized Image Captioning Aug 29, 2024 Causal Inference counterfactual
Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization Aug 26, 2024 Descriptive Image Captioning
BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues Jul 29, 2024 Image Captioning
DiffX: Guide Your Layout to Cross-Modal Generative Modeling Jul 22, 2024 Denoising Image Captioning
AVCap: Leveraging Audio-Visual Features as Text Tokens for Captioning Jul 10, 2024 Audio-Visual Captioning Image Captioning
Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation Jul 10, 2024 Image Captioning Image Segmentation
MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment Jun 28, 2024 Answer Generation Image Captioning
MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models Jun 17, 2024 Benchmarking Fact Checking
ImageNet3D: Towards General-Purpose Object-Level 3D Understanding Jun 13, 2024 Image Captioning Linear Probing Object-Level 3D Awareness
FLEUR: An Explainable Reference-Free Evaluation Metric for Image Captioning Using a Large Multimodal Model Jun 10, 2024 Image Captioning
RTGen: Generating Region-Text Pairs for Open-Vocabulary Object Detection May 30, 2024 Image Captioning Image Inpainting
UniRAG: Universal Retrieval Augmentation for Large Vision Language Models May 16, 2024 Image Captioning Image Generation
Boostlet.js: Image processing plugins for the web via JavaScript injection May 13, 2024 Data Visualization Image Captioning
LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation? Apr 16, 2024 Image Captioning Image Generation
Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts Apr 12, 2024 Image Captioning Question Answering
Harnessing the Power of Large Vision Language Models for Synthetic Image Detection Apr 3, 2024 Image Captioning Synthetic Image Detection
Bi-LORA: A Vision-Language Approach for Synthetic Image Detection Apr 2, 2024 Binary Classification Image Captioning
Disentangled Pre-training for Human-Object Interaction Detection Apr 2, 2024 Action Recognition Decoder
Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory Mar 19, 2024 Adversarial Text Diversity
Can We Talk Models Into Seeing the World Differently? Mar 14, 2024 Image Captioning Image Classification
Differentially Private Representation Learning via Image Captioning Mar 4, 2024 Image Captioning Representation Learning
Polos: Multimodal Metric Learning from Human Feedback for Image Captioning Feb 28, 2024 Contrastive Learning Image Captioning
Distinctive Image Captioning: Leveraging Ground Truth Captions in CLIP Guided Reinforcement Learning Feb 21, 2024 Cross-Modal Retrieval Image Captioning
ChatEarthNet: A Global-Scale Image-Text Dataset Empowering Vision-Language Geo-Foundation Models Feb 17, 2024 Earth Observation Image Captioning
Text-Guided Image Clustering Feb 5, 2024 Clustering Image Captioning
SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval Jan 24, 2024 Benchmarking Image Captioning
Veagle: Advancements in Multimodal Representation Learning Jan 18, 2024 Image Captioning Language Modelling
Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training Jan 4, 2024 Descriptive Image Captioning