Order-Embeddings of Images and Language | Nov 19, 2015 | Cross-Modal Retrieval, Image Captioning | Code Available | 1
How (not) to Train your Generative Model: Scheduled Sampling, Likelihood, Adversary? | Nov 16, 2015 | Image Captioning | Code Available | 1
A large annotated corpus for learning natural language inference | Aug 21, 2015 | Image Captioning, Natural Language Inference | Code Available | 1
VQA: Visual Question Answering | May 3, 2015 | Image Captioning, Multiple-choice | Code Available | 1
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention | Feb 10, 2015 | Caption Generation, Image Captioning | Code Available | 1
CIDEr: Consensus-based Image Description Evaluation | Nov 20, 2014 | Action Recognition, Attribute | Code Available | 1
Show and Tell: A Neural Image Caption Generator | Nov 17, 2014 | Image Captioning, Image Retrieval with Multi-Modal Query | Code Available | 1
Language-Guided Contrastive Audio-Visual Masked Autoencoder with Automatically Generated Audio-Visual-Text Triplets from Videos | Jul 16, 2025 | Image Captioning, Representation Learning | Unverified | 0
Mask-aware Text-to-Image Retrieval: Referring Expression Segmentation Meets Cross-modal Retrieval | Jun 28, 2025 | Cross-Modal Retrieval, Image Captioning | Unverified | 0
HalLoc: Token-level Localization of Hallucinations for Vision Language Models | Jun 12, 2025 | Hallucination, Image Captioning | Code Available | 0
A Novel Lightweight Transformer with Edge-Aware Fusion for Remote Sensing Image Captioning | Jun 11, 2025 | Decoder, Image Captioning | Unverified | 0
Better Reasoning with Less Data: Enhancing VLMs Through Unified Modality Scoring | Jun 10, 2025 | Image Captioning | Unverified | 0
Dense Retrievers Can Fail on Simple Queries: Revealing The Granularity Dilemma of Embeddings | Jun 10, 2025 | Image Captioning | Code Available | 0
Edit Flows: Flow Matching with Edit Operations | Jun 10, 2025 | Code Generation, Image Captioning | Unverified | 0
An Open-Source Software Toolkit & Benchmark Suite for the Evaluation and Adaptation of Multimodal Action Models | Jun 10, 2025 | Action Generation, Image Captioning | Unverified | 0
GTR-CoT: Graph Traversal as Visual Chain of Thought for Molecular Structure Recognition | Jun 9, 2025 | Image Captioning | Code Available | 0
Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning | Jun 8, 2025 | Attribute, Hallucination | Unverified | 0
Stepwise Decomposition and Dual-stream Focus: A Novel Approach for Training-free Camouflaged Object Segmentation | Jun 7, 2025 | Camouflaged Object Segmentation, Feature Correlation | Code Available | 0
SRD: Reinforcement-Learned Semantic Perturbation for Backdoor Defense in VLMs | Jun 5, 2025 | backdoor defense, Image Captioning | Unverified | 0
Attention-based transformer models for image captioning across languages: An in-depth survey and evaluation | Jun 3, 2025 | Caption Generation, Image Captioning | Unverified | 0
Light as Deception: GPT-driven Natural Relighting Against Vision-Language Pre-training Models | May 30, 2025 | Image Captioning, Question Answering | Unverified | 0
Beam-Guided Knowledge Replay for Knowledge-Rich Image Captioning using Vision-Language Model | May 29, 2025 | Image Captioning, Language Modeling | Unverified | 0
CLDTracker: A Comprehensive Language Description for Visual Tracking | May 29, 2025 | Image Captioning, Visual Tracking | Code Available | 0
Document-Level Text Generation with Minimum Bayes Risk Decoding using Optimal Transport | May 29, 2025 | Document Level Machine Translation, Image Captioning | Code Available | 0
Correlating instruction-tuning (in multimodal models) with vision-language processing (in the brain) | May 26, 2025 | Image Captioning | Code Available | 0
TNG-CLIP: Training-Time Negation Data Generation for Negation Awareness of CLIP | May 24, 2025 | Image Captioning, Image Generation | Unverified | 0
Redemption Score: An Evaluation Framework to Rank Image Captions While Redeeming Image Semantics and Language Pragmatics | May 22, 2025 | Image Captioning, text similarity | Unverified | 0
Steering LVLMs via Sparse Autoencoder for Hallucination Mitigation | May 22, 2025 | Hallucination, Image Captioning | Unverified | 0
SCENIR: Visual Semantic Clarity through Unsupervised Scene Graph Retrieval | May 21, 2025 | counterfactual, Graph Generation | Code Available | 0
NOVA: A Benchmark for Anomaly Localization and Clinical Reasoning in Brain MRI | May 20, 2025 | Anomaly Localization, Benchmarking | Unverified | 0
MedBLIP: Fine-tuning BLIP for Medical Image Captioning | May 20, 2025 | Decoder, Image Captioning | Unverified | 0
Aligning Attention Distribution to Information Flow for Hallucination Mitigation in Large Vision-Language Models | May 20, 2025 | Hallucination, Image Captioning | Unverified | 0
RAVENEA: A Benchmark for Multimodal Retrieval-Augmented Visual Culture Understanding | May 20, 2025 | Image Captioning, Question Answering | Code Available | 0
Sat2Sound: A Unified Framework for Zero-Shot Soundscape Mapping | May 19, 2025 | Contrastive Learning, Cross-Modal Retrieval | Unverified | 0
Temporally-Grounded Language Generation: A Benchmark for Real-Time Vision-Language Models | May 16, 2025 | Image Captioning, Question Answering | Code Available | 0
Cross-Image Contrastive Decoding: Precise, Lossless Suppression of Language Priors in Large Vision-Language Models | May 15, 2025 | Image Captioning, Language Modeling | Unverified | 0
Describe Anything in Medical Images | May 9, 2025 | Attribute, Diagnostic | Unverified | 0
ArtRAG: Retrieval-Augmented Generation with Structured Context for Visual Art Understanding | May 9, 2025 | Image Captioning, Object Recognition | Unverified | 0
A Grounded Memory System For Smart Personal Assistants | May 9, 2025 | Entity Disambiguation, Image Captioning | Unverified | 0
Mitigating Image Captioning Hallucinations in Vision-Language Models | May 6, 2025 | Hallucination, Hallucination Evaluation | Unverified | 0
Compositional Image-Text Matching and Retrieval by Grounding Entities | May 4, 2025 | Image Captioning, Image-text matching | Code Available | 0
Transferable Adversarial Attacks on Black-Box Vision-Language Models | May 2, 2025 | Image Captioning, Object Recognition | Unverified | 0
Zoomer: Adaptive Image Focus Optimization for Black-box MLLM | Apr 30, 2025 | Image Captioning, Object Recognition | Unverified | 0
MicarVLMoE: A Modern Gated Cross-Aligned Vision-Language Mixture of Experts Model for Medical Image Captioning and Report Generation | Apr 29, 2025 | cross-modal alignment, Decoder | Code Available | 0
Zero-Shot, But at What Cost? Unveiling the Hidden Overhead of MILS's LLM-CLIP Framework for Image Captioning | Apr 21, 2025 | Image Captioning | Unverified | 0
Are Vision LLMs Road-Ready? A Comprehensive Benchmark for Safety-Critical Driving Video Understanding | Apr 20, 2025 | Autonomous Driving, Image Captioning | Code Available | 0
Generalized Visual Relation Detection with Diffusion Models | Apr 16, 2025 | Graph Generation, Human-Object Interaction Detection | Unverified | 0
LVLM_CSP: Accelerating Large Vision Language Models via Clustering, Scattering, and Pruning for Reasoning Segmentation | Apr 15, 2025 | Image Captioning, Question Answering | Unverified | 0
TADACap: Time-series Adaptive Domain-Aware Captioning | Apr 15, 2025 | Image Captioning, Retrieval | Unverified | 0
Foundation Models for Remote Sensing: An Analysis of MLLMs for Object Localization | Apr 14, 2025 | Benchmarking, Earth Observation | Unverified | 0