Maximal Matching Matters: Preventing Representation Collapse for Robust Cross-Modal Retrieval | Jun 26, 2025 | Cross-Modal Retrieval, Image-text Retrieval | Unverified 0
Tree-Based Text Retrieval via Hierarchical Clustering in RAGFrameworks: Application on Taiwanese Regulations | Jun 16, 2025 | RAG, Retrieval | Code Available 0
GLAP: General contrastive audio-text pretraining across domains and languages | Jun 12, 2025 | AudioCaps, Keyword Spotting | Code Available 2
MSTAR: Box-free Multi-query Scene Text Retrieval with Attention Recycling | Jun 12, 2025 | 16k, Retrieval | Code Available 0
Improving Medical Visual Representation Learning with Pathological-level Cross-Modal Alignment and Correlation Exploration | Jun 12, 2025 | cross-modal alignment, Image to text | Unverified 0
TableRAG: A Retrieval Augmented Generation Framework for Heterogeneous Document Reasoning | Jun 12, 2025 | Answer Generation, Chunking | Code Available 2
Adding simple structure at inference improves Vision-Language Compositionality | Jun 11, 2025 | Attribute, Image-text Retrieval | Code Available 0
FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation | Jun 10, 2025 | Image-text Retrieval, Question Answering | Code Available 2
DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval | Jun 10, 2025 | Image Captioning, Retrieval | Code Available 1
Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models | Jun 10, 2025 | Contrastive Learning, Image-text matching | Code Available 1
Beyond Cropped Regions: New Benchmark and Corresponding Baseline for Chinese Scene Text Retrieval in Diverse Layouts | Jun 5, 2025 | Retrieval, Text Retrieval | Unverified 0
Attacking Attention of Foundation Models Disrupts Downstream Tasks | Jun 3, 2025 | Depth Estimation, Image-text Retrieval | Code Available 0
ERU-KG: Efficient Reference-aligned Unsupervised Keyphrase Generation | May 30, 2025 | Informativeness, Keyphrase Generation | Code Available 0
One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory | May 29, 2025 | Contrastive Learning, Text Retrieval | Code Available 2
MLLM-Guided VLM Fine-Tuning with Joint Inference for Zero-Shot Composed Image Retrieval | May 26, 2025 | Image Retrieval, Large Language Model | Unverified 0
Distill CLIP (DCLIP): Enhancing Image-Text Retrieval via Cross-Modal Transformer Distillation | May 25, 2025 | Contrastive Learning, Image-text Retrieval | Unverified 0
EvdCLIP: Improving Vision-Language Retrieval with Entity Visual Descriptions from Large Language Models | May 24, 2025 | Image-text Retrieval, Language Modeling | Unverified 0
Representation Discrepancy Bridging Method for Remote Sensing Image-Text Retrieval | May 22, 2025 | cross-modal alignment, Image-text Retrieval | Unverified 0
LoVR: A Benchmark for Long Video Retrieval in Multimodal Contexts | May 20, 2025 | Caption Generation, Retrieval | Code Available 1
Breaking Language Barriers or Reinforcing Bias? A Study of Gender and Racial Disparities in Multilingual Contrastive Vision Language Models | May 20, 2025 | Image-text Retrieval, Text Retrieval | Unverified 0
mmRAG: A Modular Benchmark for Retrieval-Augmented Generation over Text, Tables, and Knowledge Graphs | May 16, 2025 | Information Retrieval, Knowledge Graphs | Code Available 1
Towards Cross-modal Retrieval in Chinese Cultural Heritage Documents: Dataset and Solution | May 16, 2025 | Cross-Modal Retrieval, Image to text | Unverified 0
Reproducibility, Replicability, and Insights into Visual Document Retrieval with Late Interaction | May 12, 2025 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available 0
A Vision-Language Foundation Model for Leaf Disease Identification | May 11, 2025 | Contrastive Learning, image-classification | Code Available 0
FG-CLIP: Fine-Grained Visual and Textual Alignment | May 8, 2025 | Image-text Retrieval, object-detection | Code Available 4
QBD-RankedDataGen: Generating Custom Ranked Datasets for Improving Query-By-Document Search Using LLM-Reranking with Reduced Human Effort | May 7, 2025 | Information Retrieval, Reranking | Unverified 0
AGATE: Stealthy Black-box Watermarking for Multimodal Model Copyright Protection | Apr 28, 2025 | Adversarial Attack, Anomaly Detection | Unverified 0
Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs | Apr 24, 2025 | Image-text Retrieval, Instruction Following | Unverified 0
Towards Understanding Camera Motions in Any Video | Apr 21, 2025 | Question Answering, Text Retrieval | Unverified 0
SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs | Apr 17, 2025 | Cross-Modal Retrieval, Image Retrieval | Unverified 0
DART: Disease-aware Image-Text Alignment and Self-correcting Re-alignment for Trustworthy Radiology Report Generation | Apr 16, 2025 | Contrastive Learning, Image to text | Unverified 0
FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations | Apr 11, 2025 | image-classification, Image Classification | Unverified 0
Bridging Queries and Tables through Entities in Table Retrieval | Apr 9, 2025 | Retrieval, Table Retrieval | Unverified 0
LV-MAE: Learning Long Video Representations through Masked-Embedding Autoencoders | Apr 4, 2025 | Self-Supervised Learning, Text Retrieval | Unverified 0
Learning Audio-guided Video Representation with Gated Attention for Video-Text Retrieval | Apr 3, 2025 | Information Retrieval, Representation Learning | Unverified 0
M2D2: Exploring General-purpose Audio-Language Representations Beyond CLAP | Mar 28, 2025 | Audio captioning, Audio Classification | Code Available 0
Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models | Mar 25, 2025 | Benchmarking, Image Captioning | Code Available 1
SeLIP: Similarity Enhanced Contrastive Language Image Pretraining for Multi-modal Head MRI | Mar 25, 2025 | Contrastive Learning, Image Segmentation | Unverified 0
Med3DVLM: An Efficient Vision-Language Model for 3D Medical Image Analysis | Mar 25, 2025 | Contrastive Learning, Image-text Retrieval | Code Available 2
GOAL: Global-local Object Alignment Learning | Mar 22, 2025 | Descriptive, Object | Code Available 1
Anatomy-Aware Conditional Image-Text Retrieval | Mar 10, 2025 | Anatomy, Contrastive Learning | Unverified 0
Bridging Classical and Quantum String Matching: A Computational Reformulation of Bit-Parallelism | Mar 7, 2025 | Text Retrieval | Unverified 0
Variance-Aware Loss Scheduling for Multimodal Alignment in Low-Data Settings | Mar 5, 2025 | Contrastive Learning, Image-text Retrieval | Unverified 0
Tailoring Table Retrieval from a Field-aware Hybrid Matching Perspective | Mar 4, 2025 | Retrieval, Sentence | Unverified 0
LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning | Mar 4, 2025 | Contrastive Learning, Image-text Retrieval | Unverified 0
V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts | Mar 3, 2025 | Contrastive Learning, Text Retrieval | Unverified 0
MedUnifier: Unifying Vision-and-Language Pre-training on Medical Data with Vision Generation Task using Discrete Visual Representations | Mar 2, 2025 | image-classification, Image Classification | Unverified 0
ABC: Achieving Better Control of Multimodal Embeddings using VLMs | Mar 1, 2025 | Image to text, Image-to-Text Retrieval | Unverified 0
How Vital is the Jurisprudential Relevance: Law Article Intervened Legal Case Retrieval and Matching | Feb 25, 2025 | Multi-Task Learning, Retrieval | Unverified 0
Progressive Local Alignment for Medical Multimodal Pre-training | Feb 25, 2025 | Contrastive Learning, Image-text Retrieval | Unverified 0