Breaking Language Barriers or Reinforcing Bias? A Study of Gender and Racial Disparities in Multilingual Contrastive Vision Language Models | May 20, 2025 | Image-text Retrieval, Text Retrieval | Unverified
Towards Cross-modal Retrieval in Chinese Cultural Heritage Documents: Dataset and Solution | May 16, 2025 | Cross-Modal Retrieval, Image to text | Unverified
Reproducibility, Replicability, and Insights into Visual Document Retrieval with Late Interaction | May 12, 2025 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available
A Vision-Language Foundation Model for Leaf Disease Identification | May 11, 2025 | Contrastive Learning, image-classification | Code Available
QBD-RankedDataGen: Generating Custom Ranked Datasets for Improving Query-By-Document Search Using LLM-Reranking with Reduced Human Effort | May 7, 2025 | Information Retrieval, Reranking | Unverified
AGATE: Stealthy Black-box Watermarking for Multimodal Model Copyright Protection | Apr 28, 2025 | Adversarial Attack, Anomaly Detection | Unverified
Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs | Apr 24, 2025 | Image-text Retrieval, Instruction Following | Unverified
Towards Understanding Camera Motions in Any Video | Apr 21, 2025 | Question Answering, Text Retrieval | Unverified
SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs | Apr 17, 2025 | Cross-Modal Retrieval, Image Retrieval | Unverified
DART: Disease-aware Image-Text Alignment and Self-correcting Re-alignment for Trustworthy Radiology Report Generation | Apr 16, 2025 | Contrastive Learning, Image to text | Unverified
FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations | Apr 11, 2025 | image-classification, Image Classification | Unverified
Bridging Queries and Tables through Entities in Table Retrieval | Apr 9, 2025 | Retrieval, Table Retrieval | Unverified
LV-MAE: Learning Long Video Representations through Masked-Embedding Autoencoders | Apr 4, 2025 | Self-Supervised Learning, Text Retrieval | Unverified
Learning Audio-guided Video Representation with Gated Attention for Video-Text Retrieval | Apr 3, 2025 | Information Retrieval, Representation Learning | Unverified
M2D2: Exploring General-purpose Audio-Language Representations Beyond CLAP | Mar 28, 2025 | Audio captioning, Audio Classification | Unverified
SeLIP: Similarity Enhanced Contrastive Language Image Pretraining for Multi-modal Head MRI | Mar 25, 2025 | Contrastive Learning, Image Segmentation | Unverified
Anatomy-Aware Conditional Image-Text Retrieval | Mar 10, 2025 | Anatomy, Contrastive Learning | Unverified
Bridging Classical and Quantum String Matching: A Computational Reformulation of Bit-Parallelism | Mar 7, 2025 | Text Retrieval | Unverified
Variance-Aware Loss Scheduling for Multimodal Alignment in Low-Data Settings | Mar 5, 2025 | Contrastive Learning, Image-text Retrieval | Unverified
Tailoring Table Retrieval from a Field-aware Hybrid Matching Perspective | Mar 4, 2025 | Retrieval, Sentence | Unverified
LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning | Mar 4, 2025 | Contrastive Learning, Image-text Retrieval | Unverified
V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts | Mar 3, 2025 | Contrastive Learning, Text Retrieval | Unverified
MedUnifier: Unifying Vision-and-Language Pre-training on Medical Data with Vision Generation Task using Discrete Visual Representations | Mar 2, 2025 | image-classification, Image Classification | Unverified
ABC: Achieving Better Control of Multimodal Embeddings using VLMs | Mar 1, 2025 | Image to text, Image-to-Text Retrieval | Unverified
How Vital is the Jurisprudential Relevance: Law Article Intervened Legal Case Retrieval and Matching | Feb 25, 2025 | Multi-Task Learning, Retrieval | Unverified
Progressive Local Alignment for Medical Multimodal Pre-training | Feb 25, 2025 | Contrastive Learning, Image-text Retrieval | Unverified
Med-gte-hybrid: A contextual embedding transformer model for extracting actionable information from clinical texts | Feb 21, 2025 | Contrastive Learning, Decision Making | Unverified
ATRI: Mitigating Multilingual Audio Text Retrieval Inconsistencies by Reducing Data Distribution Errors | Feb 20, 2025 | AudioCaps, Contrastive Learning | Code Available
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features | Feb 20, 2025 | Fairness, Image-text Retrieval | Unverified
LSTM-based Selective Dense Text Retrieval Guided by Sparse Lexical Retrieval | Feb 15, 2025 | Retrieval, Text Retrieval | Unverified
Fine-tuning Multimodal Transformers on Edge: A Parallel Split Learning Approach | Feb 10, 2025 | Federated Learning, Image-text Retrieval | Unverified
DCFormer: Efficient 3D Vision-Language Modeling with Decomposed Convolutions | Feb 7, 2025 | Anomaly Detection, Image-text Retrieval | Unverified
Expertized Caption Auto-Enhancement for Video-Text Retrieval | Feb 5, 2025 | Caption Generation, Retrieval | Code Available
Scientometric Analysis of the German IR Community within TREC & CLEF | Feb 5, 2025 | Information Retrieval, Retrieval | Unverified
Large Vision-Language Models for Knowledge-Grounded Data Annotation of Memes | Jan 23, 2025 | Emotion Classification, Image Captioning | Code Available
MASS: Overcoming Language Bias in Image-Text Matching | Jan 20, 2025 | Image-text matching, Image-text Retrieval | Unverified
TSVC: Tripartite Learning with Semantic Variation Consistency for Robust Image-Text Retrieval | Jan 19, 2025 | Cross-Modal Retrieval, Image-text Retrieval | Unverified
CLIP is Almost All You Need: Towards Parameter-Efficient Scene Text Retrieval without OCR | Jan 1, 2025 | All, Optical Character Recognition | Unverified
Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training | Jan 1, 2025 | Image-text Retrieval, Image to text | Unverified
Rethinking Noisy Video-Text Retrieval via Relation-aware Alignment | Jan 1, 2025 | Relation, Retrieval | Unverified
Retaining Knowledge and Enhancing Long-Text Representations in CLIP through Dual-Teacher Distillation | Jan 1, 2025 | image-classification, Image Classification | Unverified
V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts | Jan 1, 2025 | Contrastive Learning, Text Retrieval | Unverified
CaReBench: A Fine-Grained Benchmark for Video Captioning and Retrieval | Dec 31, 2024 | Retrieval, Text Retrieval | Unverified
The Text Classification Pipeline: Starting Shallow going Deeper | Dec 30, 2024 | Classification, Information Retrieval | Unverified
Multi-Head Attention Driven Dynamic Visual-Semantic Embedding for Enhanced Image-Text Matching | Dec 26, 2024 | Image-text matching, Text Matching | Unverified
Reversed in Time: A Novel Temporal-Emphasized Benchmark for Cross-Modal Video-Text Retrieval | Dec 26, 2024 | Image-text Retrieval, Information Retrieval | Code Available
Optimizing Multi-Stage Language Models for Effective Text Retrieval | Dec 26, 2024 | Retrieval, Text Retrieval | Unverified
PolySmart @ TRECVid 2024 Medical Video Question Answering | Dec 20, 2024 | Question Answering, Retrieval | Unverified
SKETCH: Structured Knowledge Enhanced Text Comprehension for Holistic Retrieval | Dec 19, 2024 | Knowledge Graphs, RAG | Unverified
Multimodal Hypothetical Summary for Retrieval-based Multi-image Question Answering | Dec 19, 2024 | Contrastive Learning, Language Modeling | Code Available