CodeXEmbed: A Generalist Embedding Model Family for Multiligual and Multi-task Code Retrieval | Nov 19, 2024 | Diversity, Natural Language Queries
Unverified | 0 | BoolQuestions: Does Dense Retrieval Understand Boolean Logic in Language? | Nov 19, 2024 | Retrieval, Text Retrieval
Unverified | 0 | Partial Scene Text Retrieval | Nov 15, 2024 | Multiple Instance Learning, Retrieval
Code Available | 0 | MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs | Nov 4, 2024 | Cross-Modal Retrieval, Information Retrieval
Unverified | 0 | SPECTRUM: Semantic Processing and Emotion-informed video-Captioning Through Retrieval and Understanding Modalities | Nov 4, 2024 | Attribute, Descriptive
Unverified | 0 | Nearest Neighbor Normalization Improves Multimodal Retrieval | Oct 31, 2024 | Cross-Modal Retrieval, Image Captioning
Code Available | 1 | Multilingual Vision-Language Pre-training for the Remote Sensing Domain | Oct 30, 2024 | Cross-Modal Retrieval, image-classification
Code Available | 0 | Robotic State Recognition with Image-to-Text Retrieval Task of Pre-Trained Vision-Language Model and Black-Box Optimization | Oct 30, 2024 | Image to text, Image-to-Text Retrieval
Unverified | 0 | Beyond Text: Optimizing RAG with Multimodal Inputs for Industrial Applications | Oct 29, 2024 | Image Retrieval, RAG
Code Available | 2 | Do Audio-Language Models Understand Linguistic Variations? | Oct 21, 2024 | Contrastive Learning, Natural Language Queries
Unverified | 0 | GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning | Oct 20, 2024 | Image Retrieval, Image-text Retrieval
Code Available | 0 | Improving General Text Embedding Model: Tackling Task Conflict and Data Imbalance through Model Merging | Oct 19, 2024 | model, Semantic Textual Similarity
Unverified | 0 | Beyond Coarse-Grained Matching in Video-Text Retrieval | Oct 16, 2024 | Retrieval, Text Retrieval
Unverified | 0 | CtrlSynth: Controllable Image Text Synthesis for Data-Efficient Multimodal Learning | Oct 15, 2024 | Image-text Retrieval, Text Retrieval
Unverified | 0 | Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval | Oct 9, 2024 | Retrieval, Text Retrieval
Code Available | 1 | LaMP: Language-Motion Pretraining for Motion Generation, Retrieval, and Captioning | Oct 9, 2024 | Large Language Model, Motion Captioning
Unverified | 0 | AnyAttack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models | Oct 7, 2024 | Image Captioning, Image-text Retrieval
Unverified | 0 | CoLLAP: Contrastive Long-form Language-Audio Pretraining with Musical Temporal Structure Augmentation | Oct 3, 2024 | Contrastive Learning, Form
Unverified | 0 | From Unimodal to Multimodal: Scaling up Projectors to Align Modalities | Sep 28, 2024 | Image-text Retrieval, Semantic Similarity
Code Available | 0 | Robotic Environmental State Recognition with Pre-Trained Vision-Language Models and Black-Box Optimization | Sep 26, 2024 | Image to text, Image-to-Text Retrieval
Unverified | 0 | DiffATR: Diffusion-based Generative Modeling for Audio-Text Retrieval | Sep 16, 2024 | AudioCaps, Retrieval
Unverified | 0 | NEVLP: Noise-Robust Framework for Efficient Vision-Language Pre-training | Sep 15, 2024 | Contrastive Learning, cross-modal alignment
Unverified | 0 | ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds | Sep 13, 2024 | Audio Classification, Descriptive
Code Available | 1 | Enhancing Q&A Text Retrieval with Ranking Models: Benchmarking, fine-tuning and deploying Rerankers for RAG | Sep 12, 2024 | Benchmarking, Question Answering
Unverified | 0 | Pushing the Limits of Vision-Language Models in Remote Sensing without Human Annotations | Sep 11, 2024 | Image-text Retrieval, Text Retrieval
Unverified | 0 | Benchmarking and Building Zero-Shot Hindi Retrieval Model with Hindi-BEIR and NLLB-E5 | Sep 9, 2024 | Benchmarking, Information Retrieval
Unverified | 0 | MODOC: A Modular Interface for Flexible Interlinking of Text Retrieval and Text Generation Functions | Aug 26, 2024 | Information Retrieval, Retrieval
Code Available | 0 | Mistral-SPLADE: LLMs for better Learned Sparse Retrieval | Aug 20, 2024 | Decoder, Language Modeling
Code Available | 0 | Improving embedding with contrastive fine-tuning on small datasets with expert-augmented scores | Aug 19, 2024 | Retrieval, Semantic Textual Similarity
Unverified | 0 | NAVERO: Unlocking Fine-Grained Semantics for Video-Language Compositionality | Aug 18, 2024 | Retrieval, Text Retrieval
Unverified | 0 | Mamba Retriever: Utilizing Mamba for Effective and Efficient Dense Retrieval | Aug 15, 2024 | Information Retrieval, Mamba
Unverified | 0 | Pairing Clustered Inverted Indexes with kNN Graphs for Fast Approximate Retrieval over Learned Sparse Representations | Aug 8, 2024 | Retrieval, Text Retrieval
Unverified | 0 | COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark | Aug 5, 2024 | Dense Video Captioning, Diversity
Code Available | 1 | Toward Automatic Relevance Judgment using Vision--Language Models for Image--Text Retrieval Evaluation | Aug 2, 2024 | Image-text Retrieval, Retrieval
Unverified | 0 | Focus, Distinguish, and Prompt: Unleashing CLIP for Efficient and Flexible Scene Text Retrieval | Aug 1, 2024 | Attribute, Optical Character Recognition
Code Available | 1 | Learning Video Context as Interleaved Multimodal Sequences | Jul 31, 2024 | Language Modeling, Language Modelling
Code Available | 1 | GABInsight: Exploring Gender-Activity Binding Bias in Vision-Language Models | Jul 30, 2024 | Image to text, Image-to-Text Retrieval
Code Available | 0 | mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval | Jul 29, 2024 | Contrastive Learning, Reranking
Unverified | 0 | FiCo-ITR: bridging fine-grained and coarse-grained image-text retrieval for comparative performance analysis | Jul 29, 2024 | Image-text Retrieval, Model Selection
Code Available | 0 | Multi-label Cluster Discrimination for Visual Representation Learning | Jul 24, 2024 | Contrastive Learning, Image-text Retrieval
Code Available | 4 | Assessing Brittleness of Image-Text Retrieval Benchmarks from Vision-Language Models Perspective | Jul 21, 2024 | Image-text Retrieval, Information Retrieval
Unverified | 0 | Multimodal Misinformation Detection using Large Vision-Language Models | Jul 19, 2024 | Fact Checking, Fact Verification
Unverified | 0 | Object-Aware Query Perturbation for Cross-Modal Image-Text Retrieval | Jul 17, 2024 | Image-text Retrieval, Object
Code Available | 0 | BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval | Jul 16, 2024 | Question Answering, Retrieval
Code Available | 5 | Video-Language Alignment via Spatio-Temporal Graph Transformer | Jul 16, 2024 | Contrastive Learning, Question Answering
Code Available | 1 | EA-VTR: Event-Aware Video-Text Retrieval | Jul 10, 2024 | Action Recognition, Contrastive Learning
Unverified | 0 | How to Make Cross Encoder a Good Teacher for Efficient Image-Text Retrieval? | Jul 10, 2024 | Contrastive Learning, Image-text Retrieval
Unverified | 0 | Towards a text-based quantitative and explainable histopathology image analysis | Jul 10, 2024 | image-classification, Image Classification
Code Available | 0 | CosmoCLIP: Generalizing Large Vision-Language Models for Astronomical Imaging | Jul 10, 2024 | Contrastive Learning, Image-text Retrieval
Unverified | 0 | CEIA: CLIP-Based Event-Image Alignment for Open-World Event-Based Understanding | Jul 9, 2024 | Contrastive Learning, Domain Adaptation
Unverified | 0