SOTAVerified

Text Retrieval

Text Retrieval is the task of finding the most relevant text result (such as an answer, paragraph, or passage) for a given query (which could be a question, keywords, or any other relevant text).
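
As a rough illustration of the task definition above, the sketch below ranks candidate passages against a query using bag-of-words cosine similarity. This is only a minimal baseline chosen for brevity, not the method of any paper listed on this page; the function names (`tokenize`, `cosine`, `retrieve`) and the sample passages are hypothetical and introduced purely for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty query or passage matches nothing
    return dot / (norm_a * norm_b)


def retrieve(query, passages, top_k=1):
    """Return the top_k passages, ranked by similarity to the query."""
    query_vec = Counter(tokenize(query))
    scored = [(cosine(query_vec, Counter(tokenize(p))), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:top_k]]


# Hypothetical sample data, for illustration only.
passages = [
    "The Eiffel Tower is located in Paris.",
    "Photosynthesis converts light into chemical energy.",
    "Mount Everest is the highest mountain on Earth.",
]
best = retrieve("Where is the Eiffel Tower?", passages)
print(best[0])
```

Raw term counts are used here only to keep the sketch dependency-free; practical systems typically rely on weighting schemes such as BM25 or on learned dense embeddings instead.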

Papers

Showing 251–300 of 671 papers

Title | Status | Hype
Breaking Language Barriers or Reinforcing Bias? A Study of Gender and Racial Disparities in Multilingual Contrastive Vision Language Models | - | 0
Towards Cross-modal Retrieval in Chinese Cultural Heritage Documents: Dataset and Solution | - | 0
Reproducibility, Replicability, and Insights into Visual Document Retrieval with Late Interaction | Code | 0
A Vision-Language Foundation Model for Leaf Disease Identification | Code | 0
QBD-RankedDataGen: Generating Custom Ranked Datasets for Improving Query-By-Document Search Using LLM-Reranking with Reduced Human Effort | - | 0
AGATE: Stealthy Black-box Watermarking for Multimodal Model Copyright Protection | - | 0
Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs | - | 0
Towards Understanding Camera Motions in Any Video | - | 0
SemCORE: A Semantic-Enhanced Generative Cross-Modal Retrieval Framework with MLLMs | - | 0
DART: Disease-aware Image-Text Alignment and Self-correcting Re-alignment for Trustworthy Radiology Report Generation | - | 0
FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations | - | 0
Bridging Queries and Tables through Entities in Table Retrieval | - | 0
LV-MAE: Learning Long Video Representations through Masked-Embedding Autoencoders | - | 0
Learning Audio-guided Video Representation with Gated Attention for Video-Text Retrieval | - | 0
M2D2: Exploring General-purpose Audio-Language Representations Beyond CLAP | - | 0
SeLIP: Similarity Enhanced Contrastive Language Image Pretraining for Multi-modal Head MRI | - | 0
Anatomy-Aware Conditional Image-Text Retrieval | - | 0
Bridging Classical and Quantum String Matching: A Computational Reformulation of Bit-Parallelism | - | 0
Variance-Aware Loss Scheduling for Multimodal Alignment in Low-Data Settings | - | 0
Tailoring Table Retrieval from a Field-aware Hybrid Matching Perspective | - | 0
LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning | - | 0
V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts | - | 0
MedUnifier: Unifying Vision-and-Language Pre-training on Medical Data with Vision Generation Task using Discrete Visual Representations | - | 0
ABC: Achieving Better Control of Multimodal Embeddings using VLMs | - | 0
How Vital is the Jurisprudential Relevance: Law Article Intervened Legal Case Retrieval and Matching | - | 0
Progressive Local Alignment for Medical Multimodal Pre-training | - | 0
Med-gte-hybrid: A contextual embedding transformer model for extracting actionable information from clinical texts | - | 0
ATRI: Mitigating Multilingual Audio Text Retrieval Inconsistencies by Reducing Data Distribution Errors | Code | 0
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features | - | 0
LSTM-based Selective Dense Text Retrieval Guided by Sparse Lexical Retrieval | - | 0
Fine-tuning Multimodal Transformers on Edge: A Parallel Split Learning Approach | - | 0
DCFormer: Efficient 3D Vision-Language Modeling with Decomposed Convolutions | - | 0
Expertized Caption Auto-Enhancement for Video-Text Retrieval | Code | 0
Scientometric Analysis of the German IR Community within TREC & CLEF | - | 0
Large Vision-Language Models for Knowledge-Grounded Data Annotation of Memes | Code | 0
MASS: Overcoming Language Bias in Image-Text Matching | - | 0
TSVC:Tripartite Learning with Semantic Variation Consistency for Robust Image-Text Retrieval | - | 0
CLIP is Almost All You Need: Towards Parameter-Efficient Scene Text Retrieval without OCR | - | 0
Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training | - | 0
Rethinking Noisy Video-Text Retrieval via Relation-aware Alignment | - | 0
Retaining Knowledge and Enhancing Long-Text Representations in CLIP through Dual-Teacher Distillation | - | 0
V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts | - | 0
CaReBench: A Fine-Grained Benchmark for Video Captioning and Retrieval | - | 0
The Text Classification Pipeline: Starting Shallow going Deeper | - | 0
Multi-Head Attention Driven Dynamic Visual-Semantic Embedding for Enhanced Image-Text Matching | - | 0
Reversed in Time: A Novel Temporal-Emphasized Benchmark for Cross-Modal Video-Text Retrieval | Code | 0
Optimizing Multi-Stage Language Models for Effective Text Retrieval | - | 0
PolySmart @ TRECVid 2024 Medical Video Question Answering | - | 0
SKETCH: Structured Knowledge Enhanced Text Comprehension for Holistic Retrieval | - | 0
Multimodal Hypothetical Summary for Retrieval-based Multi-image Question Answering | Code | 0

No leaderboard results yet.