
Text Retrieval

Text Retrieval is the task of finding the most relevant text result (such as an answer, paragraph, or passage) for a given query (which could be a question, keywords, or any other relevant text).
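To make the task concrete, here is a minimal sketch of one classic way to rank passages against a query: BM25, a lexical scoring function based on term frequency, inverse document frequency, and passage length. It is not taken from any paper listed below. The function names (`tokenize`, `bm25_rank`), the regex tokenizer, and the parameter defaults `k1=1.5` and `b=0.75` are illustrative assumptions. Many of the listed papers replace this kind of lexical scoring with learned dense or cross-modal embeddings.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Deliberately simple tokenizer (an assumption): lowercase, keep alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query, passages, k1=1.5, b=0.75):
    """Return (passage_index, score) pairs sorted from most to least relevant."""
    if not passages:
        return []
    docs = [tokenize(p) for p in passages]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many passages contain each term at least once.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for i, doc in enumerate(docs):
        tf = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # term absent from this passage contributes nothing
            # Smoothed IDF; the "1 +" inside the log keeps it positive.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            # Saturate term frequency (k1) and normalize by passage length (b).
            norm = freq + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * freq * (k1 + 1) / norm
        scores.append((i, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


passages = [
    "Dense retrieval encodes queries and passages as vectors.",
    "BM25 is a lexical ranking function based on term frequency.",
    "Video-text retrieval matches clips with captions.",
]
ranking = bm25_rank("lexical ranking with term frequency", passages)
print([i for i, _ in ranking])  # [1, 2, 0]
```

The second passage ranks first because it shares four query terms, the third shares only "with", and the first shares none (score 0.0).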

Papers

Showing 151-200 of 671 papers

Title | Status | Hype
Data-Efficient Multimodal Fusion on a Single GPU | Code | 1
Graph Optimal Transport for Cross-Domain Alignment | Code | 1
CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation | Code | 1
Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval | Code | 1
Hyperbolic Image-Text Representations | Code | 1
I0T: Embedding Standardization Method Towards Zero Modality Gap | Code | 1
Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss | Code | 1
IMRAM: Iterative Matching with Recurrent Attention Memory for Cross-Modal Image-Text Retrieval | Code | 1
HANet: Hierarchical Alignment Networks for Video-Text Retrieval | Code | 1
Equivariant Similarity for Vision-Language Foundation Models | Code | 1
Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training | Code | 1
GLoRIA: A Multimodal Global-Local Representation Learning Framework for Label-Efficient Medical Image Recognition | Code | 1
Densifying Sparse Representations for Passage Retrieval by Representational Slicing | Code | 1
LDMol: Text-to-Molecule Diffusion Model with Structurally Informative Latent Space | Code | 1
DialogCC: An Automated Pipeline for Creating High-Quality Multi-Modal Dialogue Dataset | Code | 1
Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory | Code | 1
Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval | Code | 1
Learning the Best Pooling Strategy for Visual Semantic Embedding | Code | 1
Less is More: Pretrain a Strong Siamese Encoder for Dense Text Retrieval Using a Weak Decoder | Code | 1
DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval | Code | 1
GOAL: Global-local Object Alignment Learning | Code | 1
LinkTransformer: A Unified Package for Record Linkage with Transformer Language Models | Code | 1
Bridging Video-text Retrieval with Multiple Choice Questions | Code | 1
Helping Hands: An Object-Aware Ego-Centric Video Recognition Model | Code | 1
Knowledge Guided Text Retrieval and Reading for Open Domain Question Answering | Code | 1
Cross-Modal Retrieval with Partially Mismatched Pairs | Code | 1
Bridging Language Gaps in Audio-Text Retrieval | Code | 1
Cross-Modal Retrieval for Motion and Text via DropTriple Loss | Code | 1
Mitigating the Impact of False Negatives in Dense Retrieval with Contrastive Confidence Regularization | Code | 1
Generative Multi-hop Retrieval | Code | 1
A Comprehensive Review of the Video-to-Text Problem | Code | 1
MixGen: A New Multi-Modal Data Augmentation | Code | 1
FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions | Code | 1
GASLITEing the Retrieval: Exploring Vulnerabilities in Dense Embedding-based Search | Code | 1
Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data | Code | 1
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections | Code | 1
Cross-modal Contrastive Learning for Speech Translation | Code | 1
MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration | Code | 1
From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping | Code | 1
Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling | Code | 1
Babel-ImageNet: Massively Multilingual Evaluation of Vision-and-Language Representations | Code | 1
AdvCLIP: Downstream-agnostic Adversarial Examples in Multimodal Contrastive Learning | Code | 1
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval | Code | 1
Efficient Token-Guided Image-Text Retrieval with Consistent Multimodal Contrastive Training | Code | 1
GLEN: Generative Retrieval via Lexical Index Learning | Code | 1
CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers | Code | 1
Multi-modal Pre-training for Medical Vision-language Understanding and Generation: An Empirical Study with A New Benchmark | Code | 1
MV-Adapter: Multimodal Video Transfer Learning for Video Text Retrieval | Code | 1
FlexiViT: One Model for All Patch Sizes | Code | 1
ALIP: Adaptive Language-Image Pre-training with Synthetic Caption | Code | 1
Page 4 of 14

No leaderboard results yet.