SOTAVerified

Text Matching

Matching a target text to a source text based on their meaning.

Papers

Showing 1-25 of 364 papers

Title | Status | Hype
ColPali: Efficient Document Retrieval with Vision Language Models | Code | 7
Aligning Information Capacity Between Vision and Language via Dense-to-Sparse Feature Distillation for Image-Text Matching | Code | 2
FiLo++: Zero-/Few-Shot Anomaly Detection by Fused Fine-Grained Descriptions and Deformable Localization | Code | 2
LLaQo: Towards a Query-Based Coach in Expressive Music Performance Assessment | Code | 2
Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval | Code | 2
MouSi: Poly-Visual-Expert Vision-Language Models | Code | 2
3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment | Code | 2
A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models | Code | 2
Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval | Code | 2
DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting | Code | 2
Language Models Can See: Plugging Visual Controls in Text Generation | Code | 2
Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models | Code | 1
CLIP is Strong Enough to Fight Back: Test-time Counterattacks towards Zero-shot Adversarial Robustness of CLIP | Code | 1
IteRPrimE: Zero-shot Referring Image Segmentation with Iterative Grad-CAM Refinement and Primary Word Emphasis | Code | 1
CLIP Under the Microscope: A Fine-Grained Analysis of Multi-Object Representation | Code | 1
TDSM: Triplet Diffusion for Skeleton-Text Matching in Zero-Shot Action Recognition | Code | 1
Teach CLIP to Develop a Number Sense for Ordinal Regression | Code | 1
Image-text matching for large-scale book collections | Code | 1
Composing Object Relations and Attributes for Image-Text Matching | Code | 1
Revisiting Deep Audio-Text Retrieval Through the Lens of Transportation | Code | 1
Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching | Code | 1
Narrative Action Evaluation with Prompt-Guided Multimodal Interaction | Code | 1
RadCLIP: Enhancing Radiologic Image Analysis through Contrastive Language-Image Pre-training | Code | 1
ColorSwap: A Color and Word Order Dataset for Multimodal Evaluation | Code | 1
Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models | Code | 1

No leaderboard results yet.