SOTAVerified

Text Matching

Matching a target text to a source text based on their meaning.

Papers

Showing 26-50 of 364 papers

Title | Status | Hype
MMoE: Enhancing Multimodal Models with Mixtures of Multimodal Interaction Experts | Code | 1
Cross-modal Active Complementary Learning with Self-refining Correspondence | Code | 1
3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Referring Expression Segmentation | Code | 1
Text Matching Improves Sequential Recommendation by Reducing Popularity Biases | Code | 1
KETM:A Knowledge-Enhanced Text Matching method | Code | 1
Your Negative May not Be True Negative: Boosting Image-Text Matching with False Negative Elimination | Code | 1
Advancing Visual Grounding with Scene Knowledge: Benchmark and Method | Code | 1
UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding | Code | 1
Towards Unified Text-based Person Retrieval: A Large-scale Multi-Attribute and Language Search Benchmark | Code | 1
Revisiting the Role of Language Priors in Vision-Language Models | Code | 1
Improved Probabilistic Image-Text Representations | Code | 1
Are Diffusion Models Vision-And-Language Reasoners? | Code | 1
UniTRec: A Unified Text-to-Text Transformer and Joint Contrastive Learning Framework for Text-based Recommendation | Code | 1
Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners | Code | 1
LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation | Code | 1
Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations | Code | 1
Multimodal Image-Text Matching Improves Retrieval-based Chest X-Ray Report Generation | Code | 1
Plug-and-Play Regulators for Image-Text Matching | Code | 1
BiCro: Noisy Correspondence Rectification for Multi-modality Data via Bi-directional Cross-modal Similarity Consistency | Code | 1
BrainCLIP: Bridging Brain and Visual-Linguistic Representation Via CLIP for Generic Natural Visual Stimulus Decoding | Code | 1
Fine-Grained Image-Text Matching by Cross-Modal Hard Aligning Network | Code | 1
Learning Semantic Relationship Among Instances for Image-Text Matching | Code | 1
ComCLIP: Training-Free Compositional Image and Text Matching | Code | 1
Self-supervised vision-language pretraining for Medical visual question answering | Code | 1
MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model | Code | 1

No leaderboard results yet.