SOTAVerified

Image-text Retrieval

Papers

Showing 1-10 of 248 papers

Title | Status | Hype
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation | Code | 5
FG-CLIP: Fine-Grained Visual and Textual Alignment | Code | 4
Multi-label Cluster Discrimination for Visual Representation Learning | Code | 4
Temporal Working Memory: Query-Guided Segment Refinement for Enhanced Multimodal Understanding | Code | 3
M3D: Advancing 3D Medical Image Analysis with Multi-Modal Large Language Models | Code | 3
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | Code | 3
AToMiC: An Image/Text Retrieval Test Collection to Support Multimedia Content Creation | Code | 3
Vision-Language Pre-training: Basics, Recent Advances, and Future Trends | Code | 3
FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation | Code | 2
Med3DVLM: An Efficient Vision-Language Model for 3D Medical Image Analysis | Code | 2

No leaderboard results yet.