SOTAVerified

Video-Text Retrieval

Video-text retrieval requires understanding video and language jointly, which makes it different from the video retrieval task.

Papers

Showing 31-40 of 111 papers

| Title | Status | Hype |
| --- | --- | --- |
| Video Editing for Video Retrieval | | 0 |
| M2-RAAP: A Multi-Modal Recipe for Advancing Adaptation-based Pre-training towards Effective and Efficient Zero-shot Video-text Retrieval | Code | 2 |
| InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | Code | 1 |
| ProS: Prompting-to-simulate Generalized knowledge for Universal Cross-Domain Retrieval | Code | 1 |
| RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos | Code | 1 |
| Leveraging Generative Language Models for Weakly Supervised Sentence Component Analysis in Video-Language Joint Learning | | 0 |
| TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding | Code | 2 |
| Harvest Video Foundation Models via Efficient Post-Pretraining | Code | 0 |
| TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding | Code | 1 |
| Videoprompter: an ensemble of foundational models for zero-shot video understanding | | 0 |

No leaderboard results yet.