SOTAVerified

Video Retrieval

Video retrieval takes a text query and a pool of candidate videos and selects the video that corresponds to the query. Systems typically return the candidates as a ranked list, which is scored with document retrieval metrics.
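The benchmark results on this page report text-to-video R@K (recall at K), one such ranking metric. The following is a minimal sketch of how R@K could be computed from a query-by-video similarity matrix. The function name `recall_at_k`, the toy scores, and the assumption that query i has exactly one correct video (video i) are illustrative and do not come from any listed paper or benchmark protocol.

```python
def recall_at_k(scores, k):
    """Percentage of queries whose correct video is ranked in the top k.

    scores[i][j] is the similarity between text query i and video j.
    Assumption for this sketch: the correct video for query i is video i.
    """
    hits = 0
    for i, row in enumerate(scores):
        # Zero-based rank: how many videos score strictly higher than the correct one.
        rank = sum(1 for s in row if s > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(scores)


# Toy similarity matrix: 3 queries x 3 videos (made-up numbers).
toy_scores = [
    [0.9, 0.1, 0.3],  # query 0: correct video ranked 1st
    [0.8, 0.5, 0.2],  # query 1: correct video ranked 2nd
    [0.4, 0.6, 0.1],  # query 2: correct video ranked 3rd
]

for k in (1, 2, 3):
    print(f"R@{k} = {recall_at_k(toy_scores, k):.1f}")
```

On the toy matrix, one of the three queries has its correct video at rank 1, two have it within the top 2, and all three have it within the top 3, so R@K rises as K grows. Real benchmarks may pair several captions with one video, in which case the ground-truth mapping would need to be passed in explicitly.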

Papers

Showing 1-25 of 486 papers

| Title | Status | Hype |
|---|---|---|
| MAGMaR Shared Task System Description: Video Retrieval with OmniEmbed | | 0 |
| Q2E: Query-to-Event Decomposition for Zero-Shot Multilingual Text-to-Video Retrieval | | 0 |
| From Play to Replay: Composed Video Retrieval for Temporally Fine-Grained Videos | Code | 0 |
| Leveraging Auxiliary Information in Text-to-Video Retrieval: A Review | | 0 |
| Learning World Models for Interactive Video Generation | | 0 |
| A Challenge to Build Neuro-Symbolic Video Agents | Code | 0 |
| LoVR: A Benchmark for Long Video Retrieval in Multimodal Contexts | Code | 1 |
| Video-GPT via Next Clip Diffusion | Code | 1 |
| Contrastive Alignment with Semantic Gap-Aware Corrections in Text-Video Retrieval | Code | 0 |
| CMAWRNet: Multiple Adverse Weather Removal via a Unified Quaternion Neural Architecture | | 0 |
| Empowering Agentic Video Analytics Systems with Video Language Models | | 0 |
| ReSpec: Relevance and Specificity Grounded Online Filtering for Learning on Video-Text Data Streams | Code | 0 |
| Prototypes are Balanced Units for Efficient and Effective Partially Relevant Video Retrieval | | 0 |
| Towards Efficient Partially Relevant Video Retrieval with Active Moment Discovering | Code | 0 |
| Towards Efficient and Robust Moment Retrieval System: A Unified Framework for Multi-Granularity Models and Temporal Reranking | | 0 |
| TC-MGC: Text-Conditioned Multi-Grained Contrastive Learning for Text-Video Retrieval | Code | 0 |
| Leveraging Modality Tags for Enhanced Cross-Modal Video Retrieval | | 0 |
| Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval | | 0 |
| Enhancing Subsequent Video Retrieval via Vision-Language Models (VLMs) | Code | 0 |
| Long-VMNet: Accelerating Long-Form Video Understanding via Fixed Memory | | 0 |
| StableFusion: Continual Video Retrieval via Frame Adaptation | Code | 1 |
| Quality Over Quantity? LLM-Based Curation for a Data-Efficient Audio-Video Foundation Model | | 0 |
| Narrating the Video: Boosting Text-Video Retrieval via Comprehensive Utilization of Frame-Level Captions | | 0 |
| LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning | | 0 |
| Learning to Generate Long-term Future Narrations Describing Activities of Daily Living | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | OmniVec | text-to-video R@10 | 89.4 | | Unverified |
| 2 | CLIP4Clip | text-to-video R@10 | 81.6 | | Unverified |
| 3 | OmniVec (pretrained) | text-to-video R@10 | 78.6 | | Unverified |
| 4 | HunYuan_tvr (huge) | text-to-video R@1 | 62.9 | | Unverified |
| 5 | CLIP-ViP | text-to-video R@1 | 57.7 | | Unverified |
| 6 | PIDRo | text-to-video R@1 | 55.9 | | Unverified |
| 7 | DMAE (ViT-B/16) | text-to-video R@1 | 55.5 | | Unverified |
| 8 | HunYuan_tvr | text-to-video R@1 | 55 | | Unverified |
| 9 | MuLTI | text-to-video R@1 | 54.7 | | Unverified |
| 10 | EERCF | text-to-video R@1 | 54.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Aurora (ours, r=64) | text-to-video R@5 | 77.4 | | Unverified |
| 2 | InternVideo2-6B | text-to-video R@1 | 74.2 | | Unverified |
| 3 | vid-TLDR (UMT-L) | text-to-video R@1 | 72.3 | | Unverified |
| 4 | VAST | text-to-video R@1 | 72 | | Unverified |
| 5 | COSA | text-to-video R@1 | 70.5 | | Unverified |
| 6 | UMT-L (ViT-L/16) | text-to-video R@1 | 70.4 | | Unverified |
| 7 | GRAM | text-to-video R@1 | 67.3 | | Unverified |
| 8 | VALOR | text-to-video R@1 | 61.5 | | Unverified |
| 9 | TESTA (ViT-B/16) | text-to-video R@1 | 61.2 | | Unverified |
| 10 | VindLU | text-to-video R@1 | 61.2 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GRAM | text-to-video R@1 | 64 | | Unverified |
| 2 | VAST | text-to-video R@1 | 63.9 | | Unverified |
| 3 | InternVideo2-6B | text-to-video R@1 | 62.8 | | Unverified |
| 4 | VALOR | text-to-video R@1 | 59.9 | | Unverified |
| 5 | UMT-L (ViT-L/16) | text-to-video R@1 | 58.8 | | Unverified |
| 6 | vid-TLDR (UMT-L) | text-to-video R@1 | 58.1 | | Unverified |
| 7 | COSA | text-to-video R@1 | 57.9 | | Unverified |
| 8 | InternVideo2-6B | text-to-video R@1 | 55.9 | | Unverified |
| 9 | InternVideo | text-to-video R@1 | 55.2 | | Unverified |
| 10 | VLAB | text-to-video R@1 | 55.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | EMCL-Net (Ours)++ LSMDC Rohrbach et al. (2015) | text-to-video R@10 | 53.7 | | Unverified |
| 2 | InternVideo2-6B | text-to-video R@1 | 46.4 | | Unverified |
| 3 | vid-TLDR (UMT-L) | text-to-video R@1 | 43.1 | | Unverified |
| 4 | UMT-L (ViT-L/16) | text-to-video R@1 | 43 | | Unverified |
| 5 | HunYuan_tvr (huge) | text-to-video R@1 | 40.4 | | Unverified |
| 6 | COSA | text-to-video R@1 | 39.4 | | Unverified |
| 7 | mPLUG-2 | text-to-video R@1 | 34.4 | | Unverified |
| 8 | VALOR | text-to-video R@1 | 34.2 | | Unverified |
| 9 | InternVideo | text-to-video R@1 | 34 | | Unverified |
| 10 | InternVideo2-6B | text-to-video R@1 | 33.8 | | Unverified |