SOTAVerified

Video Retrieval

Video retrieval is the task of selecting, from a pool of candidate videos, the video that corresponds to a given text query. Systems typically return the candidates as a ranked list, which is scored with document retrieval metrics such as Recall@K (R@K).
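As a rough illustration of how a ranked list is scored, the sketch below computes Recall@K from a text-to-video similarity matrix. It is a minimal example under stated assumptions, not the evaluation code of any listed paper: the function name `recall_at_k`, the toy scores, and the convention that video i is the single correct match for query i are all hypothetical choices made for illustration.

```python
def recall_at_k(scores, k):
    """Fraction of queries whose correct video appears in the top k.

    scores[i][j] is the similarity between text query i and video j.
    Assumption (illustrative only): video i is the one correct match
    for query i.
    """
    hits = 0
    for i, row in enumerate(scores):
        # Zero-based rank of the correct video: how many candidates
        # score strictly higher. Ties are resolved optimistically.
        rank = sum(1 for s in row if s > row[i])
        if rank < k:
            hits += 1
    return hits / len(scores)


# Toy similarity matrix: 3 text queries x 3 candidate videos.
scores = [
    [0.9, 0.1, 0.3],  # query 0: correct video is ranked 1st
    [0.8, 0.5, 0.6],  # query 1: correct video is ranked 3rd
    [0.2, 0.7, 0.4],  # query 2: correct video is ranked 2nd
]

r1 = recall_at_k(scores, 1)
r2 = recall_at_k(scores, 2)
r3 = recall_at_k(scores, 3)
```

The function returns a fraction between 0 and 1. Leaderboards usually report R@K as a percentage, so the value would be multiplied by 100 before comparison.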

Papers

Showing 351-375 of 486 papers

| Title | Status | Hype |
| --- | --- | --- |
| Sound and Visual Representation Learning with Multiple Pretraining Tasks | | 0 |
| Spacewalk-18: A Benchmark for Multimodal and Long-form Procedural Video Understanding | | 0 |
| Structured Video-Language Modeling with Temporal Grouping and Spatial Grounding | | 0 |
| Spatio-temporal Video Re-localization by Warp LSTM | | 0 |
| Latent-INR: A Flexible Framework for Implicit Representations of Videos with Discriminative Semantics | | 0 |
| SSAN: Separable Self-Attention Network for Video Representation Learning | | 0 |
| STAR-GNN: Spatial-Temporal Video Representation for Content-based Retrieval | | 0 |
| Cross-modal Manifold Cutmix for Self-supervised Video Representation Learning | | 0 |
| STOA-VLP: Spatial-Temporal Modeling of Object and Action for Video-Language Pre-training | | 0 |
| Strategies for Searching Video Content with Text Queries or Video Examples | | 0 |
| Support-set bottlenecks for video-text representation learning | | 0 |
| SVD: A Large-Scale Short Video Dataset for Near-Duplicate Video Retrieval | | 0 |
| SwAMP: Swapped Assignment of Multi-Modal Pairs for Cross-Modal Retrieval | | 0 |
| Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets | | 0 |
| System Analysis And Design For Multimedia Retrieval Systems | | 0 |
| TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment | | 0 |
| TeachCLIP: Multi-Grained Teaching for Efficient Text-to-Video Retrieval | | 0 |
| Temporal Contrastive Graph Learning for Video Action Recognition and Retrieval | | 0 |
| Temporal Contrastive Learning with Curriculum | | 0 |
| Temporal Modular Networks for Retrieving Complex Compositional Activities in Videos | | 0 |
| Temporal Perceiving Video-Language Pre-training | | 0 |
| Text-Adaptive Multiple Visual Prototype Matching for Video-Text Retrieval | | 0 |
| Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval | | 0 |
| Text-Video Retrieval via Variational Multi-Modal Hypergraph Networks | | 0 |
| The VISIONE Video Search System: Exploiting Off-the-Shelf Text Search Engines for Large-Scale Video Retrieval | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | OmniVec | text-to-video R@10 | 89.4 | | Unverified |
| 2 | CLIP4Clip | text-to-video R@10 | 81.6 | | Unverified |
| 3 | OmniVec (pretrained) | text-to-video R@10 | 78.6 | | Unverified |
| 4 | HunYuan_tvr (huge) | text-to-video R@1 | 62.9 | | Unverified |
| 5 | CLIP-ViP | text-to-video R@1 | 57.7 | | Unverified |
| 6 | PIDRo | text-to-video R@1 | 55.9 | | Unverified |
| 7 | DMAE (ViT-B/16) | text-to-video R@1 | 55.5 | | Unverified |
| 8 | HunYuan_tvr | text-to-video R@1 | 55 | | Unverified |
| 9 | MuLTI | text-to-video R@1 | 54.7 | | Unverified |
| 10 | EERCF | text-to-video R@1 | 54.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Aurora (ours, r=64) | text-to-video R@5 | 77.4 | | Unverified |
| 2 | InternVideo2-6B | text-to-video R@1 | 74.2 | | Unverified |
| 3 | vid-TLDR (UMT-L) | text-to-video R@1 | 72.3 | | Unverified |
| 4 | VAST | text-to-video R@1 | 72 | | Unverified |
| 5 | COSA | text-to-video R@1 | 70.5 | | Unverified |
| 6 | UMT-L (ViT-L/16) | text-to-video R@1 | 70.4 | | Unverified |
| 7 | GRAM | text-to-video R@1 | 67.3 | | Unverified |
| 8 | VALOR | text-to-video R@1 | 61.5 | | Unverified |
| 9 | TESTA (ViT-B/16) | text-to-video R@1 | 61.2 | | Unverified |
| 10 | VindLU | text-to-video R@1 | 61.2 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GRAM | text-to-video R@1 | 64 | | Unverified |
| 2 | VAST | text-to-video R@1 | 63.9 | | Unverified |
| 3 | InternVideo2-6B | text-to-video R@1 | 62.8 | | Unverified |
| 4 | VALOR | text-to-video R@1 | 59.9 | | Unverified |
| 5 | UMT-L (ViT-L/16) | text-to-video R@1 | 58.8 | | Unverified |
| 6 | vid-TLDR (UMT-L) | text-to-video R@1 | 58.1 | | Unverified |
| 7 | COSA | text-to-video R@1 | 57.9 | | Unverified |
| 8 | InternVideo2-6B | text-to-video R@1 | 55.9 | | Unverified |
| 9 | InternVideo | text-to-video R@1 | 55.2 | | Unverified |
| 10 | VLAB | text-to-video R@1 | 55.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | EMCL-Net (Ours)++ LSMDC Rohrbach et al. (2015) | text-to-video R@10 | 53.7 | | Unverified |
| 2 | InternVideo2-6B | text-to-video R@1 | 46.4 | | Unverified |
| 3 | vid-TLDR (UMT-L) | text-to-video R@1 | 43.1 | | Unverified |
| 4 | UMT-L (ViT-L/16) | text-to-video R@1 | 43 | | Unverified |
| 5 | HunYuan_tvr (huge) | text-to-video R@1 | 40.4 | | Unverified |
| 6 | COSA | text-to-video R@1 | 39.4 | | Unverified |
| 7 | mPLUG-2 | text-to-video R@1 | 34.4 | | Unverified |
| 8 | VALOR | text-to-video R@1 | 34.2 | | Unverified |
| 9 | InternVideo | text-to-video R@1 | 34 | | Unverified |
| 10 | InternVideo2-6B | text-to-video R@1 | 33.8 | | Unverified |