SOTAVerified

Video Recognition

Video Recognition is the process of obtaining, processing, and analysing data from a visual source, specifically video.

Papers

Showing 1-25 of 307 papers

| Title | Status | Hype |
| --- | --- | --- |
| InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | Code | 7 |
| Scaling Up Your Kernels: Large Kernel Design in ConvNets towards Universal Representations | Code | 5 |
| Expanding Language-Image Pretrained Models for General Video Recognition | Code | 3 |
| Uni-AdaFocus: Spatial-temporal Dynamic Computation for Video Recognition | Code | 2 |
| DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark | Code | 2 |
| Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation | Code | 2 |
| Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models | Code | 2 |
| Revisiting Classifier: Transferring Vision-Language Models for Video Recognition | Code | 2 |
| AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition | Code | 2 |
| TSM: Temporal Shift Module for Efficient and Scalable Video Understanding on Edge Device | Code | 2 |
| Video Swin Transformer | Code | 2 |
| Would Mega-scale Datasets Further Enhance Spatiotemporal 3D CNNs? | Code | 2 |
| X3D: Expanding Architectures for Efficient Video Recognition | Code | 2 |
| Omni-sourced Webly-supervised Learning for Video Recognition | Code | 2 |
| BASKET: A Large-Scale Video Dataset for Fine-Grained Skill Estimation | Code | 1 |
| PAVE: Patching and Adapting Video Large Language Models | Code | 1 |
| OmniCLIP: Adapting CLIP for Video Recognition with Spatial-Temporal Omni-Scale Feature Learning | Code | 1 |
| VideoMamba: Spatio-Temporal Selective State Space Model | Code | 1 |
| No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding | Code | 1 |
| VG4D: Vision-Language Model Goes 4D Video Recognition | Code | 1 |
| Video Recognition in Portrait Mode | Code | 1 |
| Adapting Short-Term Transformers for Action Detection in Untrimmed Videos | Code | 1 |
| OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition | Code | 1 |
| DEVIAS: Learning Disentangled Video Representations of Action and Scene | Code | 1 |
| Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data | Code | 1 |

No leaderboard results yet.