SOTAVerified

Video Recognition

Video Recognition is the process of obtaining, processing, and analysing data from a visual source, specifically video.

Papers

Showing 1-50 of 307 papers

Title | Status | Hype
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | Code | 7
Scaling Up Your Kernels: Large Kernel Design in ConvNets towards Universal Representations | Code | 5
Expanding Language-Image Pretrained Models for General Video Recognition | Code | 3
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models | Code | 2
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition | Code | 2
TSM: Temporal Shift Module for Efficient and Scalable Video Understanding on Edge Device | Code | 2
AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition | Code | 2
DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark | Code | 2
Video Swin Transformer | Code | 2
Omni-sourced Webly-supervised Learning for Video Recognition | Code | 2
Uni-AdaFocus: Spatial-temporal Dynamic Computation for Video Recognition | Code | 2
Would Mega-scale Datasets Further Enhance Spatiotemporal 3D CNNs? | Code | 2
Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation | Code | 2
X3D: Expanding Architectures for Efficient Video Recognition | Code | 2
In Defense of Image Pre-Training for Spatiotemporal Recognition | Code | 1
Fast Differentiable Matrix Square Root and Inverse Square Root | Code | 1
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection | Code | 1
Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers | Code | 1
Large Scale Holistic Video Understanding | Code | 1
Implicit Temporal Modeling with Learnable Alignment for Video Recognition | Code | 1
Learning Equivariant Representations | Code | 1
Frozen CLIP Models are Efficient Video Learners | Code | 1
Efficient Video Transformers with Spatial-Temporal Token Selection | Code | 1
Helping Hands: An Object-Aware Ego-Centric Video Recognition Model | Code | 1
Adapting Short-Term Transformers for Action Detection in Untrimmed Videos | Code | 1
Audio-Visual Class-Incremental Learning | Code | 1
AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition | Code | 1
Generalized Few-Shot Video Classification with Video Retrieval and Feature Generation | Code | 1
Adaptive Focus for Efficient Video Recognition | Code | 1
Improved Residual Networks for Image and Video Recognition | Code | 1
FrameExit: Conditional Early Exiting for Efficient Video Recognition | Code | 1
Dynamic Network Quantization for Efficient Video Inference | Code | 1
Frame Flexible Network | Code | 1
Glance and Focus Networks for Dynamic Visual Recognition | Code | 1
Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data | Code | 1
Can An Image Classifier Suffice For Action Recognition? | Code | 1
DEVIAS: Learning Disentangled Video Representations of Action and Scene | Code | 1
Boosting the Transferability of Video Adversarial Examples via Temporal Translation | Code | 1
DSANet: Dynamic Segment Aggregation Network for Video-Level Representation Learning | Code | 1
Camera Distortion-aware 3D Human Pose Estimation in Video with Optimization-based Meta-Learning | Code | 1
CatNet: Class Incremental 3D ConvNets for Lifelong Egocentric Gesture Recognition | Code | 1
AdaFocusV3: On Unified Spatial-temporal Dynamic Video Recognition | Code | 1
Clean-Label Backdoor Attacks on Video Recognition Models | Code | 1
Clockwork Convnets for Video Semantic Segmentation | Code | 1
Cluster and Aggregate: Face Recognition with Large Probe Set | Code | 1
DualFormer: Local-Global Stratified Transformer for Efficient Video Recognition | Code | 1
Depth Guided Adaptive Meta-Fusion Network for Few-shot Video Recognition | Code | 1
Efficient Movie Scene Detection using State-Space Transformers | Code | 1
Attacking Video Recognition Models with Bullet-Screen Comments | Code | 1
Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning | Code | 1

No leaderboard results yet.