SOTAVerified

Video Recognition

Video recognition is the process of obtaining, processing, and analysing data from a visual source, specifically video.

Papers

Showing 51–100 of 307 papers

Title | Status | Hype
Adapting Short-Term Transformers for Action Detection in Untrimmed Videos | Code | 1
Sharing Pain: Using Pain Domain Transfer for Video Recognition of Low Grade Orthopedic Pain in Horses | Code | 1
Audio-Visual Class-Incremental Learning | Code | 1
Space-time Mixing Attention for Video Transformer | Code | 1
Helping Hands: An Object-Aware Ego-Centric Video Recognition Model | Code | 1
Deep Feature Flow for Video Recognition | Code | 1
The effectiveness of MAE pre-pretraining for billion-scale pretraining | Code | 1
TokenLearner: Adaptive Space-Time Tokenization for Videos | Code | 1
Learning Versatile Neural Architectures by Propagating Network Codes | Code | 1
Frozen CLIP Models are Efficient Video Learners | Code | 1
AdaFocusV3: On Unified Spatial-temporal Dynamic Video Recognition | Code | 1
Frame Flexible Network | Code | 1
Long Movie Clip Classification with State-Space Video Models | Code | 1
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection | Code | 1
BASKET: A Large-Scale Video Dataset for Fine-Grained Skill Estimation | Code | 1
Dissected 3D CNNs: Temporal Skip Connections for Efficient Online Video Processing | Code | 1
AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition | Code | 1
Improved Residual Networks for Image and Video Recognition | Code | 1
Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning | Code | 1
Implicit Temporal Modeling with Learnable Alignment for Video Recognition | Code | 1
DEVIAS: Learning Disentangled Video Representations of Action and Scene | Code | 1
DSANet: Dynamic Segment Aggregation Network for Video-Level Representation Learning | Code | 1
DualFormer: Local-Global Stratified Transformer for Efficient Video Recognition | Code | 1
Boosting the Transferability of Video Adversarial Examples via Temporal Translation | Code | 1
Dynamic Network Quantization for Efficient Video Inference | Code | 1
Glance and Focus Networks for Dynamic Visual Recognition | Code | 1
Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data | Code | 1
Learning Equivariant Representations | Code | 1
Making Vision Transformers Efficient from A Token Sparsification View | Code | 1
Efficient Movie Scene Detection using State-Space Transformers | Code | 1
Camera Distortion-aware 3D Human Pose Estimation in Video with Optimization-based Meta-Learning | Code | 1
MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge | Code | 1
Efficient Video Transformers with Spatial-Temporal Token Selection | Code | 1
CatNet: Class Incremental 3D ConvNets for Lifelong Egocentric Gesture Recognition | Code | 1
Multiscale Vision Transformers | Code | 1
MVFNet: Multi-View Fusion Network for Efficient Video Recognition | Code | 1
Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers | Code | 1
Clean-Label Backdoor Attacks on Video Recognition Models | Code | 1
Adversarial Bipartite Graph Learning for Video Domain Adaptation | Code | 1
Depth Guided Adaptive Meta-Fusion Network for Few-shot Video Recognition | Code | 1
Cluster and Aggregate: Face Recognition with Large Probe Set | Code | 1
Generalized Few-Shot Video Classification with Video Retrieval and Feature Generation | Code | 1
Over-the-Air Adversarial Flickering Attacks against Video Recognition Networks | Code | 1
PAVE: Patching and Adapting Video Large Language Models | Code | 1
FrameExit: Conditional Early Exiting for Efficient Video Recognition | Code | 1
Fast Differentiable Matrix Square Root and Inverse Square Root | Code | 1
Real-time Online Video Detection with Temporal Smoothing Transformers | Code | 1
Rethinking Resolution in the Context of Efficient Video Recognition | Code | 1
Large Scale Holistic Video Understanding | Code | 1
In Defense of Image Pre-Training for Spatiotemporal Recognition | Code | 1

No leaderboard results yet.