SOTAVerified

Video Recognition

Video Recognition is the process of obtaining, processing, and analysing data from a visual source, specifically video.

Papers

Showing 101-150 of 307 papers

Title | Status | Hype
Real-time Online Video Detection with Temporal Smoothing Transformers | Code | 1
On the Surprising Effectiveness of Transformers in Low-Labeled Video Recognition | - | 0
Video Mobile-Former: Video Recognition with Efficient Global Spatial-temporal Modeling | - | 0
Efficient Attention-free Video Shift Transformers | - | 0
Frozen CLIP Models are Efficient Video Learners | Code | 1
Expanding Language-Image Pretrained Models for General Video Recognition | Code | 3
Adaptive occlusion sensitivity analysis for visually explaining video recognition networks | Code | 0
MAR: Masked Autoencoders for Efficient Action Recognition | Code | 1
Object State Change Classification in Egocentric Videos using the Divided Space-Time Attention Mechanism | Code | 0
NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition | - | 0
Temporal Saliency Query Network for Efficient Video Recognition | - | 0
Is an Object-Centric Video Representation Beneficial for Transfer? | - | 0
VidConv: A modernized 2D ConvNet for Efficient Video Recognition | Code | 0
EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge for Action Recognition 2022: Team HNU-FPV Technical Report | - | 0
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition | Code | 2
Exploring Temporally Dynamic Data Augmentation for Video Recognition | - | 0
M&M Mix: A Multimodal Multiview Transformer Ensemble | - | 0
MLP-3D: A MLP-like 3D Architecture with Grouped Time Mixing | Code | 0
Spatial-temporal Concept based Explanation of 3D ConvNets | Code | 0
AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition | Code | 2
Noise-Tolerant Learning for Audio-Visual Action Recognition | - | 0
In Defense of Image Pre-Training for Spatiotemporal Recognition | Code | 1
Long Movie Clip Classification with State-Space Video Models | Code | 1
Class-Incremental Learning for Action Recognition in Videos | - | 0
FAR: Fourier Aerial Video Recognition | Code | 0
Group Contextualization for Video Recognition | Code | 1
Gate-Shift-Fuse for Video Action Recognition | Code | 0
Audio-Visual Fusion Layers for Event Type Aware Video Recognition | - | 0
Should I take a walk? Estimating Energy Expenditure from Video Data | Code | 0
Fast Differentiable Matrix Square Root and Inverse Square Root | Code | 1
MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition | Code | 1
Action Keypoint Network for Efficient Video Recognition | - | 0
OCSampler: Compressing Videos to One Clip with Single-step Sampling | Code | 1
Condensing a Sequence to One Informative Frame for Video Recognition | - | 0
Optimization Planning for 3D ConvNets | Code | 0
Glance and Focus Networks for Dynamic Visual Recognition | Code | 1
Recurring the Transformer for Video Action Recognition | - | 0
Improving Video Model Transfer With Dynamic Representation Learning | - | 0
AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition | Code | 1
Cross-Modal Transferable Adversarial Attacks from Images to Videos | - | 0
DualFormer: Local-Global Stratified Transformer for Efficient Video Recognition | Code | 1
Auto-X3D: Ultra-Efficient Video Understanding via Finer-Grained Neural Architecture Search | - | 0
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection | Code | 1
TokenLearner: Adaptive Space-Time Tokenization for Videos | Code | 1
Pooling by Sliced-Wasserstein Embedding | Code | 1
Camera Distortion-aware 3D Human Pose Estimation in Video with Optimization-based Meta-Learning | Code | 1
Efficient Video Transformers with Spatial-Temporal Token Selection | Code | 1
Attacking Video Recognition Models with Bullet-Screen Comments | Code | 1
ST-ABN: Visual Explanation Taking into Account Spatio-temporal Information for Video Recognition | Code | 0
Temporal-attentive Covariance Pooling Networks for Video Recognition | Code | 1

No leaderboard results yet.