SOTAVerified

Video Recognition

Video recognition is the process of obtaining, processing, and analysing data from a visual source, specifically video.

Papers

Showing 51-100 of 307 papers

Title | Status | Hype
Adapting Short-Term Transformers for Action Detection in Untrimmed Videos | Code | 1
MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition | Code | 1
Audio-Visual Class-Incremental Learning | Code | 1
0-MMS: Zero-Shot Multi-Motion Segmentation With A Monocular Event Camera | Code | 1
Multiscale Vision Transformers | Code | 1
Deep Feature Flow for Video Recognition | Code | 1
Pooling by Sliced-Wasserstein Embedding | Code | 1
No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding | Code | 1
PAVE: Patching and Adapting Video Large Language Models | Code | 1
Generalized Few-Shot Video Classification with Video Retrieval and Feature Generation | Code | 1
Piano Skills Assessment | Code | 1
Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation | Code | 1
OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition | Code | 1
BASKET: A Large-Scale Video Dataset for Fine-Grained Skill Estimation | Code | 1
Dissected 3D CNNs: Temporal Skip Connections for Efficient Online Video Processing | Code | 1
AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition | Code | 1
PatchNet -- Short-range Template Matching for Efficient Video Processing | Code | 1
Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning | Code | 1
DEVIAS: Learning Disentangled Video Representations of Action and Scene | Code | 1
Frozen CLIP Models are Efficient Video Learners | Code | 1
AdaFocusV3: On Unified Spatial-temporal Dynamic Video Recognition | Code | 1
DSANet: Dynamic Segment Aggregation Network for Video-Level Representation Learning | Code | 1
DualFormer: Local-Global Stratified Transformer for Efficient Video Recognition | Code | 1
Boosting the Transferability of Video Adversarial Examples via Temporal Translation | Code | 1
Dynamic Network Quantization for Efficient Video Inference | Code | 1
FrameExit: Conditional Early Exiting for Efficient Video Recognition | Code | 1
Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data | Code | 1
Glance and Focus Networks for Dynamic Visual Recognition | Code | 1
In Defense of Image Pre-Training for Spatiotemporal Recognition | Code | 1
Efficient Movie Scene Detection using State-Space Transformers | Code | 1
Camera Distortion-aware 3D Human Pose Estimation in Video with Optimization-based Meta-Learning | Code | 1
Pyramidal Convolution: Rethinking Convolutional Neural Networks for Visual Recognition | Code | 1
Efficient Video Transformers with Spatial-Temporal Token Selection | Code | 1
CatNet: Class Incremental 3D ConvNets for Lifelong Egocentric Gesture Recognition | Code | 1
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection | Code | 1
Implicit Temporal Modeling with Learnable Alignment for Video Recognition | Code | 1
Adversarial Bipartite Graph Learning for Video Domain Adaptation | Code | 1
Clean-Label Backdoor Attacks on Video Recognition Models | Code | 1
Depth Guided Adaptive Meta-Fusion Network for Few-shot Video Recognition | Code | 1
Frame Flexible Network | Code | 1
Cluster and Aggregate: Face Recognition with Large Probe Set | Code | 1
Learning Equivariant Representations | Code | 1
Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework | Code | 1
Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting | Code | 1
Helping Hands: An Object-Aware Ego-Centric Video Recognition Model | Code | 1
Fast Differentiable Matrix Square Root and Inverse Square Root | Code | 1
Look More but Care Less in Video Recognition | Code | 1
Making Vision Transformers Efficient from A Token Sparsification View | Code | 1
Attacking Video Recognition Models with Bullet-Screen Comments | Code | 1
Over-the-Air Adversarial Flickering Attacks against Video Recognition Networks | Code | 1

No leaderboard results yet.