SOTAVerified

Video Recognition

Video Recognition is the process of obtaining, processing, and analysing data from a visual source, specifically video.

Papers

Showing 1-50 of 307 papers

| Title | Status | Hype |
| --- | --- | --- |
| DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition | Code | 0 |
| VCRBench: Exploring Long-form Causal Reasoning Capabilities of Large Video Language Models | Code | 0 |
| Gameplay Highlights Generation | | 0 |
| Fast Adversarial Training with Weak-to-Strong Spatial-Temporal Consistency in the Frequency Domain on Videos | | 0 |
| CA^2ST: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition | | 0 |
| Leveraging LLMs with Iterative Loop Structure for Enhanced Social Intelligence in Video Question Answering | | 0 |
| BASKET: A Large-Scale Video Dataset for Fine-Grained Skill Estimation | Code | 1 |
| PAVE: Patching and Adapting Video Large Language Models | Code | 1 |
| VTD-CLIP: Video-to-Text Discretization via Prompting CLIP | Code | 0 |
| Towards Scalable Modeling of Compressed Videos for Efficient Action Recognition | | 0 |
| A Simple and Efficient Baseline for Video Action Recognition | | 0 |
| VideoPure: Diffusion-based Adversarial Purification for Video Recognition | Code | 0 |
| Action Detail Matters: Refining Video Recognition with Local Action Queries | | 0 |
| DAVE: Diverse Atomic Visual Elements Dataset with High Representation of Vulnerable Road Users in Complex and Unpredictable Environments | | 0 |
| Uni-AdaFocus: Spatial-temporal Dynamic Computation for Video Recognition | Code | 2 |
| Standardization Trends on Safety and Trustworthiness Technology for Advanced AI | | 0 |
| MoTE: Reconciling Generalization with Specialization for Visual-Language to Video Knowledge Transfer | Code | 0 |
| Scaling Up Your Kernels: Large Kernel Design in ConvNets towards Universal Representations | Code | 5 |
| A Novel Audio-Visual Information Fusion System for Mental Disorders Detection | | 0 |
| GenRec: Unifying Video Generation and Recognition with Diffusion Models | Code | 0 |
| OmniCLIP: Adapting CLIP for Video Recognition with Spatial-Temporal Omni-Scale Feature Learning | Code | 1 |
| VideoMamba: Spatio-Temporal Selective State Space Model | Code | 1 |
| Purification Of Contaminated Convolutional Neural Networks Via Robust Recovery: An Approach with Theoretical Guarantee in One-Hidden-Layer Case | | 0 |
| PosMLP-Video: Spatial and Temporal Relative Position Encoding for Efficient Video Recognition | Code | 0 |
| MeMSVD: Long-Range Temporal Structure Capturing Using Incremental SVD | | 0 |
| DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark | Code | 2 |
| Hierarchical Action Recognition: A Contrastive Video-Language Approach with Hierarchical Interactions | | 0 |
| No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding | Code | 1 |
| Transfer-LMR: Heavy-Tail Driving Behavior Recognition in Diverse Traffic Scenarios | | 0 |
| Cross-Block Fine-Grained Semantic Cascade for Skeleton-Based Sports Action Recognition | | 0 |
| VG4D: Vision-Language Model Goes 4D Video Recognition | Code | 1 |
| InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | Code | 7 |
| Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation | Code | 2 |
| LocalStyleFool: Regional Video Style Transfer Attack Using Segment Anything Model | | 0 |
| Don't Judge by the Look: Towards Motion Coherent Video Representation | Code | 0 |
| Percept, Chat, and then Adapt: Multimodal Knowledge Transfer of Foundation Models for Open-World Video Recognition | | 0 |
| Hierarchical Augmentation and Distillation for Class Incremental Audio-Visual Video Recognition | Code | 0 |
| Motion Guided Token Compression for Efficient Masked Video Modeling | | 0 |
| HaltingVT: Adaptive Token Halting Transformer for Efficient Video Recognition | Code | 0 |
| Efficient Selective Audio Masked Multimodal Bottleneck Transformer for Audio-Video Classification | | 0 |
| Video Recognition in Portrait Mode | Code | 1 |
| Unleashing the Power of CNN and Transformer for Balanced RGB-Event Video Recognition | Code | 0 |
| LogoStyleFool: Vitiating Video Recognition Systems via Logo Style Transfer | Code | 0 |
| Adapting Short-Term Transformers for Action Detection in Untrimmed Videos | Code | 1 |
| DEVIAS: Learning Disentangled Video Representations of Action and Scene | Code | 1 |
| OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition | Code | 1 |
| Automated Sperm Assessment Framework and Neural Network Specialized for Sperm Video Recognition | Code | 0 |
| Object-centric Video Representation for Long-term Action Anticipation | Code | 0 |
| On the Relevance of Temporal Features for Medical Ultrasound Video Recognition | Code | 0 |
| Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data | Code | 1 |

No leaderboard results yet.