| DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition | Jul 16, 2025 | Benchmarking, Knowledge Distillation | Code Available | 0 |
| VCRBench: Exploring Long-form Causal Reasoning Capabilities of Large Video Language Models | May 13, 2025 | Form, Multiple-choice | Code Available | 0 |
| Gameplay Highlights Generation | May 12, 2025 | Event Detection, Highlight Detection | Unverified | 0 |
| Fast Adversarial Training with Weak-to-Strong Spatial-Temporal Consistency in the Frequency Domain on Videos | Apr 21, 2025 | Adversarial Robustness, Video Recognition | Unverified | 0 |
| CA^2ST: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition | Mar 30, 2025 | Action Classification, Action Recognition | Unverified | 0 |
| Leveraging LLMs with Iterative Loop Structure for Enhanced Social Intelligence in Video Question Answering | Mar 27, 2025 | Emotion Recognition, Question Answering | Unverified | 0 |
| BASKET: A Large-Scale Video Dataset for Fine-Grained Skill Estimation | Mar 26, 2025 | Video Recognition | Code Available | 1 |
| PAVE: Patching and Adapting Video Large Language Models | Mar 25, 2025 | Audio-visual Question Answering, Multi-Task Learning | Code Available | 1 |
| VTD-CLIP: Video-to-Text Discretization via Prompting CLIP | Mar 24, 2025 | parameter-efficient fine-tuning, Video Recognition | Code Available | 0 |
| Towards Scalable Modeling of Compressed Videos for Efficient Action Recognition | Mar 17, 2025 | Action Recognition, Video Recognition | Unverified | 0 |
| A Simple and Efficient Baseline for Video Action Recognition | Mar 2, 2025 | Action Recognition, Fine-grained Action Recognition | Unverified | 0 |
| VideoPure: Diffusion-based Adversarial Purification for Video Recognition | Jan 25, 2025 | Adversarial Defense, Adversarial Purification | Code Available | 0 |
| Action Detail Matters: Refining Video Recognition with Local Action Queries | Jan 1, 2025 | Action Recognition, Temporal Action Localization | Unverified | 0 |
| DAVE: Diverse Atomic Visual Elements Dataset with High Representation of Vulnerable Road Users in Complex and Unpredictable Environments | Dec 28, 2024 | Action Localization, Action Recognition | Unverified | 0 |
| Uni-AdaFocus: Spatial-temporal Dynamic Computation for Video Recognition | Dec 15, 2024 | Computational Efficiency, Video Recognition | Code Available | 2 |
| Standardization Trends on Safety and Trustworthiness Technology for Advanced AI | Oct 29, 2024 | Video Recognition | Unverified | 0 |
| MoTE: Reconciling Generalization with Specialization for Visual-Language to Video Knowledge Transfer | Oct 14, 2024 | Transfer Learning, Video Recognition | Code Available | 0 |
| Scaling Up Your Kernels: Large Kernel Design in ConvNets towards Universal Representations | Oct 10, 2024 | Time Series Forecasting, Video Recognition | Code Available | 5 |
| A Novel Audio-Visual Information Fusion System for Mental Disorders Detection | Sep 3, 2024 | EEG, Video Recognition | Unverified | 0 |
| GenRec: Unifying Video Generation and Recognition with Diffusion Models | Aug 27, 2024 | Image to Video Generation, Video Generation | Code Available | 0 |
| OmniCLIP: Adapting CLIP for Video Recognition with Spatial-Temporal Omni-Scale Feature Learning | Aug 12, 2024 | Video Recognition, Zero-Shot Learning | Code Available | 1 |
| VideoMamba: Spatio-Temporal Selective State Space Model | Jul 11, 2024 | Mamba, model | Code Available | 1 |
| Purification Of Contaminated Convolutional Neural Networks Via Robust Recovery: An Approach with Theoretical Guarantee in One-Hidden-Layer Case | Jul 4, 2024 | image-classification, Image Classification | Unverified | 0 |
| PosMLP-Video: Spatial and Temporal Relative Position Encoding for Efficient Video Recognition | Jul 3, 2024 | Position, Video Recognition | Code Available | 0 |
| MeMSVD: Long-Range Temporal Structure Capturing Using Incremental SVD | Jun 11, 2024 | Video Recognition, Video Understanding | Unverified | 0 |
| DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark | May 30, 2024 | DeepFake Detection, Mamba | Code Available | 2 |
| Hierarchical Action Recognition: A Contrastive Video-Language Approach with Hierarchical Interactions | May 28, 2024 | Action Recognition, Video Recognition | Unverified | 0 |
| No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding | May 14, 2024 | Action Detection, GPU | Code Available | 1 |
| Transfer-LMR: Heavy-Tail Driving Behavior Recognition in Diverse Traffic Scenarios | May 8, 2024 | Video Recognition | Unverified | 0 |
| Cross-Block Fine-Grained Semantic Cascade for Skeleton-Based Sports Action Recognition | Apr 30, 2024 | Action Classification, Action Recognition | Unverified | 0 |
| VG4D: Vision-Language Model Goes 4D Video Recognition | Apr 17, 2024 | Action Recognition, Autonomous Driving | Code Available | 1 |
| InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | Mar 22, 2024 | Action Classification, Action Recognition | Code Available | 7 |
| Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation | Mar 18, 2024 | Mixture-of-Experts, parameter-efficient fine-tuning | Code Available | 2 |
| LocalStyleFool: Regional Video Style Transfer Attack Using Segment Anything Model | Mar 18, 2024 | Adversarial Attack, Style Transfer | Unverified | 0 |
| Don't Judge by the Look: Towards Motion Coherent Video Representation | Mar 14, 2024 | Data Augmentation, Object Recognition | Code Available | 0 |
| Percept, Chat, and then Adapt: Multimodal Knowledge Transfer of Foundation Models for Open-World Video Recognition | Feb 29, 2024 | Transfer Learning, Video Recognition | Unverified | 0 |
| Hierarchical Augmentation and Distillation for Class Incremental Audio-Visual Video Recognition | Jan 11, 2024 | Video Recognition | Code Available | 0 |
| Motion Guided Token Compression for Efficient Masked Video Modeling | Jan 10, 2024 | Video Compression, Video Recognition | Unverified | 0 |
| HaltingVT: Adaptive Token Halting Transformer for Efficient Video Recognition | Jan 10, 2024 | Action Recognition, Action Recognition In Videos | Code Available | 0 |
| Efficient Selective Audio Masked Multimodal Bottleneck Transformer for Audio-Video Classification | Jan 8, 2024 | Action Recognition, Contrastive Learning | Unverified | 0 |
| Video Recognition in Portrait Mode | Dec 21, 2023 | Data Augmentation, Video Recognition | Code Available | 1 |
| Unleashing the Power of CNN and Transformer for Balanced RGB-Event Video Recognition | Dec 18, 2023 | Video Recognition | Code Available | 0 |
| LogoStyleFool: Vitiating Video Recognition Systems via Logo Style Transfer | Dec 15, 2023 | reinforcement-learning, Reinforcement Learning | Code Available | 0 |
| Adapting Short-Term Transformers for Action Detection in Untrimmed Videos | Dec 4, 2023 | Action Detection, Video Recognition | Code Available | 1 |
| DEVIAS: Learning Disentangled Video Representations of Action and Scene | Nov 30, 2023 | Action Recognition, Decoder | Code Available | 1 |
| OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition | Nov 30, 2023 | Descriptive, Language Modelling | Code Available | 1 |
| Automated Sperm Assessment Framework and Neural Network Specialized for Sperm Video Recognition | Nov 10, 2023 | Video Recognition | Code Available | 0 |
| Object-centric Video Representation for Long-term Action Anticipation | Oct 31, 2023 | Action Anticipation, Human-Object Interaction Detection | Code Available | 0 |
| On the Relevance of Temporal Features for Medical Ultrasound Video Recognition | Oct 16, 2023 | Video Recognition | Code Available | 0 |
| Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data | Oct 8, 2023 | Action Recognition, Continual Learning | Code Available | 1 |