| Pooled Motion Features for First-Person Videos | Dec 19, 2014 | Activity Recognition, Activity Recognition In Videos | Code Available | 0 |
| End-to-End Learning of Motion Representation for Video Understanding | Apr 2, 2018 | Action Recognition, Optical Flow Estimation | Code Available | 0 |
| A Coding Framework and Benchmark towards Low-Bitrate Video Understanding | Feb 6, 2022 | Video Compression, Video Understanding | Code Available | 0 |
| Pairwise Emotional Relationship Recognition in Drama Videos: Dataset and Benchmark | Sep 23, 2021 | Video Understanding | Code Available | 0 |
| EgoVLM: Policy Optimization for Egocentric Video Understanding | Jun 3, 2025 | EgoSchema, Question Answering | Code Available | 0 |
| On the Pitfalls of Batch Normalization for End-to-End Video Learning: A Study on Surgical Workflow Analysis | Mar 15, 2022 | Video Understanding | Code Available | 0 |
| ECO: Efficient Convolutional Network for Online Video Understanding | Apr 24, 2018 | Action Classification, Action Recognition | Code Available | 0 |
| OccludeNet: A Causal Journey into Mixed-View Actor-Centric Video Action Recognition under Occlusions | Nov 24, 2024 | Action Classification, Action Recognition | Code Available | 0 |
| DriftNet: Aggressive Driving Behavior Classification using 3D EfficientNet Architecture | Apr 18, 2020 | Anomaly Detection, Classification | Code Available | 0 |
| Video Representation Learning and Latent Concept Mining for Large-scale Multi-label Video Classification | Jul 5, 2017 | Attribute, General Classification | Code Available | 0 |
| DramaQA: Character-Centered Video Story Understanding with Hierarchical QA | May 7, 2020 | Question Answering, Video Question Answering | Code Available | 0 |
| Are you Struggling? Dataset and Baselines for Struggle Determination in Assembly Videos | Feb 16, 2024 | Decision Making, Video Understanding | Code Available | 0 |
| NoisyActions2M: A Multimedia Dataset for Video Understanding from Noisy Labels | Oct 13, 2021 | Action Classification, Self-Supervised Learning | Code Available | 0 |
| NeXtVLAD: An Efficient Neural Network to Aggregate Frame-level Features for Large-scale Video Classification | Nov 12, 2018 | Efficient Neural Network, General Classification | Code Available | 0 |
| Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding | Nov 1, 2019 | Action Detection, Action Recognition | Code Available | 0 |
| Dr^2Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning | Jan 8, 2024 | object-detection, Object Detection | Code Available | 0 |
| Multimodal Dialogue State Tracking | Jun 16, 2022 | Dialogue State Tracking, Video Understanding | Code Available | 0 |
| Don't Judge by the Look: Towards Motion Coherent Video Representation | Mar 14, 2024 | Data Augmentation, Object Recognition | Code Available | 0 |
| (Un)likelihood Training for Interpretable Embedding | Jul 1, 2022 | Ad-hoc video search, Decoder | Code Available | 0 |
| Unsupervised Adversarial Visual Level Domain Adaptation for Learning Video Object Detectors from Images | Oct 4, 2018 | Domain Adaptation, Image-to-Image Translation | Code Available | 0 |
| video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models | Jun 22, 2024 | Diversity, Language Modeling | Code Available | 0 |
| X4D-SceneFormer: Enhanced Scene Understanding on 4D Point Cloud Videos through Cross-modal Knowledge Transfer | Dec 12, 2023 | Action Recognition, Action Segmentation | Code Available | 0 |
| Are Vision LLMs Road-Ready? A Comprehensive Benchmark for Safety-Critical Driving Video Understanding | Apr 20, 2025 | Autonomous Driving, Image Captioning | Code Available | 0 |
| Diagnosing Error in Temporal Action Detectors | Jul 27, 2018 | Action Localization, Diagnostic | Code Available | 0 |
| Multi-attention Networks for Temporal Localization of Video-level Labels | Nov 15, 2019 | Action Recognition, Temporal Action Localization | Code Available | 0 |
| MOFO: MOtion FOcused Self-Supervision for Video Understanding | Aug 23, 2023 | Action Classification, Action Recognition | Code Available | 0 |
| MOD: A Deep Mixture Model with Online Knowledge Distillation for Large Scale Video Temporal Concept Localization | Oct 27, 2019 | Knowledge Distillation, Video Understanding | Code Available | 0 |
| Detection-Fusion for Knowledge Graph Extraction from Videos | Dec 30, 2024 | Knowledge Graphs, Language Modeling | Code Available | 0 |
| Vamos: Versatile Action Models for Video Understanding | Nov 22, 2023 | EgoSchema, Hard Attention | Code Available | 0 |
| Are current long-term video understanding datasets long-term? | Aug 22, 2023 | Action Recognition, Video Understanding | Code Available | 0 |
| Audio Caption in a Car Setting with a Sentence-Level Loss | May 31, 2019 | Audio captioning, Decoder | Code Available | 0 |
| VCRBench: Exploring Long-form Causal Reasoning Capabilities of Large Video Language Models | May 13, 2025 | Form, Multiple-choice | Code Available | 0 |
| VideoDG: Generalizing Temporal Relations in Videos to Novel Domains | Dec 8, 2019 | Action Recognition, Data Augmentation | Code Available | 0 |
| Detect-and-Track: Efficient Pose Estimation in Videos | Dec 26, 2017 | Human Detection, Keypoint Estimation | Code Available | 0 |
| MINOTAUR: Multi-task Video Grounding From Multimodal Queries | Feb 16, 2023 | Action Detection, Sentence | Code Available | 0 |
| AdaVideoRAG: Omni-Contextual Adaptive Retrieval-Augmented Efficient Long Video Understanding | Jun 16, 2025 | Optical Character Recognition (OCR), RAG | Code Available | 0 |
| Deep Learning Methods for Efficient Large Scale Video Labeling | Jun 14, 2017 | Deep Learning, Video Understanding | Code Available | 0 |
| Creative Flow+ Dataset | Jun 1, 2019 | 3D Character Animation From A Single Photo, Depth Estimation | Code Available | 0 |
| Contextual Explainable Video Representation: Human Perception-based Understanding | Dec 12, 2022 | Action Detection, Action Recognition | Code Available | 0 |
| A Challenge to Build Neuro-Symbolic Video Agents | May 20, 2025 | Scene Classification, Video Retrieval | Code Available | 0 |
| METok: Multi-Stage Event-based Token Compression for Efficient Long Video Understanding | Jun 3, 2025 | Video Understanding | Code Available | 0 |
| Constrained-size Tensorflow Models for YouTube-8M Video Understanding Challenge | Aug 21, 2018 | Video Understanding | Code Available | 0 |
| Masked Autoencoders for Egocentric Video Understanding @ Ego4D Challenge 2022 | Nov 18, 2022 | Object State Change Classification, Temporal Localization | Code Available | 0 |
| Context R-CNN: Long Term Temporal Context for Per-Camera Object Detection | Dec 7, 2019 | object-detection, Object Detection | Code Available | 0 |
| SoccerDB: A Large-Scale Database for Comprehensive Video Understanding | Dec 10, 2019 | Action Classification, Action Detection | Code Available | 0 |
| Video Action Understanding | Oct 13, 2020 | Action Understanding, Deep Learning | Code Available | 0 |
| VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding | Mar 21, 2024 | Pose Estimation, Video Understanding | Code Available | 0 |
| Long-Term Feature Banks for Detailed Video Understanding | Dec 12, 2018 | Action Classification, Action Recognition | Code Available | 0 |
| Localizing Moments in Video with Temporal Language | Sep 5, 2018 | Natural Language Queries, Retrieval | Code Available | 0 |