| ATARS: An Aerial Traffic Atomic Activity Recognition and Temporal Segmentation Dataset | Mar 24, 2025 | Activity Recognition, Temporal Localization | Code Available | 0 | 5 |
| Dense Video Object Captioning from Disjoint Supervision | Jun 20, 2023 | Object, Sentence | Code Available | 0 | 5 |
| Do Current Video LLMs Have Strong OCR Abilities? A Preliminary Study | Dec 29, 2024 | Motion Detection, Optical Character Recognition | Code Available | 0 | 5 |
| HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization | Dec 26, 2017 | Action Classification, Action Localization | Code Available | 0 | 5 |
| Hierarchical and Multimodal Data for Daily Activity Understanding | Apr 24, 2025 | Action Anticipation, counterfactual | Code Available | 0 | 5 |
| Hierarchical Deep Residual Reasoning for Temporal Moment Localization | Oct 31, 2021 | Language-Based Temporal Localization, Sentence | Code Available | 0 | 5 |
| Learning to Localize Temporal Events in Large-scale Video Data | Oct 25, 2019 | Temporal Localization, Video Recognition | Code Available | 0 | 5 |
| Masked Autoencoders for Egocentric Video Understanding @ Ego4D Challenge 2022 | Nov 18, 2022 | Object State Change Classification, Temporal Localization | Code Available | 0 | 5 |
| Multi-attention Networks for Temporal Localization of Video-level Labels | Nov 15, 2019 | Action Recognition, Temporal Action Localization | Code Available | 0 | 5 |
| NAAQA: A Neural Architecture for Acoustic Question Answering | Jun 11, 2021 | Acoustic Question Answering, Question Answering | Code Available | 0 | 5 |
| Online Human Action Detection using Joint Classification-Regression Recurrent Neural Networks | Apr 19, 2016 | Action Detection, Action Recognition | Code Available | 0 | 5 |
| RefineLoc: Iterative Refinement for Weakly-Supervised Action Localization | Mar 30, 2019 | Action Localization, Temporal Action Localization | Code Available | 0 | 5 |
| Semi-supervised Active Learning for Video Action Detection | Dec 12, 2023 | Action Detection, Active Learning | Code Available | 0 | 5 |
| Skeleton-Based Human Action Recognition with Noisy Labels | Mar 15, 2024 | Action Recognition, Denoising | Code Available | 0 | 5 |
| SoftLoc: Robust Temporal Localization under Label Misalignment | Sep 25, 2019 | Position, Temporal Localization | Code Available | 0 | 5 |
| TadML: A fast temporal action detection with Mechanics-MLP | Jun 7, 2022 | Action Detection, Optical Flow Estimation | Code Available | 0 | 5 |
| Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs | Jan 9, 2016 | Action Classification, Action Localization | Code Available | 0 | 5 |
| Temporal Localization of Fine-Grained Actions in Videos by Domain Transfer from Web Images | Apr 4, 2015 | Action Localization, Action Recognition | Code Available | 0 | 5 |
| TimeRefine: Temporal Grounding with Time Refining Video LLM | Dec 12, 2024 | Temporal Localization | Code Available | 0 | 5 |
| Transforming faces into video stories -- VideoFace2.0 | May 4, 2025 | Face Detection, Face Recognition | Code Available | 0 | 5 |
| UnLoc: A Unified Framework for Video Localization Tasks | Aug 21, 2023 | Action Segmentation, Moment Retrieval | Code Available | 0 | 5 |
| VideoGLUE: Video General Understanding Evaluation of Foundation Models | Jul 6, 2023 | Action Recognition, Temporal Localization | Code Available | 0 | 5 |
| Weakly Supervised Action Localization by Sparse Temporal Pooling Network | Dec 14, 2017 | Action Classification, Action Localization | Code Available | 0 | 5 |
| Weakly Supervised Multiple Instance Learning for Whale Call Detection and Temporal Localization in Long-Duration Passive Acoustic Monitoring | Feb 28, 2025 | Multiple Instance Learning, Temporal Localization | Code Available | 0 | 5 |
| When Did It Happen? Duration-informed Temporal Localization of Narrated Actions in Vlogs | Feb 16, 2022 | Action Localization, Temporal Action Localization | Code Available | 0 | 5 |
| Technical Report of the Video Event Reconstruction and Analysis (VERA) System -- Shooter Localization, Models, Interface, and Beyond | May 26, 2019 | Gunshot Detection, Shooter Localization | Code Available | 0 | 5 |
| Video Anomaly Detection for Smart Surveillance | Apr 1, 2020 | Anomaly Detection, Temporal Localization | Unverified | 0 | 0 |
| Exploring State Change Capture of Heterogeneous Backbones @ Ego4D Hands and Objects Challenge 2022 | Nov 16, 2022 | Human-Object Interaction Detection, Object | Unverified | 0 | 0 |
| Efficient Action Localization with Approximately Normalized Fisher Vectors | Jun 1, 2014 | Action Localization, Action Recognition | Unverified | 0 | 0 |
| Optimizing Temporal Resolution Of Convolutional Recurrent Neural Networks For Sound Event Detection | Oct 18, 2022 | Event Detection, Sound Event Detection | Unverified | 0 | 0 |
| OWL (Observe, Watch, Listen): Audiovisual Temporal Context for Localizing Actions in Egocentric Videos | Feb 10, 2022 | Action Localization, Temporal Action Localization | Unverified | 0 | 0 |
| PcmNet: Position-Sensitive Context Modeling Network for Temporal Action Localization | Mar 9, 2021 | Action Localization, Boundary Detection | Unverified | 0 | 0 |
| Pointly-Supervised Action Localization | May 29, 2018 | Action Localization, Multiple Instance Learning | Unverified | 0 | 0 |
| Poselet Key-Framing: A Model for Human Activity Recognition | Jun 1, 2013 | Activity Recognition, Human Activity Recognition | Unverified | 0 | 0 |
| Practitioner-Centric Approach for Early Incident Detection Using Crowdsourced Data for Emergency Services | Dec 3, 2021 | Event Detection, Management | Unverified | 0 | 0 |
| Pseudo Strong Labels from Frame-Level Predictions for Weakly Supervised Sound Event Detection | Jan 7, 2025 | Event Detection, Sound Event Detection | Unverified | 0 | 0 |
| ReActNet: Temporal Localization of Repetitive Activities in Real-World Videos | Oct 14, 2019 | Temporal Localization | Unverified | 0 | 0 |
| A benchmark of state-of-the-art sound event detection systems evaluated on synthetic soundscapes | Feb 3, 2022 | Data Augmentation, Event Detection | Unverified | 0 | 0 |
| Reinforcement Learning for Weakly Supervised Temporal Grounding of Natural Language in Untrimmed Videos | Sep 18, 2020 | cross-modal alignment, reinforcement-learning | Unverified | 0 | 0 |
| Scalable Temporal Localization of Sensitive Activities in Movies and TV Episodes | Jun 16, 2022 | Temporal Localization | Unverified | 0 | 0 |
| Efficient Action Detection in Untrimmed Videos via Multi-Task Learning | Dec 22, 2016 | Action Detection, Action Localization | Unverified | 0 | 0 |
| AdapNet: Adaptability Decomposing Encoder-Decoder Network for Weakly Supervised Action Recognition and Localization | Nov 27, 2019 | Action Classification, Action Recognition | Unverified | 0 | 0 |
| Sequential End-to-End Intent and Slot Label Classification and Localization | Jun 8, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 | 0 |
| ShotVL: Human-Centric Highlight Frame Retrieval via Language Queries | Dec 17, 2024 | Human Detection, image-classification | Unverified | 0 | 0 |
| Single-Stage Visual Query Localization in Egocentric Videos | Jun 15, 2023 | object-detection, Object Detection | Unverified | 0 | 0 |
| Activity Recognition on a Large Scale in Short Videos - Moments in Time Dataset | Sep 1, 2018 | Action Recognition, Activity Recognition | Unverified | 0 | 0 |
| SocialGesture: Delving into Multi-person Gesture Understanding | Apr 3, 2025 | Gesture Recognition, Question Answering | Unverified | 0 | 0 |
| Action Shuffling for Weakly Supervised Temporal Localization | May 10, 2021 | Action Localization, Temporal Localization | Unverified | 0 | 0 |
| Spatio-Temporal Attention Models for Grounded Video Captioning | Oct 17, 2016 | image-classification, Image Classification | Unverified | 0 | 0 |
| Structured Video-Language Modeling with Temporal Grouping and Spatial Grounding | Mar 28, 2023 | Action Localization, Action Recognition | Unverified | 0 | 0 |