| SocialGesture: Delving into Multi-person Gesture Understanding | Apr 3, 2025 | Gesture Recognition, Question Answering | Unverified | 0 |
| ATARS: An Aerial Traffic Atomic Activity Recognition and Temporal Segmentation Dataset | Mar 24, 2025 | Activity Recognition, Temporal Localization | Code Available | 0 |
| Adapting to the Unknown: Training-Free Audio-Visual Event Perception with Dynamic Thresholds | Mar 17, 2025 | Temporal Localization | Code Available | 0 |
| Watch and Learn: Leveraging Expert Knowledge and Language for Surgical Video Understanding | Mar 14, 2025 | Denoising, Dense Video Captioning | Unverified | 0 |
| Measure Twice, Cut Once: Grasping Video Structures and Event Semantics with LLMs for Video Temporal Localization | Mar 12, 2025 | Temporal Localization, Video Understanding | Unverified | 0 |
| Towards Fine-Grained Video Question Answering | Mar 10, 2025 | Language Modeling | Unverified | 0 |
| Weakly Supervised Multiple Instance Learning for Whale Call Detection and Temporal Localization in Long-Duration Passive Acoustic Monitoring | Feb 28, 2025 | Multiple Instance Learning, Temporal Localization | Code Available | 0 |
| Fusion of Millimeter-wave Radar and Pulse Oximeter Data for Low-burden Diagnosis of Obstructive Sleep Apnea-Hypopnea Syndrome | Jan 25, 2025 | Diagnostic, Sleep Staging | Unverified | 0 |
| Pseudo Strong Labels from Frame-Level Predictions for Weakly Supervised Sound Event Detection | Jan 7, 2025 | Event Detection, Sound Event Detection | Unverified | 0 |
| Do Current Video LLMs Have Strong OCR Abilities? A Preliminary Study | Dec 29, 2024 | Motion Detection, Optical Character Recognition | Code Available | 0 |
| ShotVL: Human-Centric Highlight Frame Retrieval via Language Queries | Dec 17, 2024 | Human Detection, Image Classification | Unverified | 0 |
| TimeRefine: Temporal Grounding with Time Refining Video LLM | Dec 12, 2024 | Temporal Localization | Code Available | 0 |
| Unsupervised detection and classification of heartbeats using the dissimilarity matrix in PCG signals | Nov 5, 2024 | Heart Segmentation, Sound Classification | Unverified | 0 |
| Detection of Sleep Apnea-Hypopnea Events Using Millimeter-wave Radar and Pulse Oximeter | Sep 28, 2024 | Temporal Localization | Unverified | 0 |
| Impact of Noisy Labels on Sound Event Detection: Deletion Errors Are More Detrimental Than Insertion Errors | Aug 27, 2024 | Event Detection, Sound Event Detection | Unverified | 0 |
| Described Spatial-Temporal Video Detection | Jul 8, 2024 | Multi-class Classification, Temporal Localization | Unverified | 0 |
| MLLM as Video Narrator: Mitigating Modality Imbalance in Video Moment Retrieval | Jun 25, 2024 | Cross-Modal Alignment, Moment Retrieval | Unverified | 0 |
| Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding | Mar 24, 2024 | Dense Video Captioning, Temporal Localization | Unverified | 0 |
| Skeleton-Based Human Action Recognition with Noisy Labels | Mar 15, 2024 | Action Recognition, Denoising | Code Available | 0 |
| Transformer-based Fusion of 2D-pose and Spatio-temporal Embeddings for Distracted Driver Action Recognition | Mar 11, 2024 | 2D Human Pose Estimation, Action Recognition | Unverified | 0 |
| Density-Guided Label Smoothing for Temporal Localization of Driving Actions | Mar 11, 2024 | Action Localization, Action Recognition | Unverified | 0 |
| OLViT: Multi-Modal State Tracking via Attention-Based Embeddings for Video-Grounded Dialog | Feb 20, 2024 | Object, Object Tracking | Unverified | 0 |
| Semi-supervised Active Learning for Video Action Detection | Dec 12, 2023 | Action Detection, Active Learning | Code Available | 0 |
| Deep-Learning-Assisted Analysis of Cataract Surgery Videos | Dec 10, 2023 | Decision Making, Deep Learning | Unverified | 0 |
| Survey of Action Recognition, Spotting and Spatio-Temporal Localization in Soccer -- Current Trends and Research Perspectives | Sep 21, 2023 | Action Localization, Action Recognition | Unverified | 0 |
| Cross-Video Contextual Knowledge Exploration and Exploitation for Ambiguity Reduction in Weakly Supervised Temporal Action Localization | Aug 24, 2023 | Action Localization, Contrastive Learning | Unverified | 0 |
| UnLoc: A Unified Framework for Video Localization Tasks | Aug 21, 2023 | Action Segmentation, Moment Retrieval | Code Available | 0 |
| VideoGLUE: Video General Understanding Evaluation of Foundation Models | Jul 6, 2023 | Action Recognition, Temporal Localization | Code Available | 0 |
| Dense Video Object Captioning from Disjoint Supervision | Jun 20, 2023 | Object, Sentence | Code Available | 0 |
| Single-Stage Visual Query Localization in Egocentric Videos | Jun 15, 2023 | Object Detection | Unverified | 0 |
| Autonomous Stabilization of Retinal Videos for Streamlining Assessment of Spontaneous Venous Pulsations | May 10, 2023 | Template Matching, Temporal Localization | Unverified | 0 |
| Structured Video-Language Modeling with Temporal Grouping and Spatial Grounding | Mar 28, 2023 | Action Localization, Action Recognition | Unverified | 0 |
| VADER: Video Alignment Differencing and Retrieval | Mar 23, 2023 | Misinformation, Retrieval | Unverified | 0 |
| Masked Autoencoders for Egocentric Video Understanding @ Ego4D Challenge 2022 | Nov 18, 2022 | Object State Change Classification, Temporal Localization | Code Available | 0 |
| Exploring State Change Capture of Heterogeneous Backbones @ Ego4D Hands and Objects Challenge 2022 | Nov 16, 2022 | Human-Object Interaction Detection, Object | Unverified | 0 |
| Optimizing Temporal Resolution Of Convolutional Recurrent Neural Networks For Sound Event Detection | Oct 18, 2022 | Event Detection, Sound Event Detection | Unverified | 0 |
| Impact of temporal resolution on convolutional recurrent networks for audio tagging and sound event detection | Sep 26, 2022 | Audio Tagging, Event Detection | Unverified | 0 |
| Video Swin Transformers for Egocentric Video Understanding @ Ego4D Challenges 2022 | Jul 22, 2022 | Object, Object State Change Classification | Unverified | 0 |
| Team PKU-WICT-MIPL PIC Makeup Temporal Video Grounding Challenge 2022 Technical Report | Jul 6, 2022 | Sentence, Temporal Localization | Unverified | 0 |
| Scalable Temporal Localization of Sensitive Activities in Movies and TV Episodes | Jun 16, 2022 | Temporal Localization | Unverified | 0 |
| Structured Video Tokens @ Ego4D PNR Temporal Localization Challenge 2022 | Jun 15, 2022 | Point-of-no-return (PNR) Temporal Localization, Temporal Localization | Unverified | 0 |
| TadML: A fast temporal action detection with Mechanics-MLP | Jun 7, 2022 | Action Detection, Optical Flow Estimation | Code Available | 0 |
| To catch a chorus, verse, intro, or anything else: Analyzing a song with structural functions | May 29, 2022 | Boundary Detection, Temporal Localization | Unverified | 0 |
| Contrastive Language-Action Pre-training for Temporal Localization | Apr 26, 2022 | Action Localization, Contrastive Learning | Unverified | 0 |
| Universal Prototype Transport for Zero-Shot Action Recognition and Localization | Mar 8, 2022 | Action Recognition, Object | Unverified | 0 |
| When Did It Happen? Duration-informed Temporal Localization of Narrated Actions in Vlogs | Feb 16, 2022 | Action Localization, Temporal Action Localization | Code Available | 0 |
| OWL (Observe, Watch, Listen): Audiovisual Temporal Context for Localizing Actions in Egocentric Videos | Feb 10, 2022 | Action Localization, Temporal Action Localization | Unverified | 0 |
| A benchmark of state-of-the-art sound event detection systems evaluated on synthetic soundscapes | Feb 3, 2022 | Data Augmentation, Event Detection | Unverified | 0 |
| Practitioner-Centric Approach for Early Incident Detection Using Crowdsourced Data for Emergency Services | Dec 3, 2021 | Event Detection, Management | Unverified | 0 |
| Hierarchical Deep Residual Reasoning for Temporal Moment Localization | Oct 31, 2021 | Language-Based Temporal Localization, Sentence | Code Available | 0 |