| TimeRefine: Temporal Grounding with Time Refining Video LLM | Dec 12, 2024 | Temporal Localization | Code Available | 0 |
| TimeMarker: A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability | Nov 27, 2024 | Temporal Localization, Video Understanding | Code Available | 2 |
| Number it: Temporal Grounding Videos like Flipping Manga | Nov 15, 2024 | Highlight Detection, Moment Retrieval | Code Available | 2 |
| Unsupervised detection and classification of heartbeats using the dissimilarity matrix in PCG signals | Nov 5, 2024 | Heart Segmentation, Sound Classification | Unverified | 0 |
| Detection of Sleep Apnea-Hypopnea Events Using Millimeter-wave Radar and Pulse Oximeter | Sep 28, 2024 | Temporal Localization | Unverified | 0 |
| Training-free Video Temporal Grounding using Large-scale Pre-trained Models | Aug 29, 2024 | Temporal Localization | Code Available | 1 |
| Impact of Noisy Labels on Sound Event Detection: Deletion Errors Are More Detrimental Than Insertion Errors | Aug 27, 2024 | Event Detection, Sound Event Detection | Unverified | 0 |
| Described Spatial-Temporal Video Detection | Jul 8, 2024 | Multi-class Classification, Temporal Localization | Unverified | 0 |
| Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time | Jul 1, 2024 | Audio-Visual Question Answering (MUSIC-AVQA-v2.0), Fact Checking | Code Available | 1 |
| MLLM as Video Narrator: Mitigating Modality Imbalance in Video Moment Retrieval | Jun 25, 2024 | Cross-modal Alignment, Moment Retrieval | Unverified | 0 |
| OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding | Jun 11, 2024 | Action Understanding, Diversity | Code Available | 2 |
| LITA: Language Instructed Temporal-Localization Assistant | Mar 27, 2024 | Instruction Following, Temporal Localization | Code Available | 2 |
| Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding | Mar 24, 2024 | Dense Video Captioning, Temporal Localization | Unverified | 0 |
| Skeleton-Based Human Action Recognition with Noisy Labels | Mar 15, 2024 | Action Recognition, Denoising | Code Available | 0 |
| Density-Guided Label Smoothing for Temporal Localization of Driving Actions | Mar 11, 2024 | Action Localization, Action Recognition | Unverified | 0 |
| Transformer-based Fusion of 2D-pose and Spatio-temporal Embeddings for Distracted Driver Action Recognition | Mar 11, 2024 | 2D Human Pose Estimation, Action Recognition | Unverified | 0 |
| OLViT: Multi-Modal State Tracking via Attention-Based Embeddings for Video-Grounded Dialog | Feb 20, 2024 | Object, Object Tracking | Unverified | 0 |
| Semi-supervised Active Learning for Video Action Detection | Dec 12, 2023 | Action Detection, Active Learning | Code Available | 0 |
| Deep-Learning-Assisted Analysis of Cataract Surgery Videos | Dec 10, 2023 | Decision Making, Deep Learning | Unverified | 0 |
| TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding | Dec 4, 2023 | Dense Captioning, Highlight Detection | Code Available | 2 |
| Survey of Action Recognition, Spotting and Spatio-Temporal Localization in Soccer -- Current Trends and Research Perspectives | Sep 21, 2023 | Action Localization, Action Recognition | Unverified | 0 |
| Cross-Video Contextual Knowledge Exploration and Exploitation for Ambiguity Reduction in Weakly Supervised Temporal Action Localization | Aug 24, 2023 | Action Localization, Contrastive Learning | Unverified | 0 |
| UnLoc: A Unified Framework for Video Localization Tasks | Aug 21, 2023 | Action Segmentation, Moment Retrieval | Code Available | 0 |
| VideoGLUE: Video General Understanding Evaluation of Foundation Models | Jul 6, 2023 | Action Recognition, Temporal Localization | Code Available | 0 |
| Dense Video Object Captioning from Disjoint Supervision | Jun 20, 2023 | Object, Sentence | Code Available | 0 |