| What do I Annotate Next? An Empirical Study of Active Learning for Action Localization | Sep 1, 2018 | Action Localization, Active Learning | Unverified | 0 | 0 |
| TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation | Apr 24, 2025 | Caption Generation, Dense Video Captioning | Unverified | 0 | 0 |
| To catch a chorus, verse, intro, or anything else: Analyzing a song with structural functions | May 29, 2022 | Boundary Detection, Temporal Localization | Unverified | 0 | 0 |
| To Find Where You Talk: Temporal Sentence Localization in Video with Attention Based Location Regression | Apr 19, 2018 | regression, Sentence | Unverified | 0 | 0 |
| Towards Fine-Grained Video Question Answering | Mar 10, 2025 | Language Modeling, Language Modelling | Unverified | 0 | 0 |
| Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding | Mar 24, 2024 | Dense Video Captioning, Temporal Localization | Unverified | 0 | 0 |
| Transductive Universal Transport for Zero-Shot Action Recognition | Sep 29, 2021 | Action Recognition, Object | Unverified | 0 | 0 |
| Transformer-based Fusion of 2D-pose and Spatio-temporal Embeddings for Distracted Driver Action Recognition | Mar 11, 2024 | 2D Human Pose Estimation, Action Recognition | Unverified | 0 | 0 |
| A benchmark of state-of-the-art sound event detection systems evaluated on synthetic soundscapes | Feb 3, 2022 | Data Augmentation, Event Detection | Unverified | 0 | 0 |
| Autonomous Stabilization of Retinal Videos for Streamlining Assessment of Spontaneous Venous Pulsations | May 10, 2023 | Template Matching, Temporal Localization | Unverified | 0 | 0 |