| Inceptive Event Time-Surfaces for Object Classification Using Neuromorphic Cameras | Feb 26, 2020 | Classification, Dimensionality Reduction | Unverified | 0 |
| Joint Visual-Temporal Embedding for Unsupervised Learning of Actions in Untrimmed Sequences | Jan 29, 2020 | Action Recognition, Action Segmentation | Unverified | 0 |
| Learning to track for spatio-temporal action localization | Jun 5, 2015 | Action Localization, Spatio-Temporal Action Localization | Unverified | 0 |
| Measure Twice, Cut Once: Grasping Video Structures and Event Semantics with LLMs for Video Temporal Localization | Mar 12, 2025 | Temporal Localization, Video Understanding | Unverified | 0 |
| MLLM as Video Narrator: Mitigating Modality Imbalance in Video Moment Retrieval | Jun 25, 2024 | cross-modal alignment, Moment Retrieval | Unverified | 0 |
| Modality Shifting Attention Network for Multi-modal Video Question Answering | Jul 4, 2020 | Question Answering, Temporal Localization | Unverified | 0 |
| Modeling Spatio-Temporal Human Track Structure for Action Localization | Jun 28, 2018 | Action Localization, Human Detection | Unverified | 0 |
| Objects2action: Classifying and localizing actions without any video example | Oct 23, 2015 | Attribute, Object | Unverified | 0 |
| OLViT: Multi-Modal State Tracking via Attention-Based Embeddings for Video-Grounded Dialog | Feb 20, 2024 | Object, Object Tracking | Unverified | 0 |
| Optimizing Temporal Resolution Of Convolutional Recurrent Neural Networks For Sound Event Detection | Oct 18, 2022 | Event Detection, Sound Event Detection | Unverified | 0 |