| MINERVA: Evaluating Complex Video Reasoning | May 1, 2025 | BenchmarkingTemporal Localization | CodeCode Available | 2 | 5 |
| Enriching Local and Global Contexts for Temporal Action Localization | Jul 27, 2021 | Action ClassificationAction Localization | CodeCode Available | 1 | 5 |
| LocVTP: Video-Text Pre-training for Temporal Localization | Jul 21, 2022 | RetrievalTemporal Localization | CodeCode Available | 1 | 5 |
| End-to-End Semi-Supervised Learning for Video Action Detection | Mar 8, 2022 | Action DetectionClassification Consistency | CodeCode Available | 1 | 5 |
| Explore-And-Match: Bridging Proposal-Based and Proposal-Free With Transformer for Sentence Grounding in Videos | Jan 25, 2022 | Natural Language QueriesSentence | CodeCode Available | 1 | 5 |
| MAC: Mining Activity Concepts for Language-based Temporal Localization | Nov 21, 2018 | Language-Based Temporal LocalizationTemporal Localization | CodeCode Available | 1 | 5 |
| Knowing Your Target: Target-Aware Transformer Makes Better Spatio-Temporal Video Grounding | Feb 16, 2025 | AttributeObject | CodeCode Available | 1 | 5 |
| Boundary-sensitive Pre-training for Temporal Localization in Videos | Nov 21, 2020 | Action ClassificationClassification | CodeCode Available | 1 | 5 |
| DisTime: Distribution-based Time Representation for Video Large Language Models | May 30, 2025 | Temporal LocalizationVideo Understanding | CodeCode Available | 1 | 5 |
| Learning Salient Boundary Feature for Anchor-free Temporal Action Localization | Mar 24, 2021 | Action LocalizationTemporal Action Localization | CodeCode Available | 1 | 5 |