| Egocentric Video-Language Pretraining | Jun 3, 2022 | Action Recognition, Contrastive Learning | Code Available | 2 |
| DisTime: Distribution-based Time Representation for Video Large Language Models | May 30, 2025 | Temporal Localization, Video Understanding | Code Available | 1 |
| TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos | Mar 9, 2025 | Action Localization, Boundary Detection | Code Available | 1 |
| Knowing Your Target: Target-Aware Transformer Makes Better Spatio-Temporal Video Grounding | Feb 16, 2025 | Attribute, Object | Code Available | 1 |
| Training-free Video Temporal Grounding using Large-scale Pre-trained Models | Aug 29, 2024 | Temporal Localization | Code Available | 1 |
| Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time | Jul 1, 2024 | Audio-Visual Question Answering (MUSIC-AVQA-v2.0), Fact Checking | Code Available | 1 |
| Self-Chained Image-Language Model for Video Localization and Question Answering | May 11, 2023 | Language Modeling, Language Modelling | Code Available | 1 |
| Unsupervised classification to improve the quality of a bird song recording dataset | Feb 15, 2023 | Sound Classification, Temporal Localization | Code Available | 1 |
| Multi-Task Learning of Object State Changes from Uncurated Videos | Nov 24, 2022 | Multi-Task Learning, Object | Code Available | 1 |
| LocVTP: Video-Text Pre-training for Temporal Localization | Jul 21, 2022 | Retrieval, Temporal Localization | Code Available | 1 |