| InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | Mar 22, 2024 | Action Classification, Action Recognition | Code Available | 7 | 5 |
| Language-based Audio Moment Retrieval | Sep 24, 2024 | audio moment retrieval, Moment Retrieval | Code Available | 3 | 5 |
| Lighthouse: A User-Friendly Library for Reproducible Video Moment Retrieval and Highlight Detection | Aug 6, 2024 | audio moment retrieval, Highlight Detection | Code Available | 3 | 5 |
| Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding | Mar 14, 2024 | Mamba, Moment Retrieval | Code Available | 3 | 5 |
| UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection | Mar 23, 2022 | Decoder, Highlight Detection | Code Available | 2 | 5 |
| The Surprising Effectiveness of Multimodal Large Language Models for Video Moment Retrieval | Jun 26, 2024 | Action Localization, Moment Retrieval | Code Available | 2 | 5 |
| Query-Dependent Video Representation for Moment Retrieval and Highlight Detection | Mar 24, 2023 | Highlight Detection, Moment Retrieval | Code Available | 2 | 5 |
| TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection | Jan 4, 2024 | Highlight Detection, Moment Retrieval | Code Available | 2 | 5 |
| UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection | Apr 7, 2024 | Action Detection, Moment Queries | Code Available | 2 | 5 |
| VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding | May 22, 2024 | Dense Video Captioning, Highlight Detection | Code Available | 2 | 5 |
| TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning | Oct 25, 2024 | EgoSchema, Hallucination | Code Available | 2 | 5 |
| UniVTG: Towards Unified Video-Language Temporal Grounding | Jul 31, 2023 | Highlight Detection, Moment Retrieval | Code Available | 2 | 5 |
| Number it: Temporal Grounding Videos like Flipping Manga | Nov 15, 2024 | Highlight Detection, Moment Retrieval | Code Available | 2 | 5 |
| Correlation-Guided Query-Dependency Calibration for Video Temporal Grounding | Nov 15, 2023 | Highlight Detection, Moment Retrieval | Code Available | 2 | 5 |
| Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval | Jul 21, 2024 | General Knowledge, Highlight Detection | Code Available | 2 | 5 |
| TMR: Text-to-Motion Retrieval Using Contrastive 3D Human Motion Synthesis | May 2, 2023 | Moment Retrieval, Motion Generation | Code Available | 2 | 5 |
| A Flexible and Scalable Framework for Video Moment Search | Jan 9, 2025 | Moment Retrieval, Re-Ranking | Code Available | 1 | 5 |
| Joint Moment Retrieval and Highlight Detection Via Natural Language Queries | May 8, 2023 | Decoder, Highlight Detection | Code Available | 1 | 5 |
| CONQUER: Contextual Query-aware Ranking for Video Corpus Moment Retrieval | Sep 21, 2021 | Corpus Video Moment Retrieval, Moment Retrieval | Code Available | 1 | 5 |
| Hierarchical Video-Moment Retrieval and Step-Captioning | Mar 29, 2023 | Information Retrieval, Moment Retrieval | Code Available | 1 | 5 |
| Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-Based Active Learning | Jan 1, 2023 | Active Learning, Moment Retrieval | Code Available | 1 | 5 |
| Background-aware Moment Detection for Video Moment Retrieval | Jun 5, 2023 | Moment Retrieval, Natural Language Moment Retrieval | Code Available | 1 | 5 |
| Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection | Nov 28, 2023 | Contrastive Learning, Highlight Detection | Code Available | 1 | 5 |
| Detecting Moments and Highlights in Videos via Natural Language Queries | Dec 1, 2021 | Decoder, Moment Retrieval | Code Available | 1 | 5 |
| Frame-wise Cross-modal Matching for Video Moment Retrieval | Sep 22, 2020 | Boundary Detection, Moment Retrieval | Code Available | 1 | 5 |