| InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | Mar 22, 2024 | Action ClassificationAction Recognition | CodeCode Available | 7 |
| Language-based Audio Moment Retrieval | Sep 24, 2024 | audio moment retrievalMoment Retrieval | CodeCode Available | 3 |
| Lighthouse: A User-Friendly Library for Reproducible Video Moment Retrieval and Highlight Detection | Aug 6, 2024 | audio moment retrievalHighlight Detection | CodeCode Available | 3 |
| Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding | Mar 14, 2024 | MambaMoment Retrieval | CodeCode Available | 3 |
| Number it: Temporal Grounding Videos like Flipping Manga | Nov 15, 2024 | Highlight DetectionMoment Retrieval | CodeCode Available | 2 |
| TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning | Oct 25, 2024 | EgoSchemaHallucination | CodeCode Available | 2 |
| Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval | Jul 21, 2024 | General KnowledgeHighlight Detection | CodeCode Available | 2 |
| The Surprising Effectiveness of Multimodal Large Language Models for Video Moment Retrieval | Jun 26, 2024 | Action LocalizationMoment Retrieval | CodeCode Available | 2 |
| VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding | May 22, 2024 | Dense Video CaptioningHighlight Detection | CodeCode Available | 2 |
| UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection | Apr 7, 2024 | Action DetectionMoment Queries | CodeCode Available | 2 |
| TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection | Jan 4, 2024 | Highlight DetectionMoment Retrieval | CodeCode Available | 2 |
| Correlation-Guided Query-Dependency Calibration for Video Temporal Grounding | Nov 15, 2023 | Highlight DetectionMoment Retrieval | CodeCode Available | 2 |
| UniVTG: Towards Unified Video-Language Temporal Grounding | Jul 31, 2023 | Highlight DetectionMoment Retrieval | CodeCode Available | 2 |
| TMR: Text-to-Motion Retrieval Using Contrastive 3D Human Motion Synthesis | May 2, 2023 | Moment RetrievalMotion Generation | CodeCode Available | 2 |
| Query-Dependent Video Representation for Moment Retrieval and Highlight Detection | Mar 24, 2023 | Highlight DetectionMoment Retrieval | CodeCode Available | 2 |
| UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection | Mar 23, 2022 | DecoderHighlight Detection | CodeCode Available | 2 |
| TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos | Mar 9, 2025 | Action LocalizationBoundary Detection | CodeCode Available | 1 |
| LD-DETR: Loop Decoder DEtection TRansformer for Video Moment Retrieval and Highlight Detection | Jan 18, 2025 | Contrastive LearningDecoder | CodeCode Available | 1 |
| A Flexible and Scalable Framework for Video Moment Search | Jan 9, 2025 | Moment RetrievalRe-Ranking | CodeCode Available | 1 |
| Watch Video, Catch Keyword: Context-aware Keyword Attention for Moment Retrieval and Highlight Detection | Jan 5, 2025 | Contrastive LearningHighlight Detection | CodeCode Available | 1 |
| Length-Aware DETR for Robust Moment Retrieval | Dec 30, 2024 | Information RetrievalMoment Retrieval | CodeCode Available | 1 |
| FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding | Dec 18, 2024 | Highlight DetectionMoment Retrieval | CodeCode Available | 1 |
| VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval | Dec 2, 2024 | Highlight DetectionMoment Retrieval | CodeCode Available | 1 |
| Vid-Morp: Video Moment Retrieval Pretraining from Unlabeled Videos in the Wild | Dec 1, 2024 | Moment RetrievalRetrieval | CodeCode Available | 1 |
| VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understanding | Oct 11, 2024 | HallucinationMoment Retrieval | CodeCode Available | 1 |