| InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | Mar 22, 2024 | Action Classification, Action Recognition | Code Available | 7 | 5 |
| Language-based Audio Moment Retrieval | Sep 24, 2024 | audio moment retrieval, Moment Retrieval | Code Available | 3 | 5 |
| Lighthouse: A User-Friendly Library for Reproducible Video Moment Retrieval and Highlight Detection | Aug 6, 2024 | audio moment retrieval, Highlight Detection | Code Available | 3 | 5 |
| Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding | Mar 14, 2024 | Mamba, Moment Retrieval | Code Available | 3 | 5 |
| UniVTG: Towards Unified Video-Language Temporal Grounding | Jul 31, 2023 | Highlight Detection, Moment Retrieval | Code Available | 2 | 5 |
| Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval | Jul 21, 2024 | General Knowledge, Highlight Detection | Code Available | 2 | 5 |
| TMR: Text-to-Motion Retrieval Using Contrastive 3D Human Motion Synthesis | May 2, 2023 | Moment Retrieval, Motion Generation | Code Available | 2 | 5 |
| VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding | May 22, 2024 | Dense Video Captioning, Highlight Detection | Code Available | 2 | 5 |
| Number it: Temporal Grounding Videos like Flipping Manga | Nov 15, 2024 | Highlight Detection, Moment Retrieval | Code Available | 2 | 5 |
| Correlation-Guided Query-Dependency Calibration for Video Temporal Grounding | Nov 15, 2023 | Highlight Detection, Moment Retrieval | Code Available | 2 | 5 |
| The Surprising Effectiveness of Multimodal Large Language Models for Video Moment Retrieval | Jun 26, 2024 | Action Localization, Moment Retrieval | Code Available | 2 | 5 |
| UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection | Mar 23, 2022 | Decoder, Highlight Detection | Code Available | 2 | 5 |
| TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection | Jan 4, 2024 | Highlight Detection, Moment Retrieval | Code Available | 2 | 5 |
| UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection | Apr 7, 2024 | Action Detection, Moment Queries | Code Available | 2 | 5 |
| TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning | Oct 25, 2024 | EgoSchema, Hallucination | Code Available | 2 | 5 |
| Query-Dependent Video Representation for Moment Retrieval and Highlight Detection | Mar 24, 2023 | Highlight Detection, Moment Retrieval | Code Available | 2 | 5 |
| VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval | Dec 2, 2024 | Highlight Detection, Moment Retrieval | Code Available | 1 | 5 |
| A Flexible and Scalable Framework for Video Moment Search | Jan 9, 2025 | Moment Retrieval, Re-Ranking | Code Available | 1 | 5 |
| Selective Query-guided Debiasing for Video Corpus Moment Retrieval | Oct 17, 2022 | Moment Retrieval, Retrieval | Code Available | 1 | 5 |
| CONQUER: Contextual Query-aware Ranking for Video Corpus Moment Retrieval | Sep 21, 2021 | Corpus Video Moment Retrieval, Moment Retrieval | Code Available | 1 | 5 |
| Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-Based Active Learning | Jan 1, 2023 | Active Learning, Moment Retrieval | Code Available | 1 | 5 |
| Uncovering Hidden Challenges in Query-Based Video Moment Retrieval | Sep 1, 2020 | Moment Retrieval, Retrieval | Code Available | 1 | 5 |
| VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understanding | Oct 11, 2024 | Hallucination, Moment Retrieval | Code Available | 1 | 5 |
| Regularized Two-Branch Proposal Networks for Weakly-Supervised Moment Retrieval in Videos | Aug 19, 2020 | Moment Retrieval, Retrieval | Code Available | 1 | 5 |
| Video Moment Retrieval from Text Queries via Single Frame Annotation | Apr 20, 2022 | Contrastive Learning, Moment Retrieval | Code Available | 1 | 5 |
| Background-aware Moment Detection for Video Moment Retrieval | Jun 5, 2023 | Moment Retrieval, Natural Language Moment Retrieval | Code Available | 1 | 5 |
| Detecting Moments and Highlights in Videos via Natural Language Queries | Dec 1, 2021 | Decoder, Moment Retrieval | Code Available | 1 | 5 |
| Saliency-Guided DETR for Moment Retrieval and Highlight Detection | Oct 2, 2024 | Highlight Detection, Moment Retrieval | Code Available | 1 | 5 |
| Partially Relevant Video Retrieval | Aug 26, 2022 | Moment Retrieval, Multiple Instance Learning | Code Available | 1 | 5 |
| Video Corpus Moment Retrieval with Contrastive Learning | May 13, 2021 | Contrastive Learning, Moment Retrieval | Code Available | 1 | 5 |
| HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training | May 1, 2020 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| Deconfounded Video Moment Retrieval with Causal Intervention | Jun 3, 2021 | Moment Retrieval, Retrieval | Code Available | 1 | 5 |
| Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval | Dec 19, 2023 | cross-modal alignment, Moment Retrieval | Code Available | 1 | 5 |
| TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval | Jan 24, 2020 | Moment Retrieval, Retrieval | Code Available | 1 | 5 |
| Finding Moments in Video Collections Using Natural Language | Jul 30, 2019 | Moment Retrieval, Re-Ranking | Code Available | 1 | 5 |
| MomentDiff: Generative Video Moment Retrieval from Random to Real | Jul 6, 2023 | Moment Retrieval, Retrieval | Code Available | 1 | 5 |
| MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions | Dec 1, 2021 | Moment Retrieval, Natural Language Moment Retrieval | Code Available | 1 | 5 |
| BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos | Nov 30, 2023 | Moment Retrieval, Natural Language Moment Retrieval | Code Available | 1 | 5 |
| MLP: Motion Label Prior for Temporal Sentence Localization in Untrimmed 3D Human Motions | Apr 21, 2024 | Moment Retrieval, Sentence | Code Available | 1 | 5 |
| MTVR: Multilingual Moment Retrieval in Videos | Jul 30, 2021 | Moment Retrieval, Retrieval | Code Available | 1 | 5 |
| TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos | Mar 9, 2025 | Action Localization, Boundary Detection | Code Available | 1 | 5 |
| Hierarchical Video-Moment Retrieval and Step-Captioning | Mar 29, 2023 | Information Retrieval, Moment Retrieval | Code Available | 1 | 5 |
| Frame-wise Cross-modal Matching for Video Moment Retrieval | Sep 22, 2020 | Boundary Detection, Moment Retrieval | Code Available | 1 | 5 |
| FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding | Dec 18, 2024 | Highlight Detection, Moment Retrieval | Code Available | 1 | 5 |
| Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection | Nov 28, 2023 | Contrastive Learning, Highlight Detection | Code Available | 1 | 5 |
| LD-DETR: Loop Decoder DEtection TRansformer for Video Moment Retrieval and Highlight Detection | Jan 18, 2025 | Contrastive Learning, Decoder | Code Available | 1 | 5 |
| Joint Moment Retrieval and Highlight Detection Via Natural Language Queries | May 8, 2023 | Decoder, Highlight Detection | Code Available | 1 | 5 |
| Length-Aware DETR for Robust Moment Retrieval | Dec 30, 2024 | Information Retrieval, Moment Retrieval | Code Available | 1 | 5 |
| QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries | Jul 20, 2021 | Highlight Detection, Moment Retrieval | Code Available | 1 | 5 |
| Task-Driven Exploration: Decoupling and Inter-Task Feedback for Joint Moment Retrieval and Highlight Detection | Apr 14, 2024 | Highlight Detection, Moment Retrieval | Code Available | 1 | 5 |