| End-to-End Semi-Supervised Learning for Video Action Detection | Mar 8, 2022 | Action Detection, Classification Consistency | Code Available | 1 |
| Explore-And-Match: Bridging Proposal-Based and Proposal-Free With Transformer for Sentence Grounding in Videos | Jan 25, 2022 | Natural Language Queries, Sentence | Code Available | 1 |
| Few-Shot Temporal Action Localization with Query Adaptive Transformer | Oct 20, 2021 | Action Localization, Action Segmentation | Code Available | 1 |
| Enriching Local and Global Contexts for Temporal Action Localization | Jul 27, 2021 | Action Classification, Action Localization | Code Available | 1 |
| FineAction: A Fine-Grained Video Dataset for Temporal Action Localization | May 24, 2021 | Action Detection, Action Localization | Code Available | 1 |
| Weakly Supervised Action Selection Learning in Video | May 6, 2021 | Temporal Localization, Weakly Supervised Action Localization | Code Available | 1 |
| Learning Salient Boundary Feature for Anchor-free Temporal Action Localization | Mar 24, 2021 | Action Localization, Temporal Action Localization | Code Available | 1 |
| CityFlow-NL: Tracking and Retrieval of Vehicles at City Scale by Natural Language Descriptions | Jan 12, 2021 | Multi-Object Tracking, Object Tracking | Code Available | 1 |
| TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks | Nov 23, 2020 | Action Classification, Action Localization | Code Available | 1 |
| Boundary-sensitive Pre-training for Temporal Localization in Videos | Nov 21, 2020 | Action Classification, Classification | Code Available | 1 |
| VLG-Net: Video-Language Graph Matching Network for Video Grounding | Nov 19, 2020 | Graph Matching, Moment Retrieval | Code Available | 1 |
| Human-centric Spatio-Temporal Video Grounding With Visual Transformers | Nov 10, 2020 | Referring Expression, Sentence | Code Available | 1 |
| Video Moment Localization using Object Evidence and Reverse Captioning | Jun 18, 2020 | Language-Based Temporal Localization, Language Modelling | Code Available | 1 |
| Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA | May 13, 2020 | Image Captioning, Multi-Label Classification | Code Available | 1 |
| Weakly Supervised Temporal Action Localization Using Deep Metric Learning | Jan 21, 2020 | Action Localization, Metric Learning | Code Available | 1 |
| Finding Moments in Video Collections Using Natural Language | Jul 30, 2019 | Moment Retrieval, Re-Ranking | Code Available | 1 |
| MAC: Mining Activity Concepts for Language-based Temporal Localization | Nov 21, 2018 | Language-Based Temporal Localization, Temporal Localization | Code Available | 1 |
| Audio-Visual Event Localization in Unconstrained Videos | Mar 23, 2018 | audio-visual event localization, Temporal Localization | Code Available | 1 |
| TALL: Temporal Activity Localization via Language Query | May 5, 2017 | Natural Language Queries, regression | Code Available | 1 |
| Fine-Tuning Large Audio-Language Models with LoRA for Precise Temporal Localization of Prolonged Exposure Therapy Elements | Jun 11, 2025 | Temporal Localization | Unverified | 0 |
| Transforming faces into video stories -- VideoFace2.0 | May 4, 2025 | Face Detection, Face Recognition | Code Available | 0 |
| TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation | Apr 24, 2025 | Caption Generation, Dense Video Captioning | Unverified | 0 |
| Hierarchical and Multimodal Data for Daily Activity Understanding | Apr 24, 2025 | Action Anticipation, counterfactual | Code Available | 0 |
| A Large-Language Model Framework for Relative Timeline Extraction from PubMed Case Reports | Apr 15, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Crash Time Matters: HybridMamba for Fine-Grained Temporal Localization in Traffic Surveillance Footage | Apr 4, 2025 | Temporal Localization | Unverified | 0 |