SOTAVerified

Temporal Localization

Papers

Showing 1-50 of 153 papers

Title | Status | Hype
VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning | Code | 3
Number it: Temporal Grounding Videos like Flipping Manga | Code | 2
LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding | Code | 2
TimeMarker: A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability | Code | 2
Egocentric Video-Language Pretraining | Code | 2
VideoMolmo: Spatio-Temporal Grounding Meets Pointing | Code | 2
LITA: Language Instructed Temporal-Localization Assistant | Code | 2
MINERVA: Evaluating Complex Video Reasoning | Code | 2
OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding | Code | 2
Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation | Code | 2
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding | Code | 2
Stargazer: A transformer-based driver action detection system for intelligent transportation | Code | 1
Weakly Supervised Temporal Action Localization Using Deep Metric Learning | Code | 1
Self-Chained Image-Language Model for Video Localization and Question Answering | Code | 1
Video Moment Localization using Object Evidence and Reverse Captioning | Code | 1
OpenTAL: Towards Open Set Temporal Action Localization | Code | 1
Unsupervised Pre-training for Temporal Action Localization Tasks | Code | 1
VLG-Net: Video-Language Graph Matching Network for Video Grounding | Code | 1
Multi-Task Learning of Object State Changes from Uncurated Videos | Code | 1
TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks | Code | 1
Few-Shot Temporal Action Localization with Query Adaptive Transformer | Code | 1
Training-free Video Temporal Grounding using Large-scale Pre-trained Models | Code | 1
Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time | Code | 1
Human-centric Spatio-Temporal Video Grounding With Visual Transformers | Code | 1
TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos | Code | 1
Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA | Code | 1
TubeDETR: Spatio-Temporal Video Grounding with Transformers | Code | 1
Weakly Supervised Action Selection Learning in Video | Code | 1
Enriching Local and Global Contexts for Temporal Action Localization | Code | 1
End-to-End Semi-Supervised Learning for Video Action Detection | Code | 1
Finding Moments in Video Collections Using Natural Language | Code | 1
Knowing Your Target: Target-Aware Transformer Makes Better Spatio-Temporal Video Grounding | Code | 1
Audio-Visual Event Localization in Unconstrained Videos | Code | 1
Learning Salient Boundary Feature for Anchor-free Temporal Action Localization | Code | 1
DisTime: Distribution-based Time Representation for Video Large Language Models | Code | 1
LocVTP: Video-Text Pre-training for Temporal Localization | Code | 1
TALL: Temporal Activity Localization via Language Query | Code | 1
Explore-And-Match: Bridging Proposal-Based and Proposal-Free With Transformer for Sentence Grounding in Videos | Code | 1
Boundary-sensitive Pre-training for Temporal Localization in Videos | Code | 1
MAC: Mining Activity Concepts for Language-based Temporal Localization | Code | 1
Temporally Precise Action Spotting in Soccer Videos Using Dense Detection Anchors | Code | 1
CityFlow-NL: Tracking and Retrieval of Vehicles at City Scale by Natural Language Descriptions | Code | 1
FineAction: A Fine-Grained Video Dataset for Temporal Action Localization | Code | 1
Unsupervised classification to improve the quality of a bird song recording dataset | Code | 1
Skeleton-Based Human Action Recognition with Noisy Labels | Code | 0
Asynchronous Temporal Fields for Action Recognition | Code | 0
SoftLoc: Robust Temporal Localization under Label Misalignment | Code | 0
Am I Done? Predicting Action Progress in Videos | Code | 0
Dense Video Object Captioning from Disjoint Supervision | Code | 0
Semi-supervised Active Learning for Video Action Detection | Code | 0

No leaderboard results yet.