SOTAVerified

Temporal Localization

Papers

Showing 150 of 153 papers

Title | Status | Hype
Fine-Tuning Large Audio-Language Models with LoRA for Precise Temporal Localization of Prolonged Exposure Therapy Elements | - | 0
VideoMolmo: Spatio-Temporal Grounding Meets Pointing | Code | 2
DisTime: Distribution-based Time Representation for Video Large Language Models | Code | 1
Transforming faces into video stories -- VideoFace2.0 | Code | 0
MINERVA: Evaluating Complex Video Reasoning | Code | 2
Hierarchical and Multimodal Data for Daily Activity Understanding | Code | 0
TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation | - | 0
A Large-Language Model Framework for Relative Timeline Extraction from PubMed Case Reports | - | 0
Crash Time Matters: HybridMamba for Fine-Grained Temporal Localization in Traffic Surveillance Footage | - | 0
SocialGesture: Delving into Multi-person Gesture Understanding | - | 0
ATARS: An Aerial Traffic Atomic Activity Recognition and Temporal Segmentation Dataset | Code | 0
Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation | Code | 2
VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning | Code | 3
Adapting to the Unknown: Training-Free Audio-Visual Event Perception with Dynamic Thresholds | Code | 0
Watch and Learn: Leveraging Expert Knowledge and Language for Surgical Video Understanding | - | 0
Measure Twice, Cut Once: Grasping Video Structures and Event Semantics with LLMs for Video Temporal Localization | - | 0
Towards Fine-Grained Video Question Answering | - | 0
TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos | Code | 1
Weakly Supervised Multiple Instance Learning for Whale Call Detection and Temporal Localization in Long-Duration Passive Acoustic Monitoring | Code | 0
Knowing Your Target: Target-Aware Transformer Makes Better Spatio-Temporal Video Grounding | Code | 1
Fusion of Millimeter-wave Radar and Pulse Oximeter Data for Low-burden Diagnosis of Obstructive Sleep Apnea-Hypopnea Syndrome | - | 0
LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding | Code | 2
Pseudo Strong Labels from Frame-Level Predictions for Weakly Supervised Sound Event Detection | - | 0
Do Current Video LLMs Have Strong OCR Abilities? A Preliminary Study | Code | 0
ShotVL: Human-Centric Highlight Frame Retrieval via Language Queries | - | 0
TimeRefine: Temporal Grounding with Time Refining Video LLM | Code | 0
TimeMarker: A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability | Code | 2
Number it: Temporal Grounding Videos like Flipping Manga | Code | 2
Unsupervised detection and classification of heartbeats using the dissimilarity matrix in PCG signals | - | 0
Detection of Sleep Apnea-Hypopnea Events Using Millimeter-wave Radar and Pulse Oximeter | - | 0
Training-free Video Temporal Grounding using Large-scale Pre-trained Models | Code | 1
Impact of Noisy Labels on Sound Event Detection: Deletion Errors Are More Detrimental Than Insertion Errors | - | 0
Described Spatial-Temporal Video Detection | - | 0
Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time | Code | 1
MLLM as Video Narrator: Mitigating Modality Imbalance in Video Moment Retrieval | - | 0
OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding | Code | 2
LITA: Language Instructed Temporal-Localization Assistant | Code | 2
Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding | - | 0
Skeleton-Based Human Action Recognition with Noisy Labels | Code | 0
Density-Guided Label Smoothing for Temporal Localization of Driving Actions | - | 0
Transformer-based Fusion of 2D-pose and Spatio-temporal Embeddings for Distracted Driver Action Recognition | - | 0
OLViT: Multi-Modal State Tracking via Attention-Based Embeddings for Video-Grounded Dialog | - | 0
Semi-supervised Active Learning for Video Action Detection | Code | 0
Deep-Learning-Assisted Analysis of Cataract Surgery Videos | - | 0
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding | Code | 2
Survey of Action Recognition, Spotting and Spatio-Temporal Localization in Soccer -- Current Trends and Research Perspectives | - | 0
Cross-Video Contextual Knowledge Exploration and Exploitation for Ambiguity Reduction in Weakly Supervised Temporal Action Localization | - | 0
UnLoc: A Unified Framework for Video Localization Tasks | Code | 0
VideoGLUE: Video General Understanding Evaluation of Foundation Models | Code | 0
Dense Video Object Captioning from Disjoint Supervision | Code | 0

No leaderboard results yet.