SOTAVerified

Temporal Localization

Papers

Showing 51–100 of 153 papers

Title | Status | Hype
SocialGesture: Delving into Multi-person Gesture Understanding | - | 0
ATARS: An Aerial Traffic Atomic Activity Recognition and Temporal Segmentation Dataset | Code | 0
Adapting to the Unknown: Training-Free Audio-Visual Event Perception with Dynamic Thresholds | Code | 0
Watch and Learn: Leveraging Expert Knowledge and Language for Surgical Video Understanding | - | 0
Measure Twice, Cut Once: Grasping Video Structures and Event Semantics with LLMs for Video Temporal Localization | - | 0
Towards Fine-Grained Video Question Answering | - | 0
Weakly Supervised Multiple Instance Learning for Whale Call Detection and Temporal Localization in Long-Duration Passive Acoustic Monitoring | Code | 0
Fusion of Millimeter-wave Radar and Pulse Oximeter Data for Low-burden Diagnosis of Obstructive Sleep Apnea-Hypopnea Syndrome | - | 0
Pseudo Strong Labels from Frame-Level Predictions for Weakly Supervised Sound Event Detection | - | 0
Do Current Video LLMs Have Strong OCR Abilities? A Preliminary Study | Code | 0
ShotVL: Human-Centric Highlight Frame Retrieval via Language Queries | - | 0
TimeRefine: Temporal Grounding with Time Refining Video LLM | Code | 0
Unsupervised detection and classification of heartbeats using the dissimilarity matrix in PCG signals | - | 0
Detection of Sleep Apnea-Hypopnea Events Using Millimeter-wave Radar and Pulse Oximeter | - | 0
Impact of Noisy Labels on Sound Event Detection: Deletion Errors Are More Detrimental Than Insertion Errors | - | 0
Described Spatial-Temporal Video Detection | - | 0
MLLM as Video Narrator: Mitigating Modality Imbalance in Video Moment Retrieval | - | 0
Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding | - | 0
Skeleton-Based Human Action Recognition with Noisy Labels | Code | 0
Transformer-based Fusion of 2D-pose and Spatio-temporal Embeddings for Distracted Driver Action Recognition | - | 0
Density-Guided Label Smoothing for Temporal Localization of Driving Actions | - | 0
OLViT: Multi-Modal State Tracking via Attention-Based Embeddings for Video-Grounded Dialog | - | 0
Semi-supervised Active Learning for Video Action Detection | Code | 0
Deep-Learning-Assisted Analysis of Cataract Surgery Videos | - | 0
Survey of Action Recognition, Spotting and Spatio-Temporal Localization in Soccer -- Current Trends and Research Perspectives | - | 0
Cross-Video Contextual Knowledge Exploration and Exploitation for Ambiguity Reduction in Weakly Supervised Temporal Action Localization | - | 0
UnLoc: A Unified Framework for Video Localization Tasks | Code | 0
VideoGLUE: Video General Understanding Evaluation of Foundation Models | Code | 0
Dense Video Object Captioning from Disjoint Supervision | Code | 0
Single-Stage Visual Query Localization in Egocentric Videos | - | 0
Autonomous Stabilization of Retinal Videos for Streamlining Assessment of Spontaneous Venous Pulsations | - | 0
Structured Video-Language Modeling with Temporal Grouping and Spatial Grounding | - | 0
VADER: Video Alignment Differencing and Retrieval | - | 0
Masked Autoencoders for Egocentric Video Understanding @ Ego4D Challenge 2022 | Code | 0
Exploring State Change Capture of Heterogeneous Backbones @ Ego4D Hands and Objects Challenge 2022 | - | 0
Optimizing Temporal Resolution Of Convolutional Recurrent Neural Networks For Sound Event Detection | - | 0
Impact of temporal resolution on convolutional recurrent networks for audio tagging and sound event detection | - | 0
Video Swin Transformers for Egocentric Video Understanding @ Ego4D Challenges 2022 | - | 0
Team PKU-WICT-MIPL PIC Makeup Temporal Video Grounding Challenge 2022 Technical Report | - | 0
Scalable Temporal Localization of Sensitive Activities in Movies and TV Episodes | - | 0
Structured Video Tokens @ Ego4D PNR Temporal Localization Challenge 2022 | - | 0
TadML: A fast temporal action detection with Mechanics-MLP | Code | 0
To catch a chorus, verse, intro, or anything else: Analyzing a song with structural functions | - | 0
Contrastive Language-Action Pre-training for Temporal Localization | - | 0
Universal Prototype Transport for Zero-Shot Action Recognition and Localization | - | 0
When Did It Happen? Duration-informed Temporal Localization of Narrated Actions in Vlogs | Code | 0
OWL (Observe, Watch, Listen): Audiovisual Temporal Context for Localizing Actions in Egocentric Videos | - | 0
A benchmark of state-of-the-art sound event detection systems evaluated on synthetic soundscapes | - | 0
Practitioner-Centric Approach for Early Incident Detection Using Crowdsourced Data for Emergency Services | - | 0
Hierarchical Deep Residual Reasoning for Temporal Moment Localization | Code | 0

No leaderboard results yet.