
Video Understanding

A crucial task in video understanding is to recognise and localise (in space and time) the different actions or events that appear in a video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 1101 to 1149 of 1149 papers

StimuVAR: Spatiotemporal Stimuli-aware Video Affective Reasoning with Multimodal Large Language Models
STPrivacy: Spatio-Temporal Privacy-Preserving Action Recognition
StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant
Streaming Long Video Understanding with Large Language Models
Streamlining Forest Wildfire Surveillance: AI-Enhanced UAVs Utilizing the FLAME Aerial Video Dataset for Lightweight and Efficient Monitoring
Students taught by multimodal teachers are superior action recognizers
Super Encoding Network: Recursive Association of Multi-Modal Encoders for Video Understanding
SurgBench: A Unified Large-Scale Benchmark for Surgical Video Analysis
SVGraph: Learning Semantic Graphs from Instructional Videos
SVT: Supertoken Video Transformer for Efficient Video Understanding
Dynamics Based Neural Encoding with Inter-Intra Region Connectivity
System-status-aware Adaptive Network for Online Streaming Video Understanding
TC-LLaVA: Rethinking the Transfer from Image to Video Understanding with Temporal Considerations
Teaching Machines to Understand Baseball Games: Large-Scale Baseball Video Database for Multiple Video Understanding Tasks
Temporal2Seq: A Unified Framework for Temporal Video Understanding Tasks
Temporal Action Detection Model Compression by Progressive Block Drop
Temporal Grounding of Activities using Multimodal Large Language Models
Temporally-Adaptive Models for Efficient Video Understanding
Temporal Perceiver: A General Architecture for Arbitrary Boundary Detection
Temporal Preference Optimization for Long-Form Video Understanding
Temporal Query Networks for Fine-grained Video Understanding
t-EVA: Time-Efficient t-SNE Video Annotation
Text-Conditioned Resampler For Long Form Video Understanding
TextVidBench: A Benchmark for Long Video Scene Text Understanding
The Open World of Micro-Videos
Therbligs in Action: Video Understanding through Motion Primitives
The THUMOS Challenge on Action Recognition for Videos "in the Wild"
Threading Keyframe with Narratives: MLLMs as Strong Long Video Comprehenders
Time Blindness: Why Video-Language Models Can't See What Humans Can?
TimeSearch: Hierarchical Video Search with Spotlight and Reflection for Human-like Long Video Understanding
TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation
TIME: Temporal-sensitive Multi-dimensional Instruction Tuning and Benchmarking for Video-LLMs
Toward a Human-Level Video Understanding Intelligence
Towards Child-Inclusive Clinical Video Understanding for Autism Spectrum Disorder
Towards Efficient and Robust Moment Retrieval System: A Unified Framework for Multi-Granularity Models and Temporal Reranking
Towards Fine-Grained Video Question Answering
Towards Holistic Language-video Representation: the language model-enhanced MSR-Video to Text Dataset
Towards Long Video Understanding via Fine-detailed Video Story Generation
Towards Scalable Modeling of Compressed Videos for Efficient Action Recognition
Towards Training Stronger Video Vision Transformers for EPIC-KITCHENS-100 Action Recognition
Towards Visual Discrimination and Reasoning of Real-World Physical Dynamics: Physics-Grounded Anomaly Detection
Transformed ROIs for Capturing Visual Transformations in Videos
Transition Is a Process: Pair-to-Video Change Detection Networks for Very High Resolution Remote Sensing Images
TVBench: Redesigning Video-Language Evaluation
TV-TREES: Multimodal Entailment Trees for Neuro-Symbolic Video Reasoning
Two Causally Related Needles in a Video Haystack
Two-Stream Transformer Architecture for Long Video Understanding
UBoCo: Unsupervised Boundary Contrastive Learning for Generic Event Boundary Detection

No leaderboard results yet.