Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers


Showing 1101–1149 of 1149 papers

Title	Date	Tasks	Status
StimuVAR: Spatiotemporal Stimuli-aware Video Affective Reasoning with Multimodal Large Language Models	Aug 31, 2024	Video Understanding	Unverified
STPrivacy: Spatio-Temporal Privacy-Preserving Action Recognition	Jan 8, 2023	Action Recognition, Facial Expression Recognition (FER)	Unverified
StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant	May 8, 2025	Language Modeling, Language Modelling	Unverified
Streaming Long Video Understanding with Large Language Models	May 25, 2024	Question Answering, Video Understanding	Unverified
Streamlining Forest Wildfire Surveillance: AI-Enhanced UAVs Utilizing the FLAME Aerial Video Dataset for Lightweight and Efficient Monitoring	Aug 31, 2024	Disaster Response, Video Understanding	Unverified
Students taught by multimodal teachers are superior action recognizers	Oct 9, 2022	Action Recognition, Knowledge Distillation	Unverified
Super Encoding Network: Recursive Association of Multi-Modal Encoders for Video Understanding	Jun 9, 2025	Contrastive Learning, Video Editing	Unverified
SurgBench: A Unified Large-Scale Benchmark for Surgical Video Analysis	Jun 9, 2025	Action Classification, Benchmarking	Unverified
SVGraph: Learning Semantic Graphs from Instructional Videos	Jul 16, 2022	Graph Learning, Video Understanding	Unverified
SVT: Supertoken Video Transformer for Efficient Video Understanding	Apr 1, 2023	Video Understanding	Unverified
Dynamics Based Neural Encoding with Inter-Intra Region Connectivity	Feb 19, 2024	Video Understanding	Unverified
System-status-aware Adaptive Network for Online Streaming Video Understanding	Mar 28, 2023	Streaming video understanding, Video Understanding	Unverified
TC-LLaVA: Rethinking the Transfer from Image to Video Understanding with Temporal Considerations	Sep 5, 2024	Causal Inference, Position	Unverified
Teaching Machines to Understand Baseball Games: Large-Scale Baseball Video Database for Multiple Video Understanding Tasks	Sep 1, 2018	Video Alignment, Video Recognition	Unverified
Temporal2Seq: A Unified Framework for Temporal Video Understanding Tasks	Sep 27, 2024	Action Detection, Action Segmentation	Unverified
Temporal Action Detection Model Compression by Progressive Block Drop	Mar 21, 2025	Action Detection, Autonomous Driving	Unverified
Temporal Grounding of Activities using Multimodal Large Language Models	May 30, 2024	Video Understanding	Unverified
Temporally-Adaptive Models for Efficient Video Understanding	Aug 10, 2023	Action Classification, Action Recognition	Unverified
Temporal Perceiver: A General Architecture for Arbitrary Boundary Detection	Mar 1, 2022	Avg, Boundary Detection	Unverified
Temporal Preference Optimization for Long-Form Video Understanding	Jan 23, 2025	Form, MME	Unverified
Temporal Query Networks for Fine-grained Video Understanding	Apr 19, 2021	Action Classification, Action Recognition	Unverified
t-EVA: Time-Efficient t-SNE Video Annotation	Nov 26, 2020	Dimensionality Reduction, Video Classification	Unverified
Text-Conditioned Resampler For Long Form Video Understanding	Dec 19, 2023	EgoSchema, Form	Unverified
TextVidBench: A Benchmark for Long Video Scene Text Understanding	Jun 5, 2025	Prompt Engineering, Question Answering	Unverified
The Open World of Micro-Videos	Mar 31, 2016	Diversity, TAG	Unverified
Therbligs in Action: Video Understanding through Motion Primitives	Apr 6, 2023	Action Anticipation, Action Recognition	Unverified
The THUMOS Challenge on Action Recognition for Videos "in the Wild"	Apr 21, 2016	Action Classification, Action Recognition	Unverified
Threading Keyframe with Narratives: MLLMs as Strong Long Video Comprehenders	May 30, 2025	Video Understanding	Unverified
Time Blindness: Why Video-Language Models Can't See What Humans Can?	May 30, 2025	Temporal Sequences, Video Understanding	Unverified
TimeSearch: Hierarchical Video Search with Spotlight and Reflection for Human-like Long Video Understanding	Apr 2, 2025	Video Understanding	Unverified
TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation	Apr 24, 2025	Caption Generation, Dense Video Captioning	Unverified
TIME: Temporal-sensitive Multi-dimensional Instruction Tuning and Benchmarking for Video-LLMs	Mar 13, 2025	Benchmarking, Question Answering	Unverified
Toward a Human-Level Video Understanding Intelligence	Oct 8, 2021	AI Agent, Video Understanding	Unverified
Towards Child-Inclusive Clinical Video Understanding for Autism Spectrum Disorder	Sep 20, 2024	Activity Recognition, Diagnostic	Unverified
Towards Efficient and Robust Moment Retrieval System: A Unified Framework for Multi-Granularity Models and Temporal Reranking	Apr 11, 2025	Moment Retrieval, Question Answering	Unverified
Towards Fine-Grained Video Question Answering	Mar 10, 2025	Language Modeling, Language Modelling	Unverified
Towards Holistic Language-video Representation: the language model-enhanced MSR-Video to Text Dataset	Jun 19, 2024	Language Modeling, Language Modelling	Unverified
Towards Long Video Understanding via Fine-detailed Video Story Generation	Dec 9, 2024	Story Generation, Video Understanding	Unverified
Towards Scalable Modeling of Compressed Videos for Efficient Action Recognition	Mar 17, 2025	Action Recognition, Video Recognition	Unverified
Towards Training Stronger Video Vision Transformers for EPIC-KITCHENS-100 Action Recognition	Jun 9, 2021	Action Recognition, Point Cloud Classification	Unverified
Towards Visual Discrimination and Reasoning of Real-World Physical Dynamics: Physics-Grounded Anomaly Detection	Mar 5, 2025	Anomaly Detection, Object	Unverified
Transformed ROIs for Capturing Visual Transformations in Videos	Jun 6, 2021	Action Recognition, Video Understanding	Unverified
Transition Is a Process: Pair-to-Video Change Detection Networks for Very High Resolution Remote Sensing Images	Dec 7, 2022	Building change detection for remote sensing images, Change Detection	Unverified
TVBench: Redesigning Video-Language Evaluation	Oct 10, 2024	Multiple-choice, Open-Ended Question Answering	Unverified
TV-TREES: Multimodal Entailment Trees for Neuro-Symbolic Video Reasoning	Feb 29, 2024	Question Answering, Video Understanding	Unverified
Two Causally Related Needles in a Video Haystack	May 26, 2025	Video Understanding, Visual Grounding	Unverified
Two-Stream Transformer Architecture for Long Video Understanding	Aug 2, 2022	Action Recognition, GPU	Unverified
UBoCo : Unsupervised Boundary Contrastive Learning for Generic Event Boundary Detection	Nov 29, 2021	Boundary Detection, Contrastive Learning	Unverified
UBoCo: Unsupervised Boundary Contrastive Learning for Generic Event Boundary Detection	Jan 1, 2022	Boundary Detection, Contrastive Learning	Unverified

