Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 126–150 of 1149 papers

Title	Date	Tasks	Status	Hype
Leveraging Temporal Contextualization for Video Action Recognition	Apr 15, 2024	Action RecognitionTemporal Action Localization	CodeCode Available	2
Query-Dependent Video Representation for Moment Retrieval and Highlight Detection	Mar 24, 2023	Highlight DetectionMoment Retrieval	CodeCode Available	2
OmniVid: A Generative Framework for Universal Video Understanding	Mar 26, 2024	Action RecognitionDecoder	CodeCode Available	2
Neptune: The Long Orbit to Benchmarking Long Video Understanding	Dec 12, 2024	BenchmarkingMultimodal Reasoning	CodeCode Available	2
Omni-Video: Democratizing Unified Video Understanding and Generation	Jul 8, 2025	Video GenerationVideo Understanding	CodeCode Available	2
Multi-granularity Correspondence Learning from Long-term Noisy Videos	Jan 30, 2024	Action SegmentationLong Video Retrieval (Background Removed)	CodeCode Available	2
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding	Jan 21, 2025	Video Understanding	CodeCode Available	2
Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model	Mar 27, 2025	EgoSchemaLanguage Modeling	CodeCode Available	2
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark	Nov 28, 2023	3D Question Answering (3D-QA)Diagnostic	CodeCode Available	2
One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory	May 29, 2025	Contrastive LearningText Retrieval	CodeCode Available	2
Dense Connector for MLLMs	May 22, 2024	Video Understanding	CodeCode Available	2
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding	Jul 31, 2023	Multiple-choiceQuestion Answering	CodeCode Available	2
LVBench: An Extreme Long Video Understanding Benchmark	Jun 12, 2024	Decision MakingVideo Understanding	CodeCode Available	2
LongVLM: Efficient Long Video Understanding via Large Language Models	Apr 4, 2024	Question AnsweringVideo Question Answering	CodeCode Available	2
Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs	Jun 13, 2024	BenchmarkingQuestion Answering	CodeCode Available	2
OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer	Jun 24, 2024	AI AgentLarge Language Model	CodeCode Available	2
DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark	May 30, 2024	DeepFake DetectionMamba	CodeCode Available	2
LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos	Nov 29, 2024	Boundary DetectionDense Video Captioning	CodeCode Available	2
LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding	Jul 22, 2024	Multiple-choiceQuestion Answering	CodeCode Available	2
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives	Nov 30, 2023	Video Understanding	CodeCode Available	2
Online Video Understanding: OVBench and VideoChat-Online	Dec 31, 2024	Autonomous DrivingQuestion Answering	CodeCode Available	2
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models	Nov 22, 2023	BenchmarkingPhrase Grounding	CodeCode Available	2
TRACE: Temporal Grounding Video LLM via Causal Event Modeling	Oct 8, 2024	Text GenerationVideo Understanding	CodeCode Available	2
Boosting Single Image Super-Resolution via Partial Channel Shifting	Jan 1, 2023	DiversityImage Super-Resolution	CodeCode Available	1
Leveraging triplet loss for unsupervised action segmentation	Apr 13, 2023	Action SegmentationClustering	CodeCode Available	1

Show:10 25 50

← PrevPage 6 of 46Next →

No leaderboard results yet.