Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers



Title	Date	Tasks	Status
What can Off-the-Shelves Large Multi-Modal Models do for Dynamic Scene Graph Generation?	Mar 20, 2025	Decoder, Graph Generation	Unverified
What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets	Jun 1, 2018	Video Understanding	Unverified
When Work Matters: Transforming Classical Network Structures to Graph CNN	Jul 7, 2018	Graph Classification, Video Understanding	Unverified
WildQA: In-the-Wild Video Question Answering	Sep 14, 2022	Evidence Selection, Question Answering	Unverified
Wolf: Captioning Everything with a World Summarization Framework	Jul 26, 2024	Autonomous Driving, Mixture-of-Experts	Unverified
WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning	May 6, 2024	Multiple-choice, Video Understanding	Unverified
WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs	Feb 6, 2025	Video Understanding	Unverified
X-LeBench: A Benchmark for Extremely Long Egocentric Video Understanding	Jan 12, 2025	Video Understanding	Unverified
YouMVOS: An Actor-Centric Multi-Shot Video Object Segmentation Dataset	Jan 1, 2022	Management, Segmentation	Unverified
YouTube-8M Video Understanding Challenge Approach and Applications	Jun 26, 2017	Ensemble Learning, Video Understanding	Unverified
ZEETAD: Adapting Pretrained Vision-Language Model for Zero-Shot End-to-End Temporal Action Detection	Nov 1, 2023	Action Detection, Classification	Unverified
Zero-shot Action Localization via the Confidence of Large Vision-Language Models	Oct 18, 2024	Action Localization, Language Modelling	Unverified
Zero-Shot Action Recognition in Surveillance Videos	Oct 28, 2024	Action Recognition, Video Understanding	Unverified
Zero-Shot Action Recognition in Videos: A Survey	Sep 13, 2019	Action Recognition, Action Recognition In Still Images	Unverified
Zero-Shot Long-Form Video Understanding through Screenplay	Jun 25, 2024	Form, Question Answering	Unverified
Zero-shot Shark Tracking and Biometrics from Aerial Imagery	Jan 10, 2025	Video Understanding	Unverified
Hierarchical Video Frame Sequence Representation with Deep Convolutional Graph Network	Jun 2, 2019	General Classification, Graph Neural Network	Unverified
4D Generic Video Object Proposals	Jan 26, 2019	Instance Segmentation, Object	Code Available
LMM-VQA: Advancing Video Quality Assessment with Large Multimodal Models	Aug 26, 2024	Large Language Model, Video Quality Assessment	Code Available
LLaVA-OneVision: Easy Visual Task Transfer	Aug 6, 2024	3D Question Answering (3D-QA)	Code Available
A Context-Aware Loss Function for Action Spotting in Soccer Videos	Dec 3, 2019	Action Spotting, Video Understanding	Code Available
Learnable pooling with Context Gating for video classification	Jun 21, 2017	Classification, Clustering	Code Available
Learnable Pooling Methods for Video Classification	Oct 1, 2018	Classification, General Classification	Code Available
Leaping Into Memories: Space-Time Deep Feature Synthesis	Mar 17, 2023	Diversity, Video Understanding	Code Available
Keyframe-oriented Vision Token Pruning: Enhancing Efficiency of Large Vision Language Models on Long-Form Video Processing	Mar 13, 2025	EgoSchema, Form	Code Available
Judging a video by its bitstream cover	Sep 14, 2023	Video Understanding	Code Available
CARPe Posterum: A Convolutional Approach for Real-time Pedestrian Path Prediction	May 26, 2020	Autonomous Vehicles, Prediction	Code Available
VideoEval: Comprehensive Benchmark Suite for Low-Cost Evaluation of Video Foundation Model	Jul 9, 2024	Video Understanding	Code Available
Joint Event Detection and Description in Continuous Video Streams	Feb 28, 2018	Dense Captioning, Dense Video Captioning	Code Available
Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning	Apr 15, 2018	Video Captioning, Video Understanding	Code Available
Capturing Temporal Information in a Single Frame: Channel Sampling Strategies for Action Recognition	Jan 25, 2022	Action Recognition, Optical Flow Estimation	Code Available
B-VLLM: A Vision Large Language Model with Balanced Spatio-Temporal Tokens	Dec 13, 2024	Language Modeling, Language Modelling	Code Available
In My Perspective, In My Hands: Accurate Egocentric 2D Hand Pose and Action Recognition	Apr 14, 2024	Action Recognition, Hand Pose Estimation	Code Available
ViP: Video Platform for PyTorch	Oct 7, 2019	Benchmarking, Video Understanding	Code Available
ViQAgent: Zero-Shot Video Question Answering via Agent with Open-Vocabulary Grounding Validation	May 21, 2025	Decision Making, Language Modeling	Code Available
Bridging Perspectives: A Survey on Cross-view Collaborative Intelligence with Egocentric-Exocentric Vision	Jun 6, 2025	Video Understanding	Code Available
ICSVR: Investigating Compositional and Syntactic Understanding in Video Retrieval Models	Jun 28, 2023	Retrieval, Video Retrieval	Code Available
https://arxiv.org/abs/2407.00634	Jul 2, 2024	Video Captioning, Video Description	Code Available
How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios	Oct 18, 2022	Video Understanding	Code Available
HopaDIFF: Holistic-Partial Aware Fourier Conditioned Diffusion for Referring Human Action Segmentation in Multi-Person Scenarios	Jun 11, 2025	Action Recognition, Action Segmentation	Code Available
Bootstrap Your Own Views: Masked Ego-Exo Modeling for Fine-grained View-invariant Video Representations	Mar 25, 2025	Representation Learning, Video Understanding	Code Available
HLV-1K: A Large-scale Hour-Long Video Benchmark for Time-Specific Long Video Understanding	Jan 3, 2025	Question Answering, Video Understanding	Code Available
The Visual Centrifuge: Model-Free Layered Video Representations	Dec 4, 2018	Color Constancy, model	Code Available
The YouTube-8M Kaggle Competition: Challenges and Methods	Jun 28, 2017	General Classification, Video Classification	Code Available
Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model	Jun 15, 2024	Question Answering, Video Understanding	Code Available
The Monkeytyping Solution to the YouTube-8M Video Understanding Challenge	Jun 16, 2017	General Classification, Video Classification	Code Available
Hierarchical Deep Recurrent Architecture for Video Understanding	Jul 11, 2017	Classification, General Classification	Code Available
Temporal Tessellation: A Unified Approach for Video Analysis	Dec 21, 2016	Action Detection, Video Captioning	Code Available
Temporal-Oriented Recipe for Transferring Large Vision-Language Model to Video Understanding	May 19, 2025	Language Modeling, Language Modelling	Code Available
Temporal Modeling Approaches for Large-scale Youtube-8M Video Understanding	Jul 14, 2017	Video Recognition, Video Understanding	Code Available

