Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers


Showing 226–250 of 1149 papers

Title	Date	Tasks	Status	Hype
Slot State Space Models	Jun 18, 2024	Mamba, State Space Models	Code Available	1
VideoVista: A Versatile Benchmark for Video Understanding and Reasoning	Jun 17, 2024	Anomaly Detection, Logical Reasoning	Code Available	1
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos	Jun 12, 2024	counterfactual, Future prediction	Code Available	1
Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos	Jun 3, 2024	Mistake Detection, Online Mistake Detection	Code Available	1
EgoSurgery-Phase: A Dataset of Surgical Phase Recognition from Egocentric Open Surgery Videos	May 30, 2024	Action Recognition, Surgical phase recognition	Code Available	1
TOPA: Extending Large Language Models for Video Understanding via Text-Only Pre-Alignment	May 22, 2024	EgoSchema, Video Understanding	Code Available	1
No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding	May 14, 2024	Action Detection, GPU	Code Available	1
SFMViT: SlowFast Meet ViT in Chaotic World	Apr 25, 2024	Action Localization, Video Understanding	Code Available	1
Task-Driven Exploration: Decoupling and Inter-Task Feedback for Joint Moment Retrieval and Highlight Detection	Apr 14, 2024	Highlight Detection, Moment Retrieval	Code Available	1
Enhancing Traffic Safety with Parallel Dense Video Captioning for End-to-End Event Analysis	Apr 12, 2024	Dense Video Captioning, Transfer Learning	Code Available	1
SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos	Apr 6, 2024	Graph Generation, Relation	Code Available	1
Language Repository for Long Video Understanding	Mar 21, 2024	EgoSchema, Question Answering	Code Available	1
Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation	Mar 18, 2024	Referring Video Object Segmentation, Semantic Segmentation	Code Available	1
Towards Neuro-Symbolic Video Understanding	Mar 16, 2024	Video Understanding	Code Available	1
Spatio-temporal Prompting Network for Robust Video Feature Extraction	Feb 4, 2024	Instance Segmentation, object-detection	Code Available	1
BehAVE: Behaviour Alignment of Video Game Encodings	Feb 2, 2024	Diversity, FPS Games	Code Available	1
Compositional Video Understanding with Spatiotemporal Structure-based Transformers	Jan 1, 2024	Video Understanding	Code Available	1
A Simple LLM Framework for Long-Range Video Question-Answering	Dec 28, 2023	EgoSchema, Language Modelling	Code Available	1
Open-Vocabulary Video Relation Extraction	Dec 25, 2023	Action Classification, Action Understanding	Code Available	1
Shot2Story20K: A New Benchmark for Comprehensive Understanding of Multi-shot Videos	Dec 16, 2023	Video Captioning, video narration captioning	Code Available	1
SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models	Dec 15, 2023	Video Understanding	Code Available	1
How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation	Dec 12, 2023	Anomaly Detection, Autonomous Driving	Code Available	1
Grounded Question-Answering in Long Egocentric Videos	Dec 11, 2023	Video Grounding, Video Question Answering	Code Available	1
Action Scene Graphs for Long-Form Understanding of Egocentric Videos	Dec 6, 2023	Action Anticipation, Form	Code Available	1
DEVIAS: Learning Disentangled Video Representations of Action and Scene	Nov 30, 2023	Action Recognition, Decoder	Code Available	1

