Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 751–800 of 1149 papers

Title	Date	Tasks	Status
HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model	Jun 1, 2024	Action RecognitionActivity Recognition	—Unverified
HFGCN:Hypergraph Fusion Graph Convolutional Networks for Skeleton-Based Action Recognition	Jan 19, 2025	Action RecognitionRelation Classification	—Unverified
Hierarchical Action Recognition: A Contrastive Video-Language Approach with Hierarchical Interactions	May 28, 2024	Action RecognitionVideo Recognition	—Unverified
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding	Jan 1, 2025	Question AnsweringVideo Understanding	—Unverified
HIG: Hierarchical Interlacement Graph Approach to Scene Graph Generation in Video Understanding	Dec 5, 2023	DiversityGraph Generation	—Unverified
Highlight Timestamp Detection Model for Comedy Videos via Multimodal Sentiment Analysis	May 28, 2021	Multimodal Sentiment AnalysisObject Recognition	—Unverified
HLVU : A New Challenge to Test Deep Understanding of Movies the Way Humans do	May 1, 2020	Video Understanding	—Unverified
H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving	Jan 8, 2025	Autonomous DrivingMamba	—Unverified
How Can Objects Help Video-Language Understanding?	Apr 10, 2025	Image CaptioningObject	—Unverified
How to Make a BLT Sandwich? Learning to Reason towards Understanding Web Instructional Videos	Dec 2, 2018	Logical ReasoningQuestion Answering	—Unverified
How Well Can General Vision-Language Models Learn Medicine By Watching Public Educational Videos?	Apr 19, 2025	Video Understanding	—Unverified
HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding	Jan 25, 2025	Action UnderstandingEmotion Recognition	—Unverified
HumanVBench: Exploring Human-Centric Video Understanding Capabilities of MLLMs with Synthetic Benchmark Data	Dec 23, 2024	Action RecognitionVideo Understanding	—Unverified
HuMoCon: Concept Discovery for Human Motion Understanding	Jan 1, 2025	Video Understanding	—Unverified
i-Code: An Integrative and Composable Multimodal Learning Framework	May 3, 2022	Contrastive LearningVideo Understanding	—Unverified
Identifying Auxiliary or Adversarial Tasks Using Necessary Condition Analysis for Adversarial Multi-task Video Understanding	Aug 22, 2022	Action RecognitionMulti-Task Learning	—Unverified
Identity-aware Graph Memory Network for Action Detection	Aug 26, 2021	Action DetectionGraph Neural Network	—Unverified
iMOVE: Instance-Motion-Aware Video Understanding	Feb 17, 2025	Computational EfficiencyVideo Understanding	—Unverified
Impossible Videos	Mar 18, 2025	counterfactualVideo Generation	—Unverified
Improving LLM Video Understanding with 16 Frames Per Second	Mar 18, 2025	MMEVideo MME	—Unverified
Improving Video Model Transfer With Dynamic Representation Learning	Jan 1, 2022	Action ClassificationKnowledge Distillation	—Unverified
Inductive Attention for Video Action Anticipation	Dec 17, 2022	Action AnticipationAction Recognition	—Unverified
InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding	Jun 18, 2025	GPUStreaming video understanding	—Unverified
InstructionBench: An Instructional Video Understanding Benchmark	Apr 7, 2025	Common Sense ReasoningMultiple-choice	—Unverified
Instrument-tissue Interaction Detection Framework for Surgical Video Understanding	Mar 30, 2024	Video Understanding	—Unverified
Integrated Object Detection and Tracking with Tracklet-Conditioned Detection	Nov 27, 2018	Objectobject-detection	—Unverified
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output	Jul 3, 2024	ArticlesImage Comprehension	—Unverified
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model	Jan 21, 2025	Instruction FollowingMathematical Reasoning	—Unverified
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation	Jul 13, 2023	Action RecognitionContrastive Learning	—Unverified
InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling	Jan 21, 2025	Object TrackingReferring Expression Segmentation	—Unverified
InternVQA: Advancing Compressed Video Quality Assessment with Distilling Large Foundation Model	Feb 26, 2025	Video Quality AssessmentVideo Understanding	—Unverified
Interpretable Action Recognition on Hard to Classify Actions	Sep 19, 2024	Action RecognitionDepth Estimation	—Unverified
InterRVOS: Interaction-aware Referring Video Object Segmentation	Jun 3, 2025	8kObject	—Unverified
In-the-Wild Video Question Answering	Oct 1, 2022	Evidence SelectionQuestion Answering	—Unverified
Inverse Compositional Learning for Weakly-supervised Relation Grounding	Jan 1, 2023	RelationVideo Understanding	—Unverified
IPAD: Industrial Process Anomaly Detection Dataset	Apr 23, 2024	Anomaly DetectionVideo Anomaly Detection	—Unverified
IPFormer-VideoLLM: Enhancing Multi-modal Video Understanding for Multi-shot Scenes	Jun 26, 2025	AttributeQuestion Answering	—Unverified
IQViC: In-context, Question Adaptive Vision Compressor for Long-term Video Understanding LMMs	Dec 13, 2024	Question AnsweringVideo Question Answering	—Unverified
Is Temporal Prompting All We Need For Limited Labeled Action Recognition?	Apr 2, 2025	Action RecognitionAll	—Unverified
Joint Engagement Classification using Video Augmentation Techniques for Multi-person Human-robot Interaction	Dec 28, 2022	Data AugmentationFace Swapping	—Unverified
Jointly Learning Energy Expenditures and Activities Using Egocentric Multimodal Signals	Jul 1, 2017	Video Understanding	—Unverified
Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input	Aug 28, 2024	Language ModelingLanguage Modelling	—Unverified
KeyVideoLLM: Towards Large-scale Video Keyframe Selection	Jul 3, 2024	Data CompressionManagement	—Unverified
Kill Two Birds With One Stone: Boosting Both Object Detection Accuracy and Speed With adaptive Patch-of-Interest Composition	Aug 12, 2017	Objectobject-detection	—Unverified
KnowIT VQA: Answering Knowledge-Based Questions about Videos	Oct 23, 2019	Question AnsweringVideo Question Answering	—Unverified
Knowledge-Based Visual Question Answering in Videos	Apr 17, 2020	Question AnsweringVideo Question Answering	—Unverified
Koala: Key frame-conditioned long video-LLM	Apr 5, 2024	Action RecognitionQuestion Answering	—Unverified
Label Denoising with Large Ensembles of Heterogeneous Neural Networks	Sep 12, 2018	Data AugmentationDenoising	—Unverified
Language as the Medium: Multimodal Video Classification through text only	Sep 19, 2023	Action RecognitionVideo Classification	—Unverified
M3L: Language-based Video Editing via Multi-Modal Multi-Level Transformers	Apr 2, 2021	DiagnosticVideo Editing	—Unverified

Show:10 25 50

← PrevPage 16 of 23Next →

No leaderboard results yet.