SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) the different actions or events appearing in a video.

Source: Action Detection from a Robot-Car Perspective
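As a minimal illustration of what "localising in space and time" produces, a spatio-temporal detection is often represented as an "action tube": a class label plus a per-frame bounding box over a contiguous frame range. The sketch below is a generic, hypothetical data layout (class and field names are assumptions, not taken from any listed paper), with a temporal intersection-over-union helper of the kind used when matching predicted tubes to ground truth.

```python
from dataclasses import dataclass, field

@dataclass
class ActionTube:
    """A spatio-temporal action detection: a label plus one bounding
    box per frame over a contiguous frame range (an "action tube").
    Hypothetical layout: boxes[i] is (x1, y1, x2, y2) for frame
    start_frame + i."""
    label: str
    start_frame: int
    boxes: list = field(default_factory=list)
    score: float = 0.0

    @property
    def end_frame(self) -> int:
        # Last frame covered by this tube (inclusive).
        return self.start_frame + len(self.boxes) - 1

    def temporal_iou(self, other: "ActionTube") -> float:
        """Intersection-over-union of the two tubes' frame spans."""
        inter = min(self.end_frame, other.end_frame) - max(self.start_frame, other.start_frame) + 1
        union = max(self.end_frame, other.end_frame) - min(self.start_frame, other.start_frame) + 1
        return max(inter, 0) / union

# Two overlapping "braking" detections: frames 10-29 and 20-39.
tube_a = ActionTube("braking", start_frame=10, boxes=[(0, 0, 50, 50)] * 20, score=0.9)
tube_b = ActionTube("braking", start_frame=20, boxes=[(5, 5, 55, 55)] * 20, score=0.8)
print(round(tube_a.temporal_iou(tube_b), 2))  # 10 shared frames / 30-frame union -> 0.33
```

A full detector would also score spatial overlap per frame; the temporal span alone is shown here to keep the sketch short.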

Papers

Showing 926–950 of 1149 papers

Title | Status | Hype
Learning without Prejudice: Avoiding Bias in Webly-Supervised Action Recognition | | 0
Less than Few: Self-Shot Video Instance Segmentation | | 0
Leveraging Foundation Models for Multimodal Graph-Based Action Recognition | | 0
Leveraging Local Temporal Information for Multimodal Scene Classification | | 0
LIGAR: Lightweight General-purpose Action Recognition | | 0
LiveVLM: Efficient Online Video Understanding via Streaming-Oriented KV Cache and Retrieval | | 0
LiVLR: A Lightweight Visual-Linguistic Reasoning Framework for Video Question Answering | | 0
LLaVA-MLB: Mitigating and Leveraging Attention Bias for Training-Free Video LLMs | | 0
LLaVA-Octopus: Unlocking Instruction-Driven Adaptive Projector Fusion for Video Understanding | | 0
LLAVIDAL: A Large LAnguage VIsion Model for Daily Activities of Living | | 0
LLM4Brain: Training a Large Language Model for Brain Video Understanding | | 0
LLMs Meet Long Video: Advancing Long Video Question Answering with An Interactive Visual Adapter in LLMs | | 0
Localizing Events in Videos with Multimodal Queries | | 0
Localizing Unseen Activities in Video via Image Query | | 0
Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding | | 0
Long Activity Video Understanding using Functional Object-Oriented Network | | 0
LongCaptioning: Unlocking the Power of Long Caption Generation in Large Multimodal Models | | 0
Long-Short Temporal Contrastive Learning of Video Transformers | | 0
LongVILA: Scaling Long-Context Visual Language Models for Long Videos | | 0
LongViTU: Instruction Tuning for Long-Form Video Understanding | | 0
Long-VMNet: Accelerating Long-Form Video Understanding via Fixed Memory | | 0
Look Every Frame All at Once: Video-Ma²mba for Efficient Long-form Video Understanding with Multi-Axis Gradient Checkpointing | | 0
Low-Fidelity End-to-End Video Encoder Pre-training for Temporal Action Localization | | 0
LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents | | 0
LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models | | 0
Page 38 of 46

No leaderboard results yet.