SOTAVerified|Agents Browse Leaderboard About

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 826–850 of 1149 papers

Title	Date	Tasks	Status
LLaVA-MLB: Mitigating and Leveraging Attention Bias for Training-Free Video LLMs	Mar 14, 2025	Video Understanding	—Unverified
LLaVA-Octopus: Unlocking Instruction-Driven Adaptive Projector Fusion for Video Understanding	Jan 9, 2025	Language ModelingLanguage Modelling	—Unverified
LLAVIDAL: A Large LAnguage VIsion Model for Daily Activities of Living	Jun 13, 2024	BenchmarkingHuman-Object Interaction Detection	—Unverified
LLM4Brain: Training a Large Language Model for Brain Video Understanding	Sep 26, 2024	Domain AdaptationLanguage Modeling	—Unverified
LLMs Meet Long Video: Advancing Long Video Question Answering with An Interactive Visual Adapter in LLMs	Feb 21, 2024	Question AnsweringVideo Question Answering	—Unverified
Localizing Events in Videos with Multimodal Queries	Jun 14, 2024	Natural Language QueriesVideo Understanding	—Unverified
Localizing Unseen Activities in Video via Image Query	Jun 28, 2019	Action LocalizationVideo Understanding	—Unverified
Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding	Mar 17, 2025	AttributeMME	—Unverified
Long Activity Video Understanding using Functional Object-Oriented Network	Jul 3, 2018	ObjectVideo Understanding	—Unverified
LongCaptioning: Unlocking the Power of Long Caption Generation in Large Multimodal Models	Feb 21, 2025	Caption GenerationVideo Captioning	—Unverified
Long-Short Temporal Contrastive Learning of Video Transformers	Jun 17, 2021	Action RecognitionContrastive Learning	—Unverified
LongVILA: Scaling Long-Context Visual Language Models for Long Videos	Aug 19, 2024	Video CaptioningVideo Question Answering	—Unverified
LongViTU: Instruction Tuning for Long-Form Video Understanding	Jan 9, 2025	EgoSchemaForm	—Unverified
Long-VMNet: Accelerating Long-Form Video Understanding via Fixed Memory	Mar 17, 2025	FormGPU	—Unverified
Look Every Frame All at Once: Video-Ma^2mba for Efficient Long-form Video Understanding with Multi-Axis Gradient Checkpointing	Nov 29, 2024	AllForm	—Unverified
Low-Fidelity End-to-End Video Encoder Pre-training for Temporal Action Localization	Mar 28, 2021	Action ClassificationAction Localization	—Unverified
LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents	Mar 13, 2025	Computational EfficiencyOptical Character Recognition (OCR)	—Unverified
LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models	Feb 4, 2025	GPUVideo Understanding	—Unverified
M^33D: Learning 3D priors using Multi-Modal Masked Autoencoders for 2D image and video understanding	Sep 26, 2023	2D Semantic SegmentationAction Detection	—Unverified
M^3Net: Multi-view Encoding, Matching, and Fusion for Few-shot Fine-grained Action Recognition	Aug 6, 2023	Action RecognitionDecision Making	—Unverified
MaCP: Minimal yet Mighty Adaptation via Hierarchical Cosine Projection	May 29, 2025	image-classificationImage Classification	—Unverified
Making Every Frame Matter: Continuous Video Understanding for Large Models via Adaptive State Modeling	Oct 19, 2024	Video Understanding	—Unverified
MAMBA4D: Efficient Long-Sequence Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models	May 23, 2024	Action RecognitionAction Segmentation	—Unverified
MambaMia: A State-Space-Model-Based Compression for Efficient Video Understanding in Large Multimodal Models	Jun 16, 2025	Video Understanding	—Unverified
MASH-VLM: Mitigating Action-Scene Hallucination in Video-LLMs through Disentangled Spatial-Temporal Representations	Mar 20, 2025	HallucinationVideo Understanding	—Unverified

Show:10 25 50

← PrevPage 34 of 46Next →

No leaderboard results yet.