
Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers


Showing 501–525 of 1149 papers

Title	Date	Tasks	Status	Hype
Temporal Grounding of Activities using Multimodal Large Language Models	May 30, 2024	Video Understanding	Unverified	0
DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark	May 30, 2024	DeepFake Detection, Mamba	Code Available	2
EgoSurgery-Phase: A Dataset of Surgical Phase Recognition from Egocentric Open Surgery Videos	May 30, 2024	Action Recognition, Surgical phase recognition	Code Available	1
VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos	May 29, 2024	EgoSchema, MME	Code Available	2
Hierarchical Action Recognition: A Contrastive Video-Language Approach with Hierarchical Interactions	May 28, 2024	Action Recognition, Video Recognition	Unverified	0
MMCTAgent: Multi-modal Critical Thinking Agent Framework for Complex Visual Reasoning	May 28, 2024	Decision Making, Video Understanding	Unverified	0
Hawk: Learning to Understand Open-World Video Anomalies	May 27, 2024	Anomaly Detection, Question Answering	Code Available	3
Streaming Long Video Understanding with Large Language Models	May 25, 2024	Question Answering, Video Understanding	Unverified	0
MAMBA4D: Efficient Long-Sequence Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models	May 23, 2024	Action Recognition, Action Segmentation	Unverified	0
TOPA: Extending Large Language Models for Video Understanding via Text-Only Pre-Alignment	May 22, 2024	EgoSchema, Video Understanding	Code Available	1
Dense Connector for MLLMs	May 22, 2024	Video Understanding	Code Available	2
Anticipating Object State Changes in Long Procedural Videos	May 21, 2024	Object, Object State Change Classification	Unverified	0
Open-Vocabulary Spatio-Temporal Action Detection	May 17, 2024	Action Detection, Fine-Grained Action Detection	Unverified	0
No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding	May 14, 2024	Action Detection, GPU	Code Available	1
CinePile: A Long Video Question Answering Dataset and Benchmark	May 14, 2024	Form, Human-Object Interaction Detection	Unverified	0
Challenges in Deploying Long-Context Transformers: A Theoretical Peak Performance Analysis	May 14, 2024	4k, GPU	Unverified	0
RETTA: Retrieval-Enhanced Test-Time Adaptation for Zero-Shot Video Captioning	May 11, 2024	Image-text matching, Retrieval	Unverified	0
Global Motion Understanding in Large-Scale Video Object Segmentation	May 11, 2024	Instance Segmentation, Optical Flow Estimation	Unverified	0
A Survey on Backbones for Deep Video Action Recognition	May 9, 2024	Action Recognition, Diversity	Unverified	0
Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition	May 7, 2024	Large Language Model, Multimodal Large Language Model	Unverified	0
Vision Mamba: A Comprehensive Survey and Taxonomy	May 7, 2024	Mamba, Medical Image Analysis	Code Available	2
Snippet-Aware Transformer With Multiple Action Elements for Skeleton-Based Action Segmentation	May 6, 2024	Action Segmentation, Skeleton Based Action Segmentation	Code Available	0
Foundation Models for Video Understanding: A Survey	May 6, 2024	Survey, Video Understanding	Code Available	2
How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs	May 6, 2024	Autonomous Vehicles, Video Understanding	Unverified	0
WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning	May 6, 2024	Multiple-choice, Video Understanding	Unverified	0

