Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 476–500 of 1149 papers

Title	Date	Tasks	Status	Hype
Slot State Space Models	Jun 18, 2024	Mamba, State Space Models	Code Available	1
Hallucination Mitigation Prompts Long-term Video Understanding	Jun 17, 2024	Answer Generation, Hallucination	Code Available	0
VideoVista: A Versatile Benchmark for Video Understanding and Reasoning	Jun 17, 2024	Anomaly Detection, Logical Reasoning	Code Available	1
VELOCITI: Benchmarking Video-Language Compositional Reasoning with Strict Entailment	Jun 16, 2024	Action Understanding, Benchmarking	Unverified	0
Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model	Jun 15, 2024	Question Answering, Video Understanding	Code Available	0
Localizing Events in Videos with Multimodal Queries	Jun 14, 2024	Natural Language Queries, Video Understanding	Unverified	0
GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding	Jun 14, 2024	Activity Recognition, MMR total	Unverified	0
LLAVIDAL: A Large LAnguage VIsion Model for Daily Activities of Living	Jun 13, 2024	Benchmarking, Human-Object Interaction Detection	Unverified	0
Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs	Jun 13, 2024	Benchmarking, Question Answering	Code Available	2
VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding	Jun 13, 2024	Dense Video Captioning, MVBench	Code Available	3
Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams	Jun 12, 2024	cross-modal alignment, Language Modelling	Code Available	3
LVBench: An Extreme Long Video Understanding Benchmark	Jun 12, 2024	Decision Making, Video Understanding	Code Available	2
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos	Jun 12, 2024	counterfactual, Future prediction	Code Available	1
Fewer Tokens and Fewer Videos: Extending Video Understanding Abilities in Large Vision-Language Models	Jun 12, 2024	Video Understanding	Unverified	0
MeMSVD: Long-Range Temporal Structure Capturing Using Incremental SVD	Jun 11, 2024	Video Recognition, Video Understanding	Unverified	0
Vript: A Video Is Worth Thousands of Words	Jun 10, 2024	Video Captioning, Video Understanding	Code Available	2
1st Place Winner of the 2024 Pixel-level Video Understanding in the Wild (CVPR'24 PVUW) Challenge in Video Panoptic Segmentation and Best Long Video Consistency of Video Semantic Segmentation	Jun 8, 2024	Benchmarking, Instance Segmentation	Unverified	0
Semantic Segmentation on VSPW Dataset through Masked Video Consistency	Jun 7, 2024	Semantic Segmentation, Video Understanding	Unverified	0
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions	Jun 6, 2024	Video Captioning, Video Generation	Code Available	5
3rd Place Solution for PVUW Challenge 2024: Video Panoptic Segmentation	Jun 6, 2024	Panoptic Segmentation, Segmentation	Unverified	0
MLVU: Benchmarking Multi-task Long Video Understanding	Jun 6, 2024	Benchmarking, Video Understanding	Code Available	3
Contrastive Language Video Time Pre-training	Jun 4, 2024	Action Recognition, Contrastive Learning	Unverified	0
Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos	Jun 3, 2024	Mistake Detection, Online Mistake Detection	Code Available	1
HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model	Jun 1, 2024	Action Recognition, Activity Recognition	Unverified	0
2nd Place Solution for PVUW Challenge 2024: Video Panoptic Segmentation	Jun 1, 2024	Autonomous Driving, Panoptic Segmentation	Unverified	0

