
Video Understanding

A crucial task of Video Understanding is to recognise and localise, in space and time, the different actions or events that appear in a video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 476–500 of 1149 papers

Title | Status | Hype
Slot State Space Models | Code | 1
Hallucination Mitigation Prompts Long-term Video Understanding | Code | 0
VideoVista: A Versatile Benchmark for Video Understanding and Reasoning | Code | 1
VELOCITI: Benchmarking Video-Language Compositional Reasoning with Strict Entailment | - | 0
Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model | Code | 0
Localizing Events in Videos with Multimodal Queries | - | 0
GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding | - | 0
LLAVIDAL: A Large LAnguage VIsion Model for Daily Activities of Living | - | 0
Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs | Code | 2
VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding | Code | 3
Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams | Code | 3
LVBench: An Extreme Long Video Understanding Benchmark | Code | 2
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos | Code | 1
Fewer Tokens and Fewer Videos: Extending Video Understanding Abilities in Large Vision-Language Models | - | 0
MeMSVD: Long-Range Temporal Structure Capturing Using Incremental SVD | - | 0
Vript: A Video Is Worth Thousands of Words | Code | 2
1st Place Winner of the 2024 Pixel-level Video Understanding in the Wild (CVPR'24 PVUW) Challenge in Video Panoptic Segmentation and Best Long Video Consistency of Video Semantic Segmentation | - | 0
Semantic Segmentation on VSPW Dataset through Masked Video Consistency | - | 0
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions | Code | 5
3rd Place Solution for PVUW Challenge 2024: Video Panoptic Segmentation | - | 0
MLVU: Benchmarking Multi-task Long Video Understanding | Code | 3
Contrastive Language Video Time Pre-training | - | 0
Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos | Code | 1
HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model | - | 0
2nd Place Solution for PVUW Challenge 2024: Video Panoptic Segmentation | - | 0
Page 20 of 46

No leaderboard results yet.