
Video Understanding

A crucial task in video understanding is to recognise and localise, in both space and time, the different actions or events that appear in a video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 151-160 of 1149 papers

| Title | Status | Hype |
|---|---|---|
| Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation | Code | 1 |
| PAVE: Patching and Adapting Video Large Language Models | Code | 1 |
| ACVUBench: Audio-Centric Video Understanding Benchmark | Code | 0 |
| SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding | | 0 |
| CRCL: Causal Representation Consistency Learning for Anomaly Detection in Surveillance Videos | | 0 |
| Video-XL-Pro: Reconstructive Token Compression for Extremely Long Video Understanding | | 0 |
| Unbiasing through Textual Descriptions: Mitigating Representation Bias in Video Benchmarks | | 0 |
| Breaking the Encoder Barrier for Seamless Video-Language Understanding | | 0 |
| MammAlps: A multi-view video behavior monitoring dataset of wild mammals in the Swiss Alps | Code | 1 |
| 4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding | Code | 0 |

No leaderboard results yet.