SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 331-340 of 1149 papers

Title | Status | Hype
End-to-end Temporal Action Detection with Transformer | Code | 1
LoVR: A Benchmark for Long Video Retrieval in Multimodal Contexts | Code | 1
M^3-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation | Code | 1
Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning | Code | 1
End-to-End Streaming Video Temporal Action Segmentation with Reinforce Learning | Code | 1
End-to-End Referring Video Object Segmentation with Multimodal Transformers | Code | 1
How Severe is Benchmark-Sensitivity in Video Self-Supervised Learning? | Code | 1
Spherical Vision Transformer for 360-degree Video Saliency Prediction | Code | 1
Stand-Alone Inter-Frame Attention in Video Models | Code | 1
CAMEL-Bench: A Comprehensive Arabic LMM Benchmark | Code | 1
Page 34 of 115

No leaderboard results yet.