SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 376–400 of 1149 papers

Title | Status | Hype
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding | Code | 1
InfiniBench: A Comprehensive Benchmark for Large Multimodal Models in Very Long Video Understanding | Code | 1
Can An Image Classifier Suffice For Action Recognition? | Code | 1
Hypergraph Multi-modal Large Language Model: Exploiting EEG and Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video Understanding | Code | 1
Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning | Code | 1
Crossover Learning for Fast Online Video Instance Segmentation | Code | 1
TimeBalance: Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition | Code | 1
EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval | Code | 1
How Severe is Benchmark-Sensitivity in Video Self-Supervised Learning? | Code | 1
An Empirical Study of End-to-End Temporal Action Detection | Code | 1
How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation | Code | 1
Large Scale Holistic Video Understanding | Code | 1
Self-Adaptive Sampling for Efficient Video Question-Answering on Image--Text Models | Code | 1
EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens | Code | 1
SoccerNet 2022 Challenges Results | Code | 1
Learning Temporally Causal Latent Processes from General Temporal Data | Code | 1
Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models | Code | 1
Hier-EgoPack: Hierarchical Egocentric Video Understanding with Diverse Task Perspectives | Code | 1
MH-DETR: Video Moment and Highlight Detection with Cross-modal Transformer | Code | 1
TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos | Code | 1
CyberV: Cybernetics for Test-time Scaling in Video Understanding | Code | 1
Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning | Code | 1
TokenLearner: Adaptive Space-Time Tokenization for Videos | Code | 1
Towards Long-Form Video Understanding | Code | 1
VideoMamba: Spatio-Temporal Selective State Space Model | Code | 1

No leaderboard results yet.