SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 251–275 of 1149 papers

Title | Status | Hype
CAST: Cross-Attention in Space and Time for Video Action Recognition | Code | 1
Eliciting In-Context Learning in Vision-Language Models for Videos Through Curated Data Distributional Properties | Code | 1
Panoptic Video Scene Graph Generation | Code | 1
Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning | Code | 1
Mug-STAN: Adapting Image-Language Pretrained Models for General Video Understanding | Code | 1
MM-VID: Advancing Video Understanding with GPT-4V(ision) | Code | 1
BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning | Code | 1
End-to-End Streaming Video Temporal Action Segmentation with Reinforce Learning | Code | 1
SoccerNet 2023 Challenges Results | Code | 1
CEFHRI: A Communication Efficient Federated Learning Framework for Recognizing Industrial Human-Robot Interaction | Code | 1
Spherical Vision Transformer for 360-degree Video Saliency Prediction | Code | 1
Masked Spatio-Temporal Structure Prediction for Self-supervised Learning on Point Cloud Videos | Code | 1
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding | Code | 1
Helping Hands: An Object-Aware Ego-Centric Video Recognition Model | Code | 1
Multimodal Distillation for Egocentric Action Recognition | Code | 1
Self-Adaptive Sampling for Efficient Video Question-Answering on Image--Text Models | Code | 1
An overview on the evaluated video retrieval tasks at TRECVID 2022 | Code | 1
Multi-Granularity Hand Action Detection | Code | 1
EPIC Fields: Marrying 3D Geometry and Video Understanding | Code | 1
VideoLLM: Modeling Video Sequence with Large Language Models | Code | 1
Transformer-Based Model for Monocular Visual Odometry: A Video Understanding Approach | Code | 1
MH-DETR: Video Moment and Highlight Detection with Cross-modal Transformer | Code | 1
Event-Free Moving Object Segmentation from Moving Ego Vehicle | Code | 1
Leveraging triplet loss for unsupervised action segmentation | Code | 1
Procedure-Aware Pretraining for Instructional Video Understanding | Code | 1

No leaderboard results yet.