SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 351–400 of 1149 papers

Title | Status | Hype
End-to-End Streaming Video Temporal Action Segmentation with Reinforce Learning | Code | 1
End-to-End Referring Video Object Segmentation with Multimodal Transformers | Code | 1
Language-Guided Audio-Visual Learning for Long-Term Sports Assessment | Code | 1
Learning Self-Similarity in Space and Time as a Generalized Motion for Action Recognition | Code | 1
Towards Event-oriented Long Video Understanding | Code | 1
IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs | Code | 1
CAMEL-Bench: A Comprehensive Arabic LMM Benchmark | Code | 1
Learning Self-Similarity in Space and Time as Generalized Motion for Video Action Recognition | Code | 1
Isolated Sign Recognition from RGB Video using Pose Flow and Self-Attention | Code | 1
An overview on the evaluated video retrieval tasks at TRECVID 2022 | Code | 1
TokenLearner: What Can 8 Learned Tokens Do for Images and Videos? | Code | 1
Open-Vocabulary Video Relation Extraction | Code | 1
InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges | Code | 1
Panoptic Video Scene Graph Generation | Code | 1
A Comprehensive Study of Deep Video Action Recognition | Code | 1
PAN: Towards Fast Action Recognition via Learning Persistence of Appearance | Code | 1
Elaborative Rehearsal for Zero-shot Action Recognition | Code | 1
IntentVizor: Towards Generic Query Guided Interactive Video Summarization | Code | 1
Contrastive Spatio-Temporal Pretext Learning for Self-supervised Video Representation | Code | 1
Is Appearance Free Action Recognition Possible? | Code | 1
Token Shift Transformer for Video Classification | Code | 1
Towards High-Quality Temporal Action Detection with Sparse Proposals | Code | 1
∞-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation | Code | 1
EgoTaskQA: Understanding Human Tasks in Egocentric Videos | Code | 1
EgoSurgery-Phase: A Dataset of Surgical Phase Recognition from Egocentric Open Surgery Videos | Code | 1
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding | Code | 1
InfiniBench: A Comprehensive Benchmark for Large Multimodal Models in Very Long Video Understanding | Code | 1
Can An Image Classifier Suffice For Action Recognition? | Code | 1
Hypergraph Multi-modal Large Language Model: Exploiting EEG and Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video Understanding | Code | 1
Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning | Code | 1
Crossover Learning for Fast Online Video Instance Segmentation | Code | 1
TimeBalance: Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition | Code | 1
EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval | Code | 1
How Severe is Benchmark-Sensitivity in Video Self-Supervised Learning? | Code | 1
An Empirical Study of End-to-End Temporal Action Detection | Code | 1
How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation | Code | 1
Large Scale Holistic Video Understanding | Code | 1
Self-Adaptive Sampling for Efficient Video Question-Answering on Image-Text Models | Code | 1
EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens | Code | 1
SoccerNet 2022 Challenges Results | Code | 1
Learning Temporally Causal Latent Processes from General Temporal Data | Code | 1
Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models | Code | 1
Hier-EgoPack: Hierarchical Egocentric Video Understanding with Diverse Task Perspectives | Code | 1
MH-DETR: Video Moment and Highlight Detection with Cross-modal Transformer | Code | 1
TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos | Code | 1
CyberV: Cybernetics for Test-time Scaling in Video Understanding | Code | 1
Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning | Code | 1
TokenLearner: Adaptive Space-Time Tokenization for Videos | Code | 1
Towards Long-Form Video Understanding | Code | 1
VideoMamba: Spatio-Temporal Selective State Space Model | Code | 1

No leaderboard results yet.