SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 826–850 of 1149 papers

| Title | Status | Hype |
| --- | --- | --- |
| On the Pitfalls of Batch Normalization for End-to-End Video Learning: A Study on Surgical Workflow Analysis | Code | 0 |
| Human Gaze Guided Attention for Surgical Activity Recognition | | 0 |
| Multi-Scale Self-Contrastive Learning with Hard Negative Mining for Weakly-Supervised Query-based Video Grounding | | 0 |
| Temporal Perceiver: A General Architecture for Arbitrary Boundary Detection | | 0 |
| Domain Knowledge-Informed Self-Supervised Representations for Workout Form Assessment | Code | 1 |
| Concept Graph Neural Networks for Surgical Video Understanding | | 0 |
| Audio Visual Scene-Aware Dialog Generation with Transformer-based Video Representations | | 0 |
| ActionFormer: Localizing Moments of Actions with Transformers | Code | 2 |
| Learning Optical Flow with Adaptive Graph Reasoning | Code | 1 |
| A Coding Framework and Benchmark towards Low-Bitrate Video Understanding | Code | 0 |
| A Dataset for Medical Instructional Video Classification and Question Answering | Code | 1 |
| Capturing Temporal Information in a Single Frame: Channel Sampling Strategies for Action Recognition | Code | 0 |
| End-to-end Generative Pretraining for Multimodal Video Captioning | | 0 |
| Multiview Transformers for Video Recognition | | 0 |
| MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound | | 0 |
| Memory-Guided Semantic Learning Network for Temporal Sentence Grounding | | 0 |
| Recurring the Transformer for Video Action Recognition | | 0 |
| Improving Video Model Transfer With Dynamic Representation Learning | | 0 |
| YouMVOS: An Actor-Centric Multi-Shot Video Object Segmentation Dataset | | 0 |
| UBoCo: Unsupervised Boundary Contrastive Learning for Generic Event Boundary Detection | | 0 |
| VRDFormer: End-to-End Video Visual Relation Detection With Transformers | | 0 |
| Video Joint Modelling Based on Hierarchical Transformer for Co-summarization | Code | 1 |
| Exploiting Long-Term Dependencies for Generating Dynamic Scene Graphs | Code | 0 |
| Contrastive Spatio-Temporal Pretext Learning for Self-supervised Video Representation | Code | 1 |
| Discrete neural representations for explainable anomaly detection | | 0 |

No leaderboard results yet.