
Video Understanding

A crucial task in Video Understanding is to recognise and localise (in space and time) the different actions or events that appear in a video.

Source: Action Detection from a Robot-Car Perspective
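To make the task definition above concrete, here is a minimal, hypothetical sketch of what a spatio-temporal detection could look like as a data structure: an "action tube" that pairs an action label with a per-frame bounding box over the frames it spans. The `ActionTube` class, its field names, and the example values are illustrative assumptions, not taken from any paper listed below.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical sketch of one detected action instance, localised in space
# (a bounding box per frame) and in time (the span of frames it covers).
# All names and fields here are illustrative, not from a specific paper.
@dataclass
class ActionTube:
    label: str    # action class, e.g. "crossing-road"
    score: float  # detector confidence in [0, 1]
    # frame index -> (x1, y1, x2, y2) box in pixel coordinates
    boxes: Dict[int, Tuple[float, float, float, float]] = field(default_factory=dict)

    @property
    def temporal_span(self) -> Tuple[int, int]:
        """First and last frame in which the action is detected."""
        frames = sorted(self.boxes)
        return frames[0], frames[-1]

# Usage: a three-frame tube for a single action instance
tube = ActionTube(
    label="crossing-road",
    score=0.87,
    boxes={10: (34.0, 50.0, 80.0, 120.0),
           11: (36.0, 51.0, 82.0, 121.0),
           12: (38.0, 52.0, 84.0, 122.0)},
)
print(tube.temporal_span)  # (10, 12)
```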

Papers

Showing 751–800 of 1149 papers

Title | Status | Hype
Temporal Action Segmentation: An Analysis of Modern Techniques | Code | 2
How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios | Code | 0
Self-supervised video pretraining yields robust and more human-aligned visual representations |  | 0
Students taught by multimodal teachers are superior action recognizers |  | 0
EgoTaskQA: Understanding Human Tasks in Egocentric Videos | Code | 1
Compressed Vision for Efficient Video Understanding |  | 0
SoccerNet 2022 Challenges Results | Code | 1
Learning to Focus on the Foreground for Temporal Sentence Grounding |  | 0
In-the-Wild Video Question Answering |  | 0
Learning Transferable Spatiotemporal Representations from Natural Script Knowledge | Code | 1
Speeding Up Action Recognition Using Dynamic Accumulation of Residuals in Compressed Domain |  | 0
Streaming Video Temporal Action Segmentation In Real Time | Code | 1
AVT: Audio-Video Transformer for Multimodal Action Recognition |  | 0
UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer | Code | 2
Panoramic Vision Transformer for Saliency Detection in 360° Videos | Code | 1
WildQA: In-the-Wild Video Question Answering |  | 0
EchoCoTr: Estimation of the Left Ventricular Ejection Fraction from Spatiotemporal Echocardiography | Code | 1
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions |  | 0
Visual Subtitle Feature Enhanced Video Outline Generation |  | 0
Identifying Auxiliary or Adversarial Tasks Using Necessary Condition Analysis for Adversarial Multi-task Video Understanding |  | 0
DeepSportradar-v1: Computer Vision Dataset for Sports Understanding with High Quality Annotations | Code | 1
Motion Sensitive Contrastive Learning for Self-supervised Video Representation |  | 0
Exploring Anchor-based Detection for Ego4D Natural Language Query |  | 0
SA-NET.v2: Real-time vehicle detection from oblique UAV images with use of uncertainty estimation in deep meta-learning |  | 0
Two-Stream Transformer Architecture for Long Video Understanding |  | 0
BATMAN: Bilateral Attention Transformer in Motion-Appearance Neighboring Space for Video Object Segmentation |  | 0
Point Primitive Transformer for Long-Term 4D Point Cloud Video Understanding | Code | 1
Static and Dynamic Concepts for Self-supervised Video Representation Learning | Code | 1
EgoEnv: Human-centric environment representations from egocentric video |  | 0
Video Swin Transformers for Egocentric Video Understanding @ Ego4D Challenges 2022 |  | 0
AE-Net: Adjoint Enhancement Network for Efficient Action Recognition in Video Understanding |  | 0
An Efficient Spatio-Temporal Pyramid Transformer for Action Detection |  | 0
Spotting Temporally Precise, Fine-Grained Events in Video | Code | 1
Clover: Towards A Unified Video-Language Alignment and Fusion Model | Code | 1
SVGraph: Learning Semantic Graphs from Instructional Videos |  | 0
Is Appearance Free Action Recognition Possible? | Code | 1
Federated Self-supervised Learning for Video Understanding | Code | 1
GraphVid: It Only Takes a Few Nodes to Understand a Video |  | 0
Dynamic Multistep Reasoning based on Video Scene Graph for Video Question Answering |  | 0
Multimodal Intent Discovery from Livestream Videos |  | 0
(Un)likelihood Training for Interpretable Embedding | Code | 0
Submission to Generic Event Boundary Detection Challenge@CVPR 2022: Local Context Modeling and Global Boundary Decoding Approach | Code | 0
Technical Report for CVPR 2022 LOVEU AQTC Challenge | Code | 0
ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning | Code | 1
REVECA -- Rich Encoder-decoder framework for Video Event CAptioner | Code | 1
Multimodal Dialogue State Tracking | Code | 0
Stand-Alone Inter-Frame Attention in Video Models | Code | 1
Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens |  | 0
A Simple and Efficient Pipeline to Build an End-to-End Spatial-Temporal Action Detector | Code | 1
Efficient Annotation and Learning for 3D Hand Pose Estimation: A Survey |  | 0
Page 16 of 23

No leaderboard results yet.