SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 901-925 of 1149 papers

| Title | Status | Hype |
| --- | --- | --- |
| Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens | | 0 |
| Efficient Annotation and Learning for 3D Hand Pose Estimation: A Survey | | 0 |
| Development of a MultiModal Annotation Framework and Dataset for Deep Video Understanding | | 0 |
| i-Code: An Integrative and Composable Multimodal Learning Framework | | 0 |
| Overview of the MedVidQA 2022 Shared Task on Medical Video Question-Answering | | 0 |
| Contrastive Language-Action Pre-training for Temporal Localization | | 0 |
| Causal Reasoning Meets Visual Representation Learning: A Prospective Study | | 0 |
| Revealing Occlusions with 4D Neural Fields | | 0 |
| Less than Few: Self-Shot Video Instance Segmentation | | 0 |
| ActAR: Actor-Driven Pose Embeddings for Video Action Recognition | | 0 |
| Adversarial Machine Learning Attacks Against Video Anomaly Detection Systems | | 0 |
| MM-SEAL: A Large-scale Video Dataset of Multi-person Multi-grained Spatio-temporally Action Localization | | 0 |
| PYSKL: a toolbox for skeleton-based video understanding | | 0 |
| FitCLIP: Refining Large-Scale Pretrained Image-Text Models for Zero-Shot Video Understanding Tasks | Code | 0 |
| On the Pitfalls of Batch Normalization for End-to-End Video Learning: A Study on Surgical Workflow Analysis | Code | 0 |
| Human Gaze Guided Attention for Surgical Activity Recognition | | 0 |
| Multi-Scale Self-Contrastive Learning with Hard Negative Mining for Weakly-Supervised Query-based Video Grounding | | 0 |
| Temporal Perceiver: A General Architecture for Arbitrary Boundary Detection | | 0 |
| Concept Graph Neural Networks for Surgical Video Understanding | | 0 |
| Audio Visual Scene-Aware Dialog Generation with Transformer-based Video Representations | | 0 |
| A Coding Framework and Benchmark towards Low-Bitrate Video Understanding | Code | 0 |
| Capturing Temporal Information in a Single Frame: Channel Sampling Strategies for Action Recognition | Code | 0 |
| End-to-end Generative Pretraining for Multimodal Video Captioning | | 0 |
| Multiview Transformers for Video Recognition | | 0 |
| MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound | | 0 |

No leaderboard results yet.