SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 1001-1025 of 1149 papers

| Title | Status | Hype |
| --- | --- | --- |
| HLVU : A New Challenge to Test Deep Understanding of Movies the Way Humans do | | 0 |
| Towards Visually Explaining Video Understanding Networks with Perturbation | Code | 1 |
| Beyond Instructional Videos: Probing for More Diverse Visual-Textual Grounding on YouTube | Code | 0 |
| DriftNet: Aggressive Driving Behavior Classification using 3D EfficientNet Architecture | Code | 0 |
| Knowledge-Based Visual Question Answering in Videos | | 0 |
| Real-Time Segmentation Networks should be Latency Aware | | 0 |
| Context Modulated Dynamic Networks for Actor and Action Video Segmentation with Language Queries | | 0 |
| Fully Automated Hand Hygiene Monitoring in Operating Room using 3D Convolutional Neural Network | | 0 |
| Beyond the Camera: Neural Networks in World Coordinates | | 0 |
| Top-1 Solution of Multi-Moments in Time Challenge 2019 | Code | 1 |
| Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning | Code | 1 |
| CTM: Collaborative Temporal Modeling for Action Recognition | | 0 |
| Weakly Supervised Temporal Action Localization Using Deep Metric Learning | Code | 1 |
| Tree-Structured Policy based Progressive Reinforcement Learning for Temporally Language Grounding in Video | Code | 1 |
| Cut-Based Graph Learning Networks to Discover Compositional Structure of Sequential Video Data | | 0 |
| Temporal Interlacing Network | Code | 1 |
| EEV: A Large-Scale Dataset for Studying Evoked Expressions from Video | Code | 1 |
| SoccerDB: A Large-Scale Database for Comprehensive Video Understanding | Code | 0 |
| Video action detection by learning graph-based spatio-temporal interactions | Code | 0 |
| VideoDG: Generalizing Temporal Relations in Videos to Novel Domains | Code | 0 |
| Context R-CNN: Long Term Temporal Context for Per-Camera Object Detection | Code | 0 |
| A Context-Aware Loss Function for Action Spotting in Soccer Videos | Code | 0 |
| BERT for Large-scale Video Segment Classification with Test-time Augmentation | | 0 |
| A Multigrid Method for Efficiently Training Video Models | Code | 1 |
| AdapNet: Adaptability Decomposing Encoder-Decoder Network for Weakly Supervised Action Recognition and Localization | | 0 |

No leaderboard results yet.