SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 476500 of 1149 papers

TitleStatusHype
Re-ID-AR: Improved Person Re-identification in Video via Joint Weakly Supervised Action RecognitionCode0
Recurrent Space-time Graph Neural NetworksCode0
FriendsQA: A New Large-Scale Deep Video Understanding Dataset with Fine-grained Topic Categorization for Story VideosCode0
Constrained-size Tensorflow Models for YouTube-8M Video Understanding ChallengeCode0
VideoDG: Generalizing Temporal Relations in Videos to Novel DomainsCode0
Relation-aware Hierarchical Attention Framework for Video Question AnsweringCode0
SoccerDB: A Large-Scale Database for Comprehensive Video UnderstandingCode0
Pooled Motion Features for First-Person VideosCode0
A Coding Framework and Benchmark towards Low-Bitrate Video UnderstandingCode0
FitCLIP: Refining Large-Scale Pretrained Image-Text Models for Zero-Shot Video Understanding TasksCode0
ActAlign: Zero-Shot Fine-Grained Video Classification via Language-Guided Sequence AlignmentCode0
Pairwise Emotional Relationship Recognition in Drama Videos: Dataset and BenchmarkCode0
FIBER: Fill-in-the-Blanks as a Challenging Video Understanding Evaluation FrameworkCode0
AdaVideoRAG: Omni-Contextual Adaptive Retrieval-Augmented Efficient Long Video UnderstandingCode0
On the Pitfalls of Batch Normalization for End-to-End Video Learning: A Study on Surgical Workflow AnalysisCode0
Few-Shot Referring Relationships in VideosCode0
Features Understanding in 3D CNNs for Actions Recognition in VideoCode0
A Context-Aware Loss Function for Action Spotting in Soccer VideosCode0
OccludeNet: A Causal Journey into Mixed-View Actor-Centric Video Action Recognition under OcclusionsCode0
Spatio-Temporal Perturbations for Video AttributionCode0
AssembleNet: Searching for Multi-Stream Neural Connectivity in Video ArchitecturesCode0
NeXtVLAD: An Efficient Neural Network to Aggregate Frame-level Features for Large-scale Video ClassificationCode0
Exploring Temporal Information for Improved Video UnderstandingCode0
Multimodal Dialogue State TrackingCode0
Exploiting Long-Term Dependencies for Generating Dynamic Scene GraphsCode0
Show:102550
← PrevPage 20 of 46Next →

No leaderboard results yet.