Video Understanding

A crucial task in Video Understanding is to recognise and localise (in space and time) the different actions or events that appear in a video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 751–800 of 1149 papers

Title	Date	Tasks	Status	Hype
Temporal Action Segmentation: An Analysis of Modern Techniques	Oct 19, 2022	Action Segmentation, Segmentation	Code Available	2
How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios	Oct 18, 2022	Video Understanding	Code Available	0
Self-supervised video pretraining yields robust and more human-aligned visual representations	Oct 12, 2022	Contrastive Learning, object-detection	Unverified	0
Students taught by multimodal teachers are superior action recognizers	Oct 9, 2022	Action Recognition, Knowledge Distillation	Unverified	0
EgoTaskQA: Understanding Human Tasks in Egocentric Videos	Oct 8, 2022	Action Localization, counterfactual	Code Available	1
Compressed Vision for Efficient Video Understanding	Oct 6, 2022	Video Compression, Video Understanding	Unverified	0
SoccerNet 2022 Challenges Results	Oct 5, 2022	Action Spotting, Camera Calibration	Code Available	1
Learning to Focus on the Foreground for Temporal Sentence Grounding	Oct 1, 2022	Sentence, Temporal Sentence Grounding	Unverified	0
In-the-Wild Video Question Answering	Oct 1, 2022	Evidence Selection, Question Answering	Unverified	0
Learning Transferable Spatiotemporal Representations from Natural Script Knowledge	Sep 30, 2022	Descriptive, Representation Learning	Code Available	1
Speeding Up Action Recognition Using Dynamic Accumulation of Residuals in Compressed Domain	Sep 29, 2022	Action Recognition, Video Understanding	Unverified	0
Streaming Video Temporal Action Segmentation In Real Time	Sep 28, 2022	Action Segmentation, Language Modelling	Code Available	1
AVT: Audio-Video Transformer for Multimodal Action Recognition	Sep 22, 2022	Action Recognition, Audio Classification	Unverified	0
UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer	Sep 22, 2022	Action Classification, Action Recognition	Code Available	2
Panoramic Vision Transformer for Saliency Detection in 360° Videos	Sep 19, 2022	Saliency Detection, Saliency Prediction	Code Available	1
WildQA: In-the-Wild Video Question Answering	Sep 14, 2022	Evidence Selection, Question Answering	Unverified	0
EchoCoTr: Estimation of the Left Ventricular Ejection Fraction from Spatiotemporal Echocardiography	Sep 9, 2022	Video Understanding	Code Available	1
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions	Sep 7, 2022	Image Generation, Text to Image Generation	Unverified	0
Visual Subtitle Feature Enhanced Video Outline Generation	Aug 24, 2022	Articles, Headline Generation	Unverified	0
Identifying Auxiliary or Adversarial Tasks Using Necessary Condition Analysis for Adversarial Multi-task Video Understanding	Aug 22, 2022	Action Recognition, Multi-Task Learning	Unverified	0
DeepSportradar-v1: Computer Vision Dataset for Sports Understanding with High Quality Annotations	Aug 17, 2022	Camera Calibration, Instance Segmentation	Code Available	1
Motion Sensitive Contrastive Learning for Self-supervised Video Representation	Aug 12, 2022	Contrastive Learning, Representation Learning	Unverified	0
Exploring Anchor-based Detection for Ego4D Natural Language Query	Aug 10, 2022	Video Understanding	Unverified	0
SA-NET.v2: Real-time vehicle detection from oblique UAV images with use of uncertainty estimation in deep meta-learning	Aug 4, 2022	Meta-Learning, Semantic Segmentation	Unverified	0
Two-Stream Transformer Architecture for Long Video Understanding	Aug 2, 2022	Action Recognition, GPU	Unverified	0
BATMAN: Bilateral Attention Transformer in Motion-Appearance Neighboring Space for Video Object Segmentation	Aug 1, 2022	Object, Optical Flow Estimation	Unverified	0
Point Primitive Transformer for Long-Term 4D Point Cloud Video Understanding	Jul 30, 2022	point cloud video understanding, Video Understanding	Code Available	1
Static and Dynamic Concepts for Self-supervised Video Representation Learning	Jul 26, 2022	Diversity, Representation Learning	Code Available	1
EgoEnv: Human-centric environment representations from egocentric video	Jul 22, 2022	Video Understanding	Unverified	0
Video Swin Transformers for Egocentric Video Understanding @ Ego4D Challenges 2022	Jul 22, 2022	Object, Object State Change Classification	Unverified	0
AE-Net: Adjoint Enhancement Network for Efficient Action Recognition in Video Understanding	Jul 21, 2022	Action Recognition, Video Understanding	Unverified	0
An Efficient Spatio-Temporal Pyramid Transformer for Action Detection	Jul 21, 2022	Action Detection, Video Understanding	Unverified	0
Spotting Temporally Precise, Fine-Grained Events in Video	Jul 20, 2022	Action Detection, Action Spotting	Code Available	1
Clover: Towards A Unified Video-Language Alignment and Fusion Model	Jul 16, 2022	Language Modeling, Language Modelling	Code Available	1
SVGraph: Learning Semantic Graphs from Instructional Videos	Jul 16, 2022	Graph Learning, Video Understanding	Unverified	0
Is Appearance Free Action Recognition Possible?	Jul 13, 2022	Action Recognition, Optical Flow Estimation	Code Available	1
Federated Self-supervised Learning for Video Understanding	Jul 5, 2022	Action Recognition, Federated Learning	Code Available	1
GraphVid: It Only Takes a Few Nodes to Understand a Video	Jul 4, 2022	Superpixels, Video Understanding	Unverified	0
Dynamic Multistep Reasoning based on Video Scene Graph for Video Question Answering	Jul 1, 2022	Question Answering, Video Question Answering	Unverified	0
Multimodal Intent Discovery from Livestream Videos	Jul 1, 2022	Intent Discovery, Video Summarization	Unverified	0
(Un)likelihood Training for Interpretable Embedding	Jul 1, 2022	Ad-hoc video search, Decoder	Code Available	0
Submission to Generic Event Boundary Detection Challenge@CVPR 2022: Local Context Modeling and Global Boundary Decoding Approach	Jun 30, 2022	Boundary Detection, Generic Event Boundary Detection	Code Available	0
Technical Report for CVPR 2022 LOVEU AQTC Challenge	Jun 29, 2022	Video Understanding	Code Available	0
ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning	Jun 27, 2022	Action Classification, Action Recognition	Code Available	1
REVECA -- Rich Encoder-decoder framework for Video Event CAptioner	Jun 18, 2022	Decoder, Semantic Segmentation	Code Available	1
Multimodal Dialogue State Tracking	Jun 16, 2022	Dialogue State Tracking, Video Understanding	Code Available	0
Stand-Alone Inter-Frame Attention in Video Models	Jun 14, 2022	Action Classification, Action Recognition	Code Available	1
Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens	Jun 13, 2022	Action Recognition, Video Understanding	Unverified	0
A Simple and Efficient Pipeline to Build an End-to-End Spatial-Temporal Action Detector	Jun 7, 2022	Action Classification, Action Detection	Code Available	1
Efficient Annotation and Learning for 3D Hand Pose Estimation: A Survey	Jun 5, 2022	3D Hand Pose Estimation, Domain Adaptation	Unverified	0