Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 851–900 of 1149 papers

Title	Date	Tasks	Status
STPrivacy: Spatio-Temporal Privacy-Preserving Action Recognition	Jan 8, 2023	Action RecognitionFacial Expression Recognition (FER)	—Unverified
EgoDistill: Egocentric Head Motion Distillation for Efficient Video Understanding	Jan 5, 2023	Video Understanding	—Unverified
PIDRo: Parallel Isomeric Attention with Dynamic Routing for Text-Video Retrieval	Jan 1, 2023	Representation LearningRetrieval	—Unverified
Self-Supervised Object Detection from Egocentric Videos	Jan 1, 2023	Class-agnostic Object DetectionObject	—Unverified
Relational Space-Time Query in Long-Form Videos	Jan 1, 2023	FormVideo Understanding	—Unverified
Few-Shot Referring Relationships in Videos	Jan 1, 2023	ObjectRelation Network	CodeCode Available
UniFormerV2: Unlocking the Potential of Image ViTs for Video Understanding	Jan 1, 2023	Video Understanding	—Unverified
Inverse Compositional Learning for Weakly-supervised Relation Grounding	Jan 1, 2023	RelationVideo Understanding	—Unverified
Multimodal High-order Relation Transformer for Scene Boundary Detection	Jan 1, 2023	Boundary DetectionDecoder	—Unverified
Joint Engagement Classification using Video Augmentation Techniques for Multi-person Human-robot Interaction	Dec 28, 2022	Data AugmentationFace Swapping	—Unverified
Inductive Attention for Video Action Anticipation	Dec 17, 2022	Action AnticipationAction Recognition	—Unverified
Egocentric Video Task Translation	Dec 13, 2022	Multi-Task LearningTranslation	—Unverified
Contextual Explainable Video Representation: Human Perception-based Understanding	Dec 12, 2022	Action DetectionAction Recognition	CodeCode Available
PromptonomyViT: Multi-Task Prompt Learning Improves Video Transformers using Synthetic Scene Data	Dec 8, 2022	Action RecognitionPrompt Learning	—Unverified
Transition Is a Process: Pair-to-Video Change Detection Networks for Very High Resolution Remote Sensing Images	Dec 7, 2022	Building change detection for remote sensing imagesChange Detection	—Unverified
Spatio-Temporal Crop Aggregation for Video Representation Learning	Nov 30, 2022	Action ClassificationDimensionality Reduction	—Unverified
Dynamic Appearance: A Video Representation for Action Recognition with Joint Training	Nov 23, 2022	Action RecognitionTemporal Action Localization	—Unverified
A Unified Model for Video Understanding and Knowledge Embedding with Heterogeneous Knowledge Graph Dataset	Nov 19, 2022	Common Sense ReasoningGraph Embedding	—Unverified
Masked Autoencoders for Egocentric Video Understanding @ Ego4D Challenge 2022	Nov 18, 2022	Object State Change ClassificationTemporal Localization	CodeCode Available
Exploring State Change Capture of Heterogeneous Backbones @ Ego4D Hands and Objects Challenge 2022	Nov 16, 2022	Human-Object Interaction DetectionObject	—Unverified
Grounded Video Situation Recognition	Oct 19, 2022	DescriptiveStructured Prediction	—Unverified
How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios	Oct 18, 2022	Video Understanding	CodeCode Available
Self-supervised video pretraining yields robust and more human-aligned visual representations	Oct 12, 2022	Contrastive Learningobject-detection	—Unverified
Students taught by multimodal teachers are superior action recognizers	Oct 9, 2022	Action RecognitionKnowledge Distillation	—Unverified
Compressed Vision for Efficient Video Understanding	Oct 6, 2022	Video CompressionVideo Understanding	—Unverified
Learning to Focus on the Foreground for Temporal Sentence Grounding	Oct 1, 2022	SentenceTemporal Sentence Grounding	—Unverified
In-the-Wild Video Question Answering	Oct 1, 2022	Evidence SelectionQuestion Answering	—Unverified
Speeding Up Action Recognition Using Dynamic Accumulation of Residuals in Compressed Domain	Sep 29, 2022	Action RecognitionVideo Understanding	—Unverified
AVT: Audio-Video Transformer for Multimodal Action Recognition	Sep 22, 2022	Action RecognitionAudio Classification	—Unverified
WildQA: In-the-Wild Video Question Answering	Sep 14, 2022	Evidence SelectionQuestion Answering	—Unverified
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions	Sep 7, 2022	Image GenerationText to Image Generation	—Unverified
Visual Subtitle Feature Enhanced Video Outline Generation	Aug 24, 2022	ArticlesHeadline Generation	—Unverified
Identifying Auxiliary or Adversarial Tasks Using Necessary Condition Analysis for Adversarial Multi-task Video Understanding	Aug 22, 2022	Action RecognitionMulti-Task Learning	—Unverified
Motion Sensitive Contrastive Learning for Self-supervised Video Representation	Aug 12, 2022	Contrastive LearningRepresentation Learning	—Unverified
Exploring Anchor-based Detection for Ego4D Natural Language Query	Aug 10, 2022	Video Understanding	—Unverified
SA-NET.v2: Real-time vehicle detection from oblique UAV images with use of uncertainty estimation in deep meta-learning	Aug 4, 2022	Meta-LearningSemantic Segmentation	—Unverified
Two-Stream Transformer Architecture for Long Video Understanding	Aug 2, 2022	Action RecognitionGPU	—Unverified
BATMAN: Bilateral Attention Transformer in Motion-Appearance Neighboring Space for Video Object Segmentation	Aug 1, 2022	ObjectOptical Flow Estimation	—Unverified
EgoEnv: Human-centric environment representations from egocentric video	Jul 22, 2022	Video Understanding	—Unverified
Video Swin Transformers for Egocentric Video Understanding @ Ego4D Challenges 2022	Jul 22, 2022	ObjectObject State Change Classification	—Unverified
AE-Net:Adjoint Enhancement Network for Efficient Action Recognition in Video Understanding	Jul 21, 2022	Action RecognitionVideo Understanding	—Unverified
An Efficient Spatio-Temporal Pyramid Transformer for Action Detection	Jul 21, 2022	Action DetectionVideo Understanding	—Unverified
SVGraph: Learning Semantic Graphs from Instructional Videos	Jul 16, 2022	Graph LearningVideo Understanding	—Unverified
GraphVid: It Only Takes a Few Nodes to Understand a Video	Jul 4, 2022	SuperpixelsVideo Understanding	—Unverified
Multimodal Intent Discovery from Livestream Videos	Jul 1, 2022	Intent DiscoveryVideo Summarization	—Unverified
(Un)likelihood Training for Interpretable Embedding	Jul 1, 2022	Ad-hoc video searchDecoder	CodeCode Available
Dynamic Multistep Reasoning based on Video Scene Graph for Video Question Answering	Jul 1, 2022	Question AnsweringVideo Question Answering	—Unverified
Submission to Generic Event Boundary Detection Challenge@CVPR 2022: Local Context Modeling and Global Boundary Decoding Approach	Jun 30, 2022	Boundary DetectionGeneric Event Boundary Detection	CodeCode Available
Technical Report for CVPR 2022 LOVEU AQTC Challenge	Jun 29, 2022	Video Understanding	CodeCode Available
Multimodal Dialogue State Tracking	Jun 16, 2022	Dialogue State TrackingVideo Understanding	CodeCode Available

Show:10 25 50

← PrevPage 18 of 23Next →

No leaderboard results yet.