Video Understanding

A crucial task in video understanding is to recognise and localise, in both space and time, the different actions or events that appear in a video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 251–300 of 1149 papers

Title	Date	Tasks	Status	Hype
CAST: Cross-Attention in Space and Time for Video Action Recognition	Nov 30, 2023	Action Classification, Action Recognition	Code Available	1
Eliciting In-Context Learning in Vision-Language Models for Videos Through Curated Data Distributional Properties	Nov 28, 2023	In-Context Learning, Video Understanding	Code Available	1
Panoptic Video Scene Graph Generation	Nov 28, 2023	Graph Generation, Panoptic Scene Graph Generation	Code Available	1
Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning	Nov 27, 2023	Action Classification, Action Recognition	Code Available	1
Mug-STAN: Adapting Image-Language Pretrained Models for General Video Understanding	Nov 25, 2023	Video Understanding	Code Available	1
MM-VID: Advancing Video Understanding with GPT-4V(ision)	Oct 30, 2023	Script Generation, Video Understanding	Code Available	1
BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning	Sep 27, 2023	GPU, Video-based Generative Performance Benchmarking	Code Available	1
End-to-End Streaming Video Temporal Action Segmentation with Reinforce Learning	Sep 27, 2023	Action Recognition, Action Segmentation	Code Available	1
SoccerNet 2023 Challenges Results	Sep 12, 2023	Action Spotting, Camera Calibration	Code Available	1
CEFHRI: A Communication Efficient Federated Learning Framework for Recognizing Industrial Human-Robot Interaction	Aug 29, 2023	Federated Learning, image-classification	Code Available	1
Spherical Vision Transformer for 360-degree Video Saliency Prediction	Aug 24, 2023	Prediction, Saliency Prediction	Code Available	1
Masked Spatio-Temporal Structure Prediction for Self-supervised Learning on Point Cloud Videos	Aug 18, 2023	point cloud video understanding, Self-Supervised Learning	Code Available	1
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding	Aug 17, 2023	Diagnostic, EgoSchema	Code Available	1
Helping Hands: An Object-Aware Ego-Centric Video Recognition Model	Aug 15, 2023	Decoder, Object	Code Available	1
Multimodal Distillation for Egocentric Action Recognition	Jul 14, 2023	Action Recognition, Knowledge Distillation	Code Available	1
Self-Adaptive Sampling for Efficient Video Question-Answering on Image–Text Models	Jul 9, 2023	Question Answering, TGIF-Frame	Code Available	1
An overview on the evaluated video retrieval tasks at TRECVID 2022	Jun 22, 2023	Ad-hoc video search, Retrieval	Code Available	1
Multi-Granularity Hand Action Detection	Jun 19, 2023	Action Detection, Action Localization	Code Available	1
EPIC Fields: Marrying 3D Geometry and Video Understanding	Jun 14, 2023	3D geometry, Neural Rendering	Code Available	1
VideoLLM: Modeling Video Sequence with Large Language Models	May 22, 2023	Decoder, Video Understanding	Code Available	1
Transformer-Based Model for Monocular Visual Odometry: A Video Understanding Approach	May 10, 2023	Autonomous Vehicles, Monocular Visual Odometry	Code Available	1
MH-DETR: Video Moment and Highlight Detection with Cross-modal Transformer	Apr 29, 2023	Decoder, Highlight Detection	Code Available	1
Event-Free Moving Object Segmentation from Moving Ego Vehicle	Apr 28, 2023	Autonomous Driving, Benchmarking	Code Available	1
Leveraging triplet loss for unsupervised action segmentation	Apr 13, 2023	Action Segmentation, Clustering	Code Available	1
Procedure-Aware Pretraining for Instructional Video Understanding	Mar 31, 2023	Video Understanding	Code Available	1
Whether and When does Endoscopy Domain Pretraining Make Sense?	Mar 30, 2023	Action Triplet Detection, Surgical phase recognition	Code Available	1
Streaming Video Model	Mar 30, 2023	Action Recognition, Decoder	Code Available	1
TimeBalance: Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition	Mar 28, 2023	Action Recognition, Optical Flow Estimation	Code Available	1
Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos	Mar 22, 2023	Representation Learning, Sentence	Code Available	1
Dual-path Adaptation from Image to Video Transformers	Mar 17, 2023	Action Classification, Action Recognition	Code Available	1
TemporalMaxer: Maximize Temporal Context with only Max Pooling for Temporal Action Localization	Mar 16, 2023	Action Localization, Temporal Action Localization	Code Available	1
Localizing Moments in Long Video Via Multimodal Guidance	Feb 26, 2023	Natural Language Moment Retrieval, Natural Language Visual Grounding	Code Available	1
Test of Time: Instilling Video-Language Models with a Sense of Time	Jan 5, 2023	Video-Text Retrieval, Video Understanding	Code Available	1
Boosting Single Image Super-Resolution via Partial Channel Shifting	Jan 1, 2023	Diversity, Image Super-Resolution	Code Available	1
Modeling Video As Stochastic Processes for Fine-Grained Video Representation Learning	Jan 1, 2023	Contrastive Learning, Representation Learning	Code Available	1
Towards Smooth Video Composition	Dec 14, 2022	Image Generation, single-image-generation	Code Available	1
MOMA-LRG: Language-Refined Graphs for Multi-Object Multi-Actor Activity Parsing	Nov 28, 2022	Activity Recognition, Few Shot Action Recognition	Code Available	1
Contrastive Masked Autoencoders for Self-Supervised Video Hashing	Nov 21, 2022	Decoder, Retrieval	Code Available	1
EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens	Nov 19, 2022	Action Recognition, Object State Change Classification	Code Available	1
InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges	Nov 17, 2022	Future Hand Prediction, Moment Queries	Code Available	1
VTC: Improving Video-Text Retrieval with User Comments	Oct 19, 2022	Representation Learning, Retrieval	Code Available	1
EgoTaskQA: Understanding Human Tasks in Egocentric Videos	Oct 8, 2022	Action Localization, counterfactual	Code Available	1
SoccerNet 2022 Challenges Results	Oct 5, 2022	Action Spotting, Camera Calibration	Code Available	1
Learning Transferable Spatiotemporal Representations from Natural Script Knowledge	Sep 30, 2022	Descriptive, Representation Learning	Code Available	1
Streaming Video Temporal Action Segmentation In Real Time	Sep 28, 2022	Action Segmentation, Language Modelling	Code Available	1
Panoramic Vision Transformer for Saliency Detection in 360° Videos	Sep 19, 2022	Saliency Detection, Saliency Prediction	Code Available	1
EchoCoTr: Estimation of the Left Ventricular Ejection Fraction from Spatiotemporal Echocardiography	Sep 9, 2022	Video Understanding	Code Available	1
DeepSportradar-v1: Computer Vision Dataset for Sports Understanding with High Quality Annotations	Aug 17, 2022	Camera Calibration, Instance Segmentation	Code Available	1
Point Primitive Transformer for Long-Term 4D Point Cloud Video Understanding	Jul 30, 2022	point cloud video understanding, Video Understanding	Code Available	1
Static and Dynamic Concepts for Self-supervised Video Representation Learning	Jul 26, 2022	Diversity, Representation Learning	Code Available	1

No leaderboard results yet.