Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 251–275 of 1149 papers

Title	Date	Tasks	Status	Hype
CAST: Cross-Attention in Space and Time for Video Action Recognition	Nov 30, 2023	Action Classification, Action Recognition	Code Available	1
Eliciting In-Context Learning in Vision-Language Models for Videos Through Curated Data Distributional Properties	Nov 28, 2023	In-Context Learning, Video Understanding	Code Available	1
Panoptic Video Scene Graph Generation	Nov 28, 2023	Graph Generation, Panoptic Scene Graph Generation	Code Available	1
Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning	Nov 27, 2023	Action Classification, Action Recognition	Code Available	1
Mug-STAN: Adapting Image-Language Pretrained Models for General Video Understanding	Nov 25, 2023	Video Understanding	Code Available	1
MM-VID: Advancing Video Understanding with GPT-4V(ision)	Oct 30, 2023	Script Generation, Video Understanding	Code Available	1
BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning	Sep 27, 2023	GPU, Video-based Generative Performance Benchmarking	Code Available	1
End-to-End Streaming Video Temporal Action Segmentation with Reinforce Learning	Sep 27, 2023	Action Recognition, Action Segmentation	Code Available	1
SoccerNet 2023 Challenges Results	Sep 12, 2023	Action Spotting, Camera Calibration	Code Available	1
CEFHRI: A Communication Efficient Federated Learning Framework for Recognizing Industrial Human-Robot Interaction	Aug 29, 2023	Federated Learning, image-classification	Code Available	1
Spherical Vision Transformer for 360-degree Video Saliency Prediction	Aug 24, 2023	Prediction, Saliency Prediction	Code Available	1
Masked Spatio-Temporal Structure Prediction for Self-supervised Learning on Point Cloud Videos	Aug 18, 2023	point cloud video understanding, Self-Supervised Learning	Code Available	1
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding	Aug 17, 2023	Diagnostic, EgoSchema	Code Available	1
Helping Hands: An Object-Aware Ego-Centric Video Recognition Model	Aug 15, 2023	Decoder, Object	Code Available	1
Multimodal Distillation for Egocentric Action Recognition	Jul 14, 2023	Action Recognition, Knowledge Distillation	Code Available	1
Self-Adaptive Sampling for Efficient Video Question-Answering on Image--Text Models	Jul 9, 2023	Question Answering, TGIF-Frame	Code Available	1
An overview on the evaluated video retrieval tasks at TRECVID 2022	Jun 22, 2023	Ad-hoc video search, Retrieval	Code Available	1
Multi-Granularity Hand Action Detection	Jun 19, 2023	Action Detection, Action Localization	Code Available	1
EPIC Fields: Marrying 3D Geometry and Video Understanding	Jun 14, 2023	3D geometry, Neural Rendering	Code Available	1
VideoLLM: Modeling Video Sequence with Large Language Models	May 22, 2023	Decoder, Video Understanding	Code Available	1
Transformer-Based Model for Monocular Visual Odometry: A Video Understanding Approach	May 10, 2023	Autonomous Vehicles, Monocular Visual Odometry	Code Available	1
MH-DETR: Video Moment and Highlight Detection with Cross-modal Transformer	Apr 29, 2023	Decoder, Highlight Detection	Code Available	1
Event-Free Moving Object Segmentation from Moving Ego Vehicle	Apr 28, 2023	Autonomous Driving, Benchmarking	Code Available	1
Leveraging triplet loss for unsupervised action segmentation	Apr 13, 2023	Action Segmentation, Clustering	Code Available	1
Procedure-Aware Pretraining for Instructional Video Understanding	Mar 31, 2023	Video Understanding	Code Available	1

