Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers


Showing 651–700 of 1149 papers

Title	Date	Tasks	Status	Hype
Judging a video by its bitstream cover	Sep 14, 2023	Video Understanding	Code Available	0
SoccerNet 2023 Challenges Results	Sep 12, 2023	Action Spotting, Camera Calibration	Code Available	1
CEFHRI: A Communication Efficient Federated Learning Framework for Recognizing Industrial Human-Robot Interaction	Aug 29, 2023	Federated Learning, image-classification	Code Available	1
Spherical Vision Transformer for 360-degree Video Saliency Prediction	Aug 24, 2023	Prediction, Saliency Prediction	Code Available	1
Motion-Guided Masking for Spatiotemporal Representation Learning	Aug 24, 2023	Domain Adaptation, Representation Learning	Unverified	0
MOFO: MOtion FOcused Self-Supervision for Video Understanding	Aug 23, 2023	Action Classification, Action Recognition	Code Available	0
Are current long-term video understanding datasets long-term?	Aug 22, 2023	Action Recognition, Video Understanding	Code Available	0
Masked Spatio-Temporal Structure Prediction for Self-supervised Learning on Point Cloud Videos	Aug 18, 2023	point cloud video understanding, Self-Supervised Learning	Code Available	1
Audio-Visual Glance Network for Efficient Video Recognition	Aug 18, 2023	Video Recognition, Video Understanding	Unverified	0
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding	Aug 17, 2023	Diagnostic, EgoSchema	Code Available	1
Helping Hands: An Object-Aware Ego-Centric Video Recognition Model	Aug 15, 2023	Decoder, Object	Code Available	1
Temporally-Adaptive Models for Efficient Video Understanding	Aug 10, 2023	Action Classification, Action Recognition	Unverified	0
M^3Net: Multi-view Encoding, Matching, and Fusion for Few-shot Fine-grained Action Recognition	Aug 6, 2023	Action Recognition, Decision Making	Unverified	0
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding	Jul 31, 2023	Multiple-choice, Question Answering	Code Available	2
DPMix: Mixture of Depth and Point Cloud Video Experts for 4D Action Segmentation	Jul 31, 2023	Action Segmentation, Human-Object Interaction Detection	Unverified	0
A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future	Jul 18, 2023	Knowledge Distillation, object-detection	Code Available	2
Multimodal Distillation for Egocentric Action Recognition	Jul 14, 2023	Action Recognition, Knowledge Distillation	Code Available	1
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation	Jul 13, 2023	Action Recognition, Contrastive Learning	Unverified	0
HA-ViD: A Human Assembly Video Dataset for Comprehensive Assembly Knowledge Understanding	Jul 9, 2023	Action Recognition, Action Segmentation	Code Available	0
Self-Adaptive Sampling for Efficient Video Question-Answering on Image-Text Models	Jul 9, 2023	Question Answering, TGIF-Frame	Code Available	1
VideoGLUE: Video General Understanding Evaluation of Foundation Models	Jul 6, 2023	Action Recognition, Temporal Localization	Unverified	0
ICSVR: Investigating Compositional and Syntactic Understanding in Video Retrieval Models	Jun 28, 2023	Retrieval, Video Retrieval	Code Available	0
Temporal Action Proposal Generation With Action Frequency Adaptive Network	Jun 23, 2023	Knowledge Distillation, Temporal Action Proposal Generation	Code Available	0
An overview on the evaluated video retrieval tasks at TRECVID 2022	Jun 22, 2023	Ad-hoc video search, Retrieval	Code Available	1
Multi-Granularity Hand Action Detection	Jun 19, 2023	Action Detection, Action Localization	Code Available	1
Learning Space-Time Semantic Correspondences	Jun 16, 2023	Imitation Learning, Semantic correspondence	Unverified	0
EPIC Fields: Marrying 3D Geometry and Video Understanding	Jun 14, 2023	3D geometry, Neural Rendering	Code Available	1
Valley: Video Assistant with Large Language model Enhanced abilitY	Jun 12, 2023	Action Recognition, Instruction Following	Code Available	2
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models	Jun 8, 2023	Question Answering, VCGBench-Diverse	Code Available	3
Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Alignment	Jun 8, 2023	Video Understanding	Unverified	0
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks	Jun 7, 2023	Cross-Modal Retrieval, Language Modelling	Code Available	2
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding	Jun 5, 2023	Language Modeling, Language Modelling	Code Available	4
MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning	Jun 4, 2023	Benchmarking, Contrastive Learning	Unverified	0
Teacher Agent: A Knowledge Distillation-Free Framework for Rehearsal-based Video Incremental Learning	Jun 1, 2023	Incremental Learning, Knowledge Distillation	Code Available	0
Action Sensitivity Learning for Temporal Action Localization	May 25, 2023	Action Localization, Moment Queries	Unverified	0
VideoLLM: Modeling Video Sequence with Large Language Models	May 22, 2023	Decoder, Video Understanding	Code Available	1
A Video Is Worth 4096 Tokens: Verbalize Videos To Understand Them In Zero Shot	May 16, 2023	Emotion Classification, Question Answering	Code Available	0
Learning Higher-order Object Interactions for Keypoint-based Video Understanding	May 16, 2023	Action Localization, Action Recognition	Unverified	0
Vehicle Detection and Classification without Residual Calculation: Accelerating HEVC Image Decoding with Random Perturbation Injection	May 14, 2023	Image Reconstruction, vehicle detection	Unverified	0
Transformer-Based Model for Monocular Visual Odometry: A Video Understanding Approach	May 10, 2023	Autonomous Vehicles, Monocular Visual Odometry	Code Available	1
VideoChat: Chat-Centric Video Understanding	May 10, 2023	Question Answering, Video-based Generative Performance Benchmarking	Code Available	4
MH-DETR: Video Moment and Highlight Detection with Cross-modal Transformer	Apr 29, 2023	Decoder, Highlight Detection	Code Available	1
Event-Free Moving Object Segmentation from Moving Ego Vehicle	Apr 28, 2023	Autonomous Driving, Benchmarking	Code Available	1
ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System	Apr 27, 2023	Video Understanding	Unverified	0
MRSN: Multi-Relation Support Network for Video Action Detection	Apr 24, 2023	Action Detection, Relation	Unverified	0
Search-Map-Search: A Frame Selection Paradigm for Action Recognition	Apr 20, 2023	Action Recognition, Heuristic Search	Unverified	0
LASER: A Neuro-Symbolic Framework for Learning Spatial-Temporal Scene Graphs with Weak Supervision	Apr 15, 2023	Language Modeling, Language Modelling	Unverified	0
Leveraging triplet loss for unsupervised action segmentation	Apr 13, 2023	Action Segmentation, Clustering	Code Available	1
Therbligs in Action: Video Understanding through Motion Primitives	Apr 6, 2023	Action Anticipation, Action Recognition	Unverified	0
SVT: Supertoken Video Transformer for Efficient Video Understanding	Apr 1, 2023	Video Understanding	Unverified	0

