Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 776–800 of 1149 papers

Title	Date	Tasks	Status
Cutup and Detect: Human Fall Detection on Cutup Untrimmed Videos Using a Large Foundational Video Understanding Model	Jan 29, 2024	Action Detection, Action Localization	Unverified
Exploring Missing Modality in Multimodal Egocentric Datasets	Jan 21, 2024	Action Recognition, Video Understanding	Unverified
Learning to Visually Connect Actions and their Effects	Jan 19, 2024	Object Tracking, Task Planning	Unverified
CrossVideo: Self-supervised Cross-modal Contrastive Learning for Point Cloud Video Understanding	Jan 17, 2024	Contrastive Learning, Point Cloud Video Understanding	Unverified
Multi-scale 2D Temporal Map Diffusion Models for Natural Language Video Localization	Jan 16, 2024	Decoder, Denoising	Unverified
Dr^2Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning	Jan 8, 2024	Object Detection	Code Available
VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding	Jan 1, 2024	Spatio-Temporal Video Grounding, Video Grounding	Unverified
Dr2Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning	Jan 1, 2024	Object Detection	Unverified
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision Language Audio and Action	Jan 1, 2024	Image Generation, Instruction Following	Unverified
Enhanced Motion-Text Alignment for Image-to-Video Transfer Learning	Jan 1, 2024	Transfer Learning, Video Understanding	Unverified
Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding	Dec 31, 2023	Spatio-Temporal Video Grounding, Video Grounding	Unverified
No More Shortcuts: Realizing the Potential of Temporal Self-Supervision	Dec 20, 2023	Action Classification, Attribute	Unverified
Text-Conditioned Resampler For Long Form Video Understanding	Dec 19, 2023	EgoSchema, Form	Unverified
Learning Object State Changes in Videos: An Open-World Perspective	Dec 19, 2023	Video Understanding	Unverified
Artificial intelligence optical hardware empowers high-resolution hyperspectral video understanding at 1.2 Tb/s	Dec 17, 2023	Semantic Segmentation, Video Semantic Segmentation	Unverified
X4D-SceneFormer: Enhanced Scene Understanding on 4D Point Cloud Videos through Cross-modal Knowledge Transfer	Dec 12, 2023	Action Recognition, Action Segmentation	Code Available
Audio-Visual LLM for Video Understanding	Dec 11, 2023	AudioCaps, Language Modeling	Unverified
MoVQA: A Benchmark of Versatile Question-Answering for Long-Form Movie Understanding	Dec 8, 2023	Form, Question Answering	Unverified
Retrieval-based Video Language Model for Efficient Long Video Question Answering	Dec 8, 2023	Language Modeling	Unverified
HIG: Hierarchical Interlacement Graph Approach to Scene Graph Generation in Video Understanding	Dec 5, 2023	Diversity, Graph Generation	Unverified
VaQuitA: Enhancing Alignment in LLM-Assisted Video Understanding	Dec 4, 2023	Language Modeling	Unverified
Zero-Shot Video Question Answering with Procedural Programs	Dec 1, 2023	Code Generation, Language Modeling	Unverified
Spacewalk-18: A Benchmark for Multimodal and Long-form Procedural Video Understanding	Nov 30, 2023	Form, Video Retrieval	Unverified
Multi-Modal Video Topic Segmentation with Dual-Contrastive Domain Adaptation	Nov 30, 2023	Contrastive Learning, Domain Adaptation	Unverified
GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation	Nov 25, 2023	Instruction Following, Language Modeling	Unverified

No leaderboard results yet.