Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 801–825 of 1149 papers

Title	Date	Tasks	Status
Vamos: Versatile Action Models for Video Understanding	Nov 22, 2023	EgoSchema, Hard Attention	Code Available
SPOT! Revisiting Video-Language Models for Event Understanding	Nov 21, 2023	Attribute, Video Understanding	Unverified
ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab	Nov 1, 2023	Action Recognition, Video Understanding	Unverified
ZEETAD: Adapting Pretrained Vision-Language Model for Zero-Shot End-to-End Temporal Action Detection	Nov 1, 2023	Action Detection, Classification	Unverified
Beyond still images: Temporal features and input variance resilience	Nov 1, 2023	Video Understanding	Unverified
Videoprompter: an ensemble of foundational models for zero-shot video understanding	Oct 23, 2023	Action Recognition, Descriptive	Unverified
Query-aware Long Video Localization and Relation Discrimination for Deep Video Understanding	Oct 19, 2023	Relation, Video Understanding	Unverified
Analyzing Zero-Shot Abilities of Vision-Language Models on Video Understanding Tasks	Oct 7, 2023	Action Recognition, Multiple-choice	Unverified
DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model	Oct 2, 2023	Autonomous Driving, Language Modeling	Unverified
Telling Stories for Common Sense Zero-Shot Action Recognition	Sep 29, 2023	Action Recognition, Articles	Code Available
M^33D: Learning 3D priors using Multi-Modal Masked Autoencoders for 2D image and video understanding	Sep 26, 2023	2D Semantic Segmentation, Action Detection	Unverified
Towards Surveillance Video-and-Language Understanding: New Dataset, Baselines, and Challenges	Sep 25, 2023	Anomaly Detection, Dense Video Captioning	Unverified
Revisiting Kernel Temporal Segmentation as an Adaptive Tokenizer for Long-form Video Understanding	Sep 20, 2023	Action Localization, Form	Unverified
Learning Dynamic MRI Reconstruction with Convolutional Network Assisted Reconstruction Swin Transformer	Sep 19, 2023	Anatomy, Computational Efficiency	Unverified
Language as the Medium: Multimodal Video Classification through text only	Sep 19, 2023	Action Recognition, Video Classification	Unverified
Judging a video by its bitstream cover	Sep 14, 2023	Video Understanding	Code Available
Motion-Guided Masking for Spatiotemporal Representation Learning	Aug 24, 2023	Domain Adaptation, Representation Learning	Unverified
MOFO: MOtion FOcused Self-Supervision for Video Understanding	Aug 23, 2023	Action Classification, Action Recognition	Code Available
Are current long-term video understanding datasets long-term?	Aug 22, 2023	Action Recognition, Video Understanding	Code Available
Audio-Visual Glance Network for Efficient Video Recognition	Aug 18, 2023	Video Recognition, Video Understanding	Unverified
Temporally-Adaptive Models for Efficient Video Understanding	Aug 10, 2023	Action Classification, Action Recognition	Unverified
M^3Net: Multi-view Encoding, Matching, and Fusion for Few-shot Fine-grained Action Recognition	Aug 6, 2023	Action Recognition, Decision Making	Unverified
DPMix: Mixture of Depth and Point Cloud Video Experts for 4D Action Segmentation	Jul 31, 2023	Action Segmentation, Human-Object Interaction Detection	Unverified
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation	Jul 13, 2023	Action Recognition, Contrastive Learning	Unverified
HA-ViD: A Human Assembly Video Dataset for Comprehensive Assembly Knowledge Understanding	Jul 9, 2023	Action Recognition, Action Segmentation	Code Available

No leaderboard results yet.