Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 726–750 of 1149 papers

Title	Date	Tasks	Status
Contrastive Language Video Time Pre-training	Jun 4, 2024	Action Recognition, Contrastive Learning	Unverified
2nd Place Solution for PVUW Challenge 2024: Video Panoptic Segmentation	Jun 1, 2024	Autonomous Driving, Panoptic Segmentation	Unverified
HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model	Jun 1, 2024	Action Recognition, Activity Recognition	Unverified
Temporal Grounding of Activities using Multimodal Large Language Models	May 30, 2024	Video Understanding	Unverified
MMCTAgent: Multi-modal Critical Thinking Agent Framework for Complex Visual Reasoning	May 28, 2024	Decision Making, Video Understanding	Unverified
Hierarchical Action Recognition: A Contrastive Video-Language Approach with Hierarchical Interactions	May 28, 2024	Action Recognition, Video Recognition	Unverified
Streaming Long Video Understanding with Large Language Models	May 25, 2024	Question Answering, Video Understanding	Unverified
MAMBA4D: Efficient Long-Sequence Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models	May 23, 2024	Action Recognition, Action Segmentation	Unverified
Anticipating Object State Changes in Long Procedural Videos	May 21, 2024	Object, Object State Change Classification	Unverified
Open-Vocabulary Spatio-Temporal Action Detection	May 17, 2024	Action Detection, Fine-Grained Action Detection	Unverified
Challenges in Deploying Long-Context Transformers: A Theoretical Peak Performance Analysis	May 14, 2024	4k, GPU	Unverified
CinePile: A Long Video Question Answering Dataset and Benchmark	May 14, 2024	Form, Human-Object Interaction Detection	Unverified
Global Motion Understanding in Large-Scale Video Object Segmentation	May 11, 2024	Instance Segmentation, Optical Flow Estimation	Unverified
RETTA: Retrieval-Enhanced Test-Time Adaptation for Zero-Shot Video Captioning	May 11, 2024	Image-text matching, Retrieval	Unverified
A Survey on Backbones for Deep Video Action Recognition	May 9, 2024	Action Recognition, Diversity	Unverified
Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition	May 7, 2024	Large Language Model, Multimodal Large Language Model	Unverified
Snippet-Aware Transformer With Multiple Action Elements for Skeleton-Based Action Segmentation	May 6, 2024	Action Segmentation, Skeleton Based Action Segmentation	Code Available
WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning	May 6, 2024	Multiple-choice, Video Understanding	Unverified
How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs	May 6, 2024	Autonomous Vehicles, Video Understanding	Unverified
Learning text-to-video retrieval from image captioning	Apr 26, 2024	Image Captioning, Image Retrieval	Unverified
Open-Set Video-based Facial Expression Recognition with Human Expression-sensitive Prompting	Apr 26, 2024	Facial Expression Recognition, Multi-Task Learning	Unverified
IPAD: Industrial Process Anomaly Detection Dataset	Apr 23, 2024	Anomaly Detection, Video Anomaly Detection	Unverified
From Image to Video, what do we need in multimodal LLMs?	Apr 18, 2024	Video Understanding	Unverified
In My Perspective, In My Hands: Accurate Egocentric 2D Hand Pose and Action Recognition	Apr 14, 2024	Action Recognition, Hand Pose Estimation	Code Available
A Transformer-Based Model for the Prediction of Human Gaze Behavior on Videos	Apr 10, 2024	Activity Recognition, Gaze Prediction	Unverified
