Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1001–1025 of 1149 papers

Title	Date	Tasks	Status
O2NA: An Object-Oriented Non-Autoregressive Approach for Controllable Video Captioning	Aug 5, 2021	AttributeCaption Generation	—Unverified
OBJECT DYNAMICS DISTILLATION FOR SCENE DECOMPOSITION AND REPRESENTATION	Sep 29, 2021	ObjectPredict Future Video Frames	—Unverified
Occluded Video Instance Segmentation: Dataset and ICCV 2021 Challenge	Nov 15, 2021	Instance SegmentationObject Recognition	—Unverified
OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding	Jul 6, 2024	Video Understanding	—Unverified
OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts	Mar 29, 2025	Streaming video understandingVideo Understanding	—Unverified
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks	Jan 14, 2025	Language ModelingLanguage Modelling	—Unverified
OmniTrack: Real-time detection and tracking of objects, text and logos in video	Oct 14, 2019	GPUobject-detection	—Unverified
OmniVDiff: Omni Controllable Video Diffusion for Generation and Understanding	Apr 15, 2025	Semantic SegmentationVideo Generation	—Unverified
OmniV-Med: Scaling Medical Vision-Language Model for Universal Visual Understanding	Apr 20, 2025	Language ModelingLanguage Modelling	—Unverified
Only Time Can Tell: Discovering Temporal Data for Temporal Modeling	Jul 19, 2019	BenchmarkingMotion Estimation	—Unverified
On the Limitations of Vision-Language Models in Understanding Image Transforms	Mar 12, 2025	Question AnsweringVideo Generation	—Unverified
Open-Set Video-based Facial Expression Recognition with Human Expression-sensitive Prompting	Apr 26, 2024	Facial Expression RecognitionMulti-Task Learning	—Unverified
Open Vocabulary Multi-Label Video Classification	Jul 12, 2024	Action ClassificationClassification	—Unverified
Open-Vocabulary Spatio-Temporal Action Detection	May 17, 2024	Action DetectionFine-Grained Action Detection	—Unverified
Optimizing GPT for Video Understanding: Zero-Shot Performance and Prompt Engineering	Feb 13, 2025	ClassificationPrompt Engineering	—Unverified
Overview of Tencent Multi-modal Ads Video Understanding Challenge	Sep 16, 2021	Multi-Label ClassificationMUlTI-LABEL-ClASSIFICATION	—Unverified
Overview of the MedVidQA 2022 Shared Task on Medical Video Question-Answering	May 1, 2022	Question AnsweringVideo Classification	—Unverified
Overview of TREC 2024 Medical Video Question Answering (MedVidQA) Track	Dec 15, 2024	Image CaptioningMedical Question Answering	—Unverified
OV-HHIR: Open Vocabulary Human Interaction Recognition Using Cross-modal Integration of Large Language Models	Dec 31, 2024	Activity RecognitionHuman Interaction Recognition	—Unverified
OW-VISCapTor: Abstractors for Open-World Video Instance Segmentation and Captioning	Apr 4, 2024	DescriptiveDiversity	—Unverified
PcmNet: Position-Sensitive Context Modeling Network for Temporal Action Localization	Mar 9, 2021	Action LocalizationBoundary Detection	—Unverified
Perceive, Query & Reason: Enhancing Video QA with Question-Guided Temporal Queries	Dec 26, 2024	Question AnsweringVideo Question Answering	—Unverified
Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark	Nov 29, 2024	BenchmarkingGrounded Video Question Answering	—Unverified
Personalized Video Summarization by Multimodal Video Understanding	Nov 5, 2024	Unsupervised Video SummarizationVideo Summarization	—Unverified
Person Count Localization in Videos From Noisy Foreground and Detections	Jun 1, 2015	Foreground SegmentationHuman Detection	—Unverified

Show:10 25 50

← PrevPage 41 of 46Next →

No leaderboard results yet.