Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers


Showing 901–950 of 1149 papers

Title	Date	Tasks	Status
OmniVDiff: Omni Controllable Video Diffusion for Generation and Understanding	Apr 15, 2025	Semantic Segmentation, Video Generation	Unverified
OmniV-Med: Scaling Medical Vision-Language Model for Universal Visual Understanding	Apr 20, 2025	Language Modeling, Language Modelling	Unverified
Only Time Can Tell: Discovering Temporal Data for Temporal Modeling	Jul 19, 2019	Benchmarking, Motion Estimation	Unverified
On the Limitations of Vision-Language Models in Understanding Image Transforms	Mar 12, 2025	Question Answering, Video Generation	Unverified
Open-Set Video-based Facial Expression Recognition with Human Expression-sensitive Prompting	Apr 26, 2024	Facial Expression Recognition, Multi-Task Learning	Unverified
Open Vocabulary Multi-Label Video Classification	Jul 12, 2024	Action Classification, Classification	Unverified
Open-Vocabulary Spatio-Temporal Action Detection	May 17, 2024	Action Detection, Fine-Grained Action Detection	Unverified
Optimizing GPT for Video Understanding: Zero-Shot Performance and Prompt Engineering	Feb 13, 2025	Classification, Prompt Engineering	Unverified
Overview of Tencent Multi-modal Ads Video Understanding Challenge	Sep 16, 2021	Multi-Label Classification, MUlTI-LABEL-ClASSIFICATION	Unverified
Overview of the MedVidQA 2022 Shared Task on Medical Video Question-Answering	May 1, 2022	Question Answering, Video Classification	Unverified
Threading Keyframe with Narratives: MLLMs as Strong Long Video Comprehenders	May 30, 2025	Video Understanding	Unverified
Time Blindness: Why Video-Language Models Can't See What Humans Can?	May 30, 2025	Temporal Sequences, Video Understanding	Unverified
TimeSearch: Hierarchical Video Search with Spotlight and Reflection for Human-like Long Video Understanding	Apr 2, 2025	Video Understanding	Unverified
TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation	Apr 24, 2025	Caption Generation, Dense Video Captioning	Unverified
TIME: Temporal-sensitive Multi-dimensional Instruction Tuning and Benchmarking for Video-LLMs	Mar 13, 2025	Benchmarking, Question Answering	Unverified
Toward a Human-Level Video Understanding Intelligence	Oct 8, 2021	AI Agent, Video Understanding	Unverified
Towards Child-Inclusive Clinical Video Understanding for Autism Spectrum Disorder	Sep 20, 2024	Activity Recognition, Diagnostic	Unverified
Towards Efficient and Robust Moment Retrieval System: A Unified Framework for Multi-Granularity Models and Temporal Reranking	Apr 11, 2025	Moment Retrieval, Question Answering	Unverified
Towards Fine-Grained Video Question Answering	Mar 10, 2025	Language Modeling, Language Modelling	Unverified
Towards Holistic Language-video Representation: the language model-enhanced MSR-Video to Text Dataset	Jun 19, 2024	Language Modeling, Language Modelling	Unverified
Towards Long Video Understanding via Fine-detailed Video Story Generation	Dec 9, 2024	Story Generation, Video Understanding	Unverified
Towards Scalable Modeling of Compressed Videos for Efficient Action Recognition	Mar 17, 2025	Action Recognition, Video Recognition	Unverified
Towards Training Stronger Video Vision Transformers for EPIC-KITCHENS-100 Action Recognition	Jun 9, 2021	Action Recognition, Point Cloud Classification	Unverified
Towards Visual Discrimination and Reasoning of Real-World Physical Dynamics: Physics-Grounded Anomaly Detection	Mar 5, 2025	Anomaly Detection, Object	Unverified
Transformed ROIs for Capturing Visual Transformations in Videos	Jun 6, 2021	Action Recognition, Video Understanding	Unverified
Transition Is a Process: Pair-to-Video Change Detection Networks for Very High Resolution Remote Sensing Images	Dec 7, 2022	Building change detection for remote sensing images, Change Detection	Unverified
TVBench: Redesigning Video-Language Evaluation	Oct 10, 2024	Multiple-choice, Open-Ended Question Answering	Unverified
TV-TREES: Multimodal Entailment Trees for Neuro-Symbolic Video Reasoning	Feb 29, 2024	Question Answering, Video Understanding	Unverified
Two Causally Related Needles in a Video Haystack	May 26, 2025	Video Understanding, Visual Grounding	Unverified
Two-Stream Transformer Architecture for Long Video Understanding	Aug 2, 2022	Action Recognition, GPU	Unverified
UBoCo : Unsupervised Boundary Contrastive Learning for Generic Event Boundary Detection	Nov 29, 2021	Boundary Detection, Contrastive Learning	Unverified
UBoCo: Unsupervised Boundary Contrastive Learning for Generic Event Boundary Detection	Jan 1, 2022	Boundary Detection, Contrastive Learning	Unverified
Towards Surveillance Video-and-Language Understanding: New Dataset, Baselines, and Challenges	Sep 25, 2023	Anomaly Detection, Dense Video Captioning	Unverified
Unbiasing through Textual Descriptions: Mitigating Representation Bias in Video Benchmarks	Mar 24, 2025	Common Sense Reasoning, Prediction	Unverified
Understanding Action Sequences based on Video Captioning for Learning-from-Observation	Dec 9, 2020	Video Captioning, Video Understanding	Unverified
Understanding Long Videos via LLM-Powered Entity Relation Graphs	Jan 27, 2025	EgoSchema, Large Language Model	Unverified
Unidentified Video Objects: A Benchmark for Dense, Open-World Segmentation	Apr 10, 2021	Object, object-detection	Unverified
UniDual: A Unified Model for Image and Video Understanding	Jun 10, 2019	Multi-Task Learning, Video Understanding	Unverified
Unified Graph Structured Models for Video Understanding	Mar 29, 2021	Action Detection, Graph Classification	Unverified
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision Language Audio and Action	Jan 1, 2024	Image Generation, Instruction Following	Unverified
UniFormerV2: Unlocking the Potential of Image ViTs for Video Understanding	Jan 1, 2023	Video Understanding	Unverified
Universal Visuo-Tactile Video Understanding for Embodied Interaction	May 28, 2025	Friction, Large Language Model	Unverified
Unsupervised Motion Representation Enhanced Network for Action Recognition	Mar 5, 2021	Action Recognition, Optical Flow Estimation	Unverified
Unsupervised Object Discovery and Tracking in Video Collections	May 14, 2015	Object, Object Discovery	Unverified
Unsupervised Video Understanding by Reconciliation of Posture Similarities	Aug 3, 2017	Action Classification, Retrieval	Unverified
Human Gaze Guided Attention for Surgical Activity Recognition	Mar 9, 2022	Activity Recognition, Video Understanding	Unverified
Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers	Mar 14, 2025	GPU, Mamba	Unverified
VaQuitA: Enhancing Alignment in LLM-Assisted Video Understanding	Dec 4, 2023	Language Modeling, Language Modelling	Unverified
VCA: Video Curious Agent for Long Video Understanding	Dec 12, 2024	Video Understanding	Unverified
Vehicle Detection and Classification without Residual Calculation: Accelerating HEVC Image Decoding with Random Perturbation Injection	May 14, 2023	Image Reconstruction, vehicle detection	Unverified


