
Video Understanding

A crucial task in video understanding is to recognise and localise, in both space and time, the different actions or events that appear in a video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 901-925 of 1149 papers

OmniVDiff: Omni Controllable Video Diffusion for Generation and Understanding
OmniV-Med: Scaling Medical Vision-Language Model for Universal Visual Understanding
Only Time Can Tell: Discovering Temporal Data for Temporal Modeling
On the Limitations of Vision-Language Models in Understanding Image Transforms
Open-Set Video-based Facial Expression Recognition with Human Expression-sensitive Prompting
Open Vocabulary Multi-Label Video Classification
Open-Vocabulary Spatio-Temporal Action Detection
Optimizing GPT for Video Understanding: Zero-Shot Performance and Prompt Engineering
Overview of Tencent Multi-modal Ads Video Understanding Challenge
Overview of the MedVidQA 2022 Shared Task on Medical Video Question-Answering
Threading Keyframe with Narratives: MLLMs as Strong Long Video Comprehenders
Time Blindness: Why Video-Language Models Can't See What Humans Can?
TimeSearch: Hierarchical Video Search with Spotlight and Reflection for Human-like Long Video Understanding
TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation
TIME: Temporal-sensitive Multi-dimensional Instruction Tuning and Benchmarking for Video-LLMs
Toward a Human-Level Video Understanding Intelligence
Towards Child-Inclusive Clinical Video Understanding for Autism Spectrum Disorder
Towards Efficient and Robust Moment Retrieval System: A Unified Framework for Multi-Granularity Models and Temporal Reranking
Towards Fine-Grained Video Question Answering
Towards Holistic Language-video Representation: the language model-enhanced MSR-Video to Text Dataset
Towards Long Video Understanding via Fine-detailed Video Story Generation
Towards Scalable Modeling of Compressed Videos for Efficient Action Recognition
Towards Training Stronger Video Vision Transformers for EPIC-KITCHENS-100 Action Recognition
Towards Visual Discrimination and Reasoning of Real-World Physical Dynamics: Physics-Grounded Anomaly Detection
Transformed ROIs for Capturing Visual Transformations in Videos

No leaderboard results yet.