SOTAVerified

Video Understanding

A crucial task of video understanding is to recognise and localise, in both space and time, the different actions or events that appear in a video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 901-950 of 1149 papers

| Title | Status | Hype |
|---|---|---|
| OmniVDiff: Omni Controllable Video Diffusion for Generation and Understanding | | 0 |
| OmniV-Med: Scaling Medical Vision-Language Model for Universal Visual Understanding | | 0 |
| Only Time Can Tell: Discovering Temporal Data for Temporal Modeling | | 0 |
| On the Limitations of Vision-Language Models in Understanding Image Transforms | | 0 |
| Open-Set Video-based Facial Expression Recognition with Human Expression-sensitive Prompting | | 0 |
| Open Vocabulary Multi-Label Video Classification | | 0 |
| Open-Vocabulary Spatio-Temporal Action Detection | | 0 |
| Optimizing GPT for Video Understanding: Zero-Shot Performance and Prompt Engineering | | 0 |
| Overview of Tencent Multi-modal Ads Video Understanding Challenge | | 0 |
| Overview of the MedVidQA 2022 Shared Task on Medical Video Question-Answering | | 0 |
| Threading Keyframe with Narratives: MLLMs as Strong Long Video Comprehenders | | 0 |
| Time Blindness: Why Video-Language Models Can't See What Humans Can? | | 0 |
| TimeSearch: Hierarchical Video Search with Spotlight and Reflection for Human-like Long Video Understanding | | 0 |
| TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation | | 0 |
| TIME: Temporal-sensitive Multi-dimensional Instruction Tuning and Benchmarking for Video-LLMs | | 0 |
| Toward a Human-Level Video Understanding Intelligence | | 0 |
| Towards Child-Inclusive Clinical Video Understanding for Autism Spectrum Disorder | | 0 |
| Towards Efficient and Robust Moment Retrieval System: A Unified Framework for Multi-Granularity Models and Temporal Reranking | | 0 |
| Towards Fine-Grained Video Question Answering | | 0 |
| Towards Holistic Language-video Representation: the language model-enhanced MSR-Video to Text Dataset | | 0 |
| Towards Long Video Understanding via Fine-detailed Video Story Generation | | 0 |
| Towards Scalable Modeling of Compressed Videos for Efficient Action Recognition | | 0 |
| Towards Training Stronger Video Vision Transformers for EPIC-KITCHENS-100 Action Recognition | | 0 |
| Towards Visual Discrimination and Reasoning of Real-World Physical Dynamics: Physics-Grounded Anomaly Detection | | 0 |
| Transformed ROIs for Capturing Visual Transformations in Videos | | 0 |
| Transition Is a Process: Pair-to-Video Change Detection Networks for Very High Resolution Remote Sensing Images | | 0 |
| TVBench: Redesigning Video-Language Evaluation | | 0 |
| TV-TREES: Multimodal Entailment Trees for Neuro-Symbolic Video Reasoning | | 0 |
| Two Causally Related Needles in a Video Haystack | | 0 |
| Two-Stream Transformer Architecture for Long Video Understanding | | 0 |
| UBoCo : Unsupervised Boundary Contrastive Learning for Generic Event Boundary Detection | | 0 |
| UBoCo: Unsupervised Boundary Contrastive Learning for Generic Event Boundary Detection | | 0 |
| Towards Surveillance Video-and-Language Understanding: New Dataset, Baselines, and Challenges | | 0 |
| Unbiasing through Textual Descriptions: Mitigating Representation Bias in Video Benchmarks | | 0 |
| Understanding Action Sequences based on Video Captioning for Learning-from-Observation | | 0 |
| Understanding Long Videos via LLM-Powered Entity Relation Graphs | | 0 |
| Unidentified Video Objects: A Benchmark for Dense, Open-World Segmentation | | 0 |
| UniDual: A Unified Model for Image and Video Understanding | | 0 |
| Unified Graph Structured Models for Video Understanding | | 0 |
| Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision Language Audio and Action | | 0 |
| UniFormerV2: Unlocking the Potential of Image ViTs for Video Understanding | | 0 |
| Universal Visuo-Tactile Video Understanding for Embodied Interaction | | 0 |
| Unsupervised Motion Representation Enhanced Network for Action Recognition | | 0 |
| Unsupervised Object Discovery and Tracking in Video Collections | | 0 |
| Unsupervised Video Understanding by Reconciliation of Posture Similarities | | 0 |
| Human Gaze Guided Attention for Surgical Activity Recognition | | 0 |
| Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers | | 0 |
| VaQuitA: Enhancing Alignment in LLM-Assisted Video Understanding | | 0 |
| VCA: Video Curious Agent for Long Video Understanding | | 0 |
| Vehicle Detection and Classification without Residual Calculation: Accelerating HEVC Image Decoding with Random Perturbation Injection | | 0 |

No leaderboard results yet.