
Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 701–725 of 1149 papers

Title | Status | Hype
--- | --- | ---
Rethinking Image-to-Video Adaptation: An Object-centric Perspective |  | 0
VideoEval: Comprehensive Benchmark Suite for Low-Cost Evaluation of Video Foundation Model | Code | 0
OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding |  | 0
KeyVideoLLM: Towards Large-scale Video Keyframe Selection |  | 0
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output |  | 0
https://arxiv.org/abs/2407.00634 | Code | 0
Video Watermarking: Safeguarding Your Video from (Unauthorized) Annotations by Video-based LLMs |  | 0
Zero-Shot Long-Form Video Understanding through Screenplay |  | 0
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models |  | 0
video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models | Code | 0
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding |  | 0
Towards Holistic Language-video Representation: the language model-enhanced MSR-Video to Text Dataset |  | 0
GVT2RPM: An Empirical Study for General Video Transformer Adaptation to Remote Physiological Measurement |  | 0
DrVideo: Document Retrieval Based Long Video Understanding |  | 0
Hallucination Mitigation Prompts Long-term Video Understanding | Code | 0
VELOCITI: Benchmarking Video-Language Compositional Reasoning with Strict Entailment |  | 0
Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model | Code | 0
GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding |  | 0
Localizing Events in Videos with Multimodal Queries |  | 0
LLAVIDAL: A Large LAnguage VIsion Model for Daily Activities of Living |  | 0
Fewer Tokens and Fewer Videos: Extending Video Understanding Abilities in Large Vision-Language Models |  | 0
MeMSVD: Long-Range Temporal Structure Capturing Using Incremental SVD |  | 0
1st Place Winner of the 2024 Pixel-level Video Understanding in the Wild (CVPR'24 PVUW) Challenge in Video Panoptic Segmentation and Best Long Video Consistency of Video Semantic Segmentation |  | 0
Semantic Segmentation on VSPW Dataset through Masked Video Consistency |  | 0
3rd Place Solution for PVUW Challenge 2024: Video Panoptic Segmentation |  | 0

No leaderboard results yet.