Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 701–725 of 1149 papers

Title	Date	Tasks	Status
Rethinking Image-to-Video Adaptation: An Object-centric Perspective	Jul 9, 2024	Action Recognition, Object	Unverified
VideoEval: Comprehensive Benchmark Suite for Low-Cost Evaluation of Video Foundation Model	Jul 9, 2024	Video Understanding	Code Available
OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding	Jul 6, 2024	Video Understanding	Unverified
KeyVideoLLM: Towards Large-scale Video Keyframe Selection	Jul 3, 2024	Data Compression, Management	Unverified
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output	Jul 3, 2024	Articles, Image Comprehension	Unverified
https://arxiv.org/abs/2407.00634	Jul 2, 2024	Video Captioning, Video Description	Code Available
Video Watermarking: Safeguarding Your Video from (Unauthorized) Annotations by Video-based LLMs	Jul 2, 2024	Video Understanding	Unverified
Zero-Shot Long-Form Video Understanding through Screenplay	Jun 25, 2024	Form, Question Answering	Unverified
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models	Jun 24, 2024	Hallucination, Video Understanding	Unverified
video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models	Jun 22, 2024	Diversity, Language Modeling	Code Available
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding	Jun 20, 2024	Form, Video Understanding	Unverified
Towards Holistic Language-video Representation: the language model-enhanced MSR-Video to Text Dataset	Jun 19, 2024	Language Modeling, Language Modelling	Unverified
GVT2RPM: An Empirical Study for General Video Transformer Adaptation to Remote Physiological Measurement	Jun 19, 2024	Video Understanding	Unverified
DrVideo: Document Retrieval Based Long Video Understanding	Jun 18, 2024	document understanding, EgoSchema	Unverified
Hallucination Mitigation Prompts Long-term Video Understanding	Jun 17, 2024	Answer Generation, Hallucination	Code Available
VELOCITI: Benchmarking Video-Language Compositional Reasoning with Strict Entailment	Jun 16, 2024	Action Understanding, Benchmarking	Unverified
Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model	Jun 15, 2024	Question Answering, Video Understanding	Code Available
GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding	Jun 14, 2024	Activity Recognition, MMR total	Unverified
Localizing Events in Videos with Multimodal Queries	Jun 14, 2024	Natural Language Queries, Video Understanding	Unverified
LLAVIDAL: A Large LAnguage VIsion Model for Daily Activities of Living	Jun 13, 2024	Benchmarking, Human-Object Interaction Detection	Unverified
Fewer Tokens and Fewer Videos: Extending Video Understanding Abilities in Large Vision-Language Models	Jun 12, 2024	Video Understanding	Unverified
MeMSVD: Long-Range Temporal Structure Capturing Using Incremental SVD	Jun 11, 2024	Video Recognition, Video Understanding	Unverified
1st Place Winner of the 2024 Pixel-level Video Understanding in the Wild (CVPR'24 PVUW) Challenge in Video Panoptic Segmentation and Best Long Video Consistency of Video Semantic Segmentation	Jun 8, 2024	Benchmarking, Instance Segmentation	Unverified
Semantic Segmentation on VSPW Dataset through Masked Video Consistency	Jun 7, 2024	Semantic Segmentation, Video Understanding	Unverified
3rd Place Solution for PVUW Challenge 2024: Video Panoptic Segmentation	Jun 6, 2024	Panoptic Segmentation, Segmentation	Unverified

No leaderboard results yet.