
VCGBench-Diverse

Recognizing the limited diversity in existing video conversation benchmarks, we introduce VCGBench-Diverse to comprehensively evaluate the generalization ability of video LMMs. While VCG-Bench provides an extensive evaluation protocol, it is limited to videos from the ActivityNet200 dataset. Our benchmark comprises a total of 877 videos, 18 broad video categories and 4,354 QA pairs, ensuring a robust evaluation framework.

The evaluation is computed over five different aspects:

  1. Correctness of information

  2. Detail orientation

  3. Contextual understanding

  4. Temporal understanding

  5. Consistency.
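
As an illustration only, per-aspect results over the five aspects above could be aggregated as in the sketch below. It assumes each judged answer is stored as a record with hypothetical `aspect` and `score` fields, and that the headline number is an unweighted mean across aspects; neither detail is specified here, and the short aspect keys are made up for the example.

```python
from collections import defaultdict
from statistics import mean

# Short keys for the five aspects listed above (key names are hypothetical).
ASPECTS = (
    "correctness",
    "detail",
    "context",
    "temporal",
    "consistency",
)


def aggregate_scores(records):
    """Return the mean score per aspect plus an unweighted mean across aspects.

    `records` is an iterable of dicts with an "aspect" key (one of ASPECTS)
    and a numeric "score" key. This layout is an assumption for illustration.
    """
    by_aspect = defaultdict(list)
    for rec in records:
        if rec["aspect"] not in ASPECTS:
            raise ValueError(f"unknown aspect: {rec['aspect']!r}")
        by_aspect[rec["aspect"]].append(rec["score"])

    # Average within each aspect first, then across the aspects that appear.
    result = {aspect: mean(scores) for aspect, scores in by_aspect.items()}
    result["average"] = mean(list(result.values()))
    return result


example = aggregate_scores([
    {"aspect": "correctness", "score": 4},
    {"aspect": "correctness", "score": 2},
    {"aspect": "temporal", "score": 5},
])
```

Averaging within each aspect before averaging across aspects keeps an aspect with many QA pairs from dominating the overall figure; a benchmark could equally report a per-question mean instead.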

Additionally, VCGBench-Diverse provides a breakdown of performance across three key aspects:

  1. Dense video captioning, which assesses the ability to generate detailed and accurate descriptions of the video content,

  2. Spatial understanding, which evaluates the capability to understand and describe the spatial relationships and settings within the video, and

  3. Reasoning, which tests the adeptness in inferring and explaining causal relationships and actions within the video.

Papers


Title | Status | Hype
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models | Code | 3
VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding | Code | 3
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding | Code | 2
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | Code | 2
VTimeLLM: Empower LLM to Grasp Video Moments | Code | 2

No leaderboard results yet.