VCGBench-Diverse
Recognizing the limited diversity of existing video conversation benchmarks, we introduce VCGBench-Diverse to comprehensively evaluate the generalization ability of video LMMs. While VCG-Bench provides an extensive evaluation protocol, it is limited to videos from the ActivityNet200 dataset. Our benchmark comprises 877 videos spanning 18 broad video categories, with a total of 4,354 QA pairs, providing a robust evaluation framework.
The evaluation is computed over five different aspects:
- Correctness of information
- Detail orientation
- Contextual understanding
- Temporal understanding
- Consistency
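To illustrate how per-aspect results might be summarized, here is a minimal sketch that averages each aspect's score over a set of scored QA pairs. It assumes a hypothetical input format (one dict per QA pair mapping aspect name to a numeric score) and hypothetical aspect keys; it also treats every aspect, including consistency, as a simple per-pair score, which may not match how the benchmark actually scores each aspect.

```python
from statistics import mean

# Hypothetical short keys for the five evaluation aspects (not the benchmark's own names).
ASPECTS = ("correctness", "detail", "context", "temporal", "consistency")


def aspect_averages(scored_pairs):
    """Average each aspect's score over the QA pairs that carry a score for it.

    scored_pairs: list of dicts mapping aspect name -> numeric score
    (a hypothetical format; the real evaluation output may differ).
    Aspects with no scores at all are omitted from the result.
    """
    averages = {}
    for aspect in ASPECTS:
        values = [pair[aspect] for pair in scored_pairs if aspect in pair]
        if values:
            averages[aspect] = mean(values)
    return averages
```

For example, two pairs scored `{"correctness": 3, "detail": 4}` and `{"correctness": 5, "detail": 2}` give an average of 4 for correctness and 3 for detail.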
Additionally, VCGBench-Diverse provides a breakdown of performance across three key aspects:
- Dense video captioning, which assesses the ability to generate detailed and accurate descriptions of the video content
- Spatial understanding, which evaluates the ability to understand and describe the spatial relationships and settings within the video
- Reasoning, which tests the ability to infer and explain causal relationships and actions within the video
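A per-category breakdown like this one amounts to grouping scores by question category and averaging within each group. The sketch below assumes a hypothetical input of (category, score) tuples with illustrative category labels; it is not the benchmark's own tooling.

```python
from collections import defaultdict
from statistics import mean


def breakdown_by_category(records):
    """Average scores separately for each question category.

    records: iterable of (category, score) tuples, a hypothetical format
    where category is a label such as "dense captioning", "spatial",
    or "reasoning".
    """
    grouped = defaultdict(list)
    for category, score in records:
        grouped[category].append(score)
    # Only categories that actually appear in the input are reported.
    return {category: mean(scores) for category, scores in grouped.items()}
```

Grouping before averaging keeps each category's mean independent of how many questions the other categories contain.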