| HCQA @ Ego4D EgoSchema Challenge 2024 | Jun 22, 2024 | Caption Generation | Code Available | 1 | 5 |
| Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models | Mar 20, 2025 | Multiple-choice, Video Understanding | Code Available | 1 | 5 |
| INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance | Jun 13, 2024 | Multiple-choice, Visual Reasoning | Code Available | 1 | 5 |
| An MRC Framework for Semantic Role Labeling | Sep 14, 2021 | Computational Efficiency, Machine Reading Comprehension | Code Available | 1 | 5 |
| African or European Swallow? Benchmarking Large Vision-Language Models for Fine-Grained Object Classification | Jun 20, 2024 | Benchmarking, Classification | Code Available | 1 | 5 |
| An In-depth Look at Gemini's Language Abilities | Dec 18, 2023 | Instruction Following, Math | Code Available | 1 | 5 |
| General-Purpose Question-Answering with Macaw | Sep 6, 2021 | Generative Question Answering, Multiple-choice | Code Available | 1 | 5 |
| Generating Distractors for Reading Comprehension Questions from Real Examinations | Sep 8, 2018 | Decoder, Distractor Generation | Code Available | 1 | 5 |
| GPT as Knowledge Worker: A Zero-Shot Evaluation of (AI)CPA Capabilities | Jan 11, 2023 | Multiple-choice | Code Available | 1 | 5 |
| FoodieQA: A Multimodal Dataset for Fine-Grained Understanding of Chinese Food Culture | Jun 16, 2024 | Diversity, Multiple-choice | Code Available | 1 | 5 |