| Enhancing LLM Evaluations: The Garbling Trick | Nov 3, 2024 | Multiple-choice | —Unverified | 0 |
| Fill-in-the-Blank: A Challenging Video Understanding Evaluation Framework | Nov 16, 2021 | Multiple-choice, Question Answering | —Unverified | 0 |
| Answering Chinese Elementary School Social Study Multiple Choice Questions | Jun 26, 2021 | Multiple-choice, Negation | —Unverified | 0 |
| Enhancing Event Causality Identification with Rationale and Structure-Aware Causal Question Answering | Mar 17, 2024 | Event Causality Identification, Multiple-choice | —Unverified | 0 |
| Enhancing Distractor Generation for Multiple-Choice Questions with Retrieval Augmented Pretraining and Knowledge Graph Integration | Jun 19, 2024 | Benchmarking, Distractor Generation | —Unverified | 0 |
| Bilingual Evaluation of Language Models on General Knowledge in University Entrance Exams with Minimal Contamination | Sep 19, 2024 | General Knowledge, MMLU | —Unverified | 0 |
| First Place Solution to the Multiple-choice Video QA Track of The Second Perception Test Challenge | Sep 20, 2024 | Multiple-choice, Question Answering | —Unverified | 0 |
| First Token Probability Guided RAG for Telecom Question Answering | Jan 11, 2025 | Multiple-choice, Multiple Choice Question Answering (MCQA) | —Unverified | 0 |
| AGReE: A system for generating Automated Grammar Reading Exercises | Oct 28, 2022 | Articles, Multiple-choice | —Unverified | 0 |
| HANS, are you clever? Clever Hans Effect Analysis of Neural Systems | Sep 21, 2023 | Decision Making, Multiple-choice | —Unverified | 0 |