| Context Modeling with Evidence Filter for Multiple Choice Question Answering | Oct 6, 2020 | Machine Reading Comprehension, Multiple-choice | —Unverified | 0 |
| Context-guided Triple Matching for Multiple Choice Question Answering | Jan 16, 2022 | Benchmarking, Multiple-choice | —Unverified | 0 |
| AstroMLab 1: Who Wins Astronomy Jeopardy!? | Jul 15, 2024 | Astronomy, Benchmarking | —Unverified | 0 |
| DRIVINGVQA: Analyzing Visual Chain-of-Thought Reasoning of Vision Language Models in Real-World Scenarios with Driving Theory Tests | Jan 8, 2025 | Multimodal Reasoning, Multiple-choice | —Unverified | 0 |
| Context-guided Triple Matching for Multiple Choice Question Answering | Sep 27, 2021 | Benchmarking, Multiple-choice | —Unverified | 0 |
| A statistical model for aggregating judgments by incorporating peer predictions | Mar 14, 2017 | counterfactual, Multiple-choice | —Unverified | 0 |
| Advanced Financial Reasoning at Scale: A Comprehensive Evaluation of Large Language Models on CFA Level III | Jun 29, 2025 | Model Selection, Multiple-choice | —Unverified | 0 |
| Addressing Blind Guessing: Calibration of Selection Bias in Multiple-Choice Question Answering by Video Language Models | Oct 18, 2024 | Fairness, Multiple-choice | —Unverified | 0 |
| DREAM: A Challenge Data Set and Models for Dialogue-Based Reading Comprehension | Mar 1, 2019 | Dialogue Understanding, Multiple-choice | —Unverified | 0 |
| DsMCL: Dual-Level Stochastic Multiple Choice Learning for Multi-Modal Trajectory Prediction | Mar 19, 2020 | Multiple-choice, Prediction | —Unverified | 0 |