| Grade Score: Quantifying LLM Performance in Option Selection | Jun 17, 2024 | Decision Making, Fairness | Code Available | 0 |
| Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think | Apr 12, 2024 | Multiple-choice | Code Available | 0 |
| This Is Your Doge, If It Please You: Exploring Deception and Robustness in Mixture of LLMs | Mar 7, 2025 | Large Language Model, Multiple-choice | Code Available | 0 |
| StoryAnalogy: Deriving Story-level Analogies from Large Language Models to Unlock Analogical Understanding | Oct 19, 2023 | Multiple-choice, Natural Language Understanding | Code Available | 0 |
| Grounding Synthetic Data Evaluations of Language Models in Unsupervised Document Corpora | May 13, 2025 | Benchmarking, Diagnostic | Code Available | 0 |
| From Multiple-Choice to Extractive QA: A Case Study for English and Arabic | Apr 26, 2024 | Belebele, Extractive Question-Answering | Code Available | 0 |
| ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning | Feb 7, 2025 | Multiple-choice, Question Answering | Code Available | 0 |
| Strengthened Symbol Binding Makes Large Language Models Reliable Multiple-Choice Selectors | Jun 3, 2024 | Multiple-choice, Selection bias | Code Available | 0 |
| QMOS: Enhancing LLMs for Telecommunication with Question Masked loss and Option Shuffling | Sep 21, 2024 | Multiple-choice, Prompt Engineering | Code Available | 0 |
| Truth Knows No Language: Evaluating Truthfulness Beyond English | Feb 13, 2025 | Informativeness, Machine Translation | Code Available | 0 |