| Increasing Probability Mass on Answer Choices Does Not Always Improve Accuracy | May 24, 2023 | In-Context Learning, Multiple-choice | Code Available | 0 | 5 |
| How much do LLMs learn from negative examples? | Mar 18, 2025 | Multiple-choice, Question Answering | Code Available | 0 | 5 |
| How Can We Diagnose and Treat Bias in Large Language Models for Clinical Decision-Making? | Oct 21, 2024 | counterfactual, Decision Making | Code Available | 0 | 5 |
| Wait, that's not an option: LLMs Robustness with Incorrect Multiple-Choice Options | Aug 27, 2024 | Decision Making, Multiple-choice | Code Available | 0 | 5 |
| IdentifyMe: A Challenging Long-Context Mention Resolution Benchmark for LLMs | Nov 12, 2024 | coreference-resolution, Coreference Resolution | Code Available | 0 | 5 |
| Question-Aware Knowledge Graph Prompting for Enhancing Large Language Models | Mar 30, 2025 | Knowledge Graphs, Multiple-choice | Code Available | 0 | 5 |
| Controlling Cloze-test Question Item Difficulty with PLM-based Surrogate Models for IRT Assessment | Mar 3, 2024 | Cloze Test, Multiple-choice | Unverified | 0 | 0 |
| Contextual Response Interpretation for Automated Structured Interviews: A Case Study in Market Research | Apr 30, 2023 | Marketing, Multiple-choice | Unverified | 0 | 0 |
| Context Modeling with Evidence Filter for Multiple Choice Question Answering | Oct 6, 2020 | Machine Reading Comprehension, Multiple-choice | Unverified | 0 | 0 |
| Context-guided Triple Matching for Multiple Choice Question Answering | Jan 16, 2022 | Benchmarking, Multiple-choice | Unverified | 0 | 0 |