| Evaluating LLMs on Document-Based QA: Exact Answer Selection and Numerical Extraction using Cogtale dataset | Nov 14, 2023 | Answer Selection, Information Retrieval | Unverified | 0 |
| It's Not Easy Being Wrong: Large Language Models Struggle with Process of Elimination Reasoning | Nov 13, 2023 | Multiple-choice | Code Available | 0 |
| Characterizing Large Language Models as Rationalizers of Knowledge-intensive Tasks | Nov 9, 2023 | Multiple-choice, World Knowledge | Unverified | 0 |
| Assessing Distractors in Multiple-Choice Tests | Nov 8, 2023 | Diversity, Multiple-choice | Unverified | 0 |
| Evaluating multiple large language models in pediatric ophthalmology | Nov 7, 2023 | Multiple-choice | Unverified | 0 |
| Evaluating the Potential of Leading Large Language Models in Reasoning Biology Questions | Nov 5, 2023 | Logical Reasoning, Multiple-choice | Unverified | 0 |
| More Robots are Coming: Large Multimodal Models (ChatGPT) can Solve Visually Diverse Images of Parsons Problems | Nov 3, 2023 | Multiple-choice | Unverified | 0 |
| CASE: Commonsense-Augmented Score with an Expanded Answer Space | Nov 3, 2023 | Multiple-choice | Code Available | 0 |
| DeSIQ: Towards an Unbiased, Challenging Benchmark for Social Intelligence Understanding | Oct 24, 2023 | Language Modeling | Unverified | 0 |
| POE: Process of Elimination for Multiple Choice Reasoning | Oct 24, 2023 | In-Context Learning, Logical Reasoning | Code Available | 0 |