| Large Language Models Are Self-Taught Reasoners: Enhancing LLM Applications via Tailored Problem-Solving Demonstrations | Aug 22, 2024 | Multiple-choice | —Unverified | 0 |
| Large Language Models Could Be Rote Learners | Apr 11, 2025 | Memorization, MMLU | —Unverified | 0 |
| Understanding Dataset Design Choices for Multi-hop Reasoning | Apr 27, 2019 | Multi-hop Question Answering, Multiple-choice | —Unverified | 0 |
| Large Language Models (GPT) Struggle to Answer Multiple-Choice Questions about Code | Mar 9, 2023 | Multiple-choice | —Unverified | 0 |
| Large Language Models Often Know When They Are Being Evaluated | May 28, 2025 | MMLU, Multiple-choice | —Unverified | 0 |
| Large Language Models Sensitivity to The Order of Options in Multiple-Choice Questions | Aug 22, 2023 | Multiple-choice, Sensitivity | —Unverified | 0 |
| Large Language Models Still Exhibit Bias in Long Text | Oct 23, 2024 | Fairness, Multiple-choice | —Unverified | 0 |
| A Comparative Study of Open-Source Large Language Models, GPT-4 and Claude 2: Multiple-Choice Test Taking in Nephrology | Aug 9, 2023 | Multiple-choice | —Unverified | 0 |
| Understanding Prior Bias and Choice Paralysis in Transformer-based Language Representation Models through Four Experimental Probes | Oct 3, 2022 | Decision Making, Multiple-choice | —Unverified | 0 |
| Learning a Word-Level Language Model with Sentence-Level Noise Contrastive Estimation for Contextual Sentence Probability Estimation | Mar 14, 2021 | Language Modeling, Language Modelling | —Unverified | 0 |