| Beyond Multiple-Choice Accuracy: Real-World Challenges of Implementing Large Language Models in Healthcare | Oct 24, 2024 | Multiple-choice | —Unverified | 0 | 0 |
| Beyond Multiple Choice: Evaluating Steering Vectors for Adaptive Free-Form Summarization | May 30, 2025 | Form, Language Modeling | —Unverified | 0 | 0 |
| Beyond Probabilities: Unveiling the Misalignment in Evaluating Large Language Models | Feb 21, 2024 | Multiple-choice | —Unverified | 0 | 0 |
| Beyond Profile: From Surface-Level Facts to Deep Persona Simulation in LLMs | Feb 18, 2025 | Generative Question Answering, Multiple-choice | —Unverified | 0 | 0 |
| Not All Options Are Created Equal: Textual Option Weighting for Token-Efficient LLM-Based Knowledge Tracing | Oct 14, 2024 | All, Binary Classification | —Unverified | 0 | 0 |
| The impact of AI and peer feedback on research writing skills: a study using the CGScholar platform among Kazakhstani scholars | Mar 5, 2025 | Multiple-choice, Survey | —Unverified | 0 | 0 |
| LLMs May Perform MCQA by Selecting the Least Incorrect Option | Feb 2, 2024 | Multiple-choice, Multiple Choice Question Answering (MCQA) | —Unverified | 0 | 0 |
| Beyond VQA: Generating Multi-word Answer and Rationale to Visual Questions | Oct 24, 2020 | General Classification, Multiple-choice | —Unverified | 0 | 0 |
| ANPMI: Assessing the True Comprehension Capabilities of LLMs for Multiple Choice Questions | Feb 26, 2025 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Bilingual Evaluation of Language Models on General Knowledge in University Entrance Exams with Minimal Contamination | Sep 19, 2024 | General Knowledge, MMLU | —Unverified | 0 | 0 |