| Evaluating Machine Reading Systems through Comprehension Tests | May 1, 2012 | Answer Selection, Multiple-choice | —Unverified | 0 | 0 |
| Evaluating multiple large language models in pediatric ophthalmology | Nov 7, 2023 | Multiple-choice | —Unverified | 0 | 0 |
| Evaluating Nuanced Bias in Large Language Model Free Response Answers | Jul 11, 2024 | Benchmarking, Language Modeling | —Unverified | 0 | 0 |
| Evaluating Question Answering Evaluation | Nov 1, 2019 | Answer Generation, Multiple-choice | —Unverified | 0 | 0 |
| A Corpus of Text Data and Gaze Fixations from Autistic and Non-Autistic Adults | May 1, 2016 | Multiple-choice, POS | —Unverified | 0 | 0 |
| Evaluating the Performance and Robustness of LLMs in Materials Science Q&A and Property Predictions | Sep 22, 2024 | Band Gap, In-Context Learning | —Unverified | 0 | 0 |
| Evaluating the Potential of Leading Large Language Models in Reasoning Biology Questions | Nov 5, 2023 | Logical Reasoning, Multiple-choice | —Unverified | 0 | 0 |
| Evaluating the Rationale Understanding of Critical Reasoning in Logical Reading Comprehension | Nov 30, 2023 | Multiple-choice, Reading Comprehension | —Unverified | 0 | 0 |
| Evaluating the Symbol Binding Ability of Large Language Models for Multiple-Choice Questions in Vietnamese General Education | Oct 18, 2023 | Multiple-choice, Multiple Choice Question Answering (MCQA) | —Unverified | 0 | 0 |
| Evaluating Vision-Language and Large Language Models for Automated Student Assessment in Indonesian Classrooms | Jun 5, 2025 | Multiple-choice | —Unverified | 0 | 0 |