| None of the Others: a General Technique to Distinguish Reasoning from Memorization in Multiple-Choice LLM Evaluation Benchmarks | Feb 18, 2025 | Math, Memorization | —Unverified | 0 | 0 |
| No Task Left Behind: Multi-Task Learning of Knowledge Tracing and Option Tracing for Better Student Assessment | Apr 8, 2022 | Knowledge Tracing, Multiple-choice | —Unverified | 0 | 0 |
| Note on Combinatorial Engineering Frameworks for Hierarchical Modular Systems | Mar 29, 2013 | Combinatorial Optimization, Multiple-choice | —Unverified | 0 | 0 |
| Note on Evolution and Forecasting of Requirements: Communications Example | May 22, 2017 | Multiple-choice | —Unverified | 0 | 0 |
| Novel-WD: Exploring acquisition of Novel World Knowledge in LLMs Using Prefix-Tuning | Aug 30, 2024 | Causal Language Modeling, Continual Learning | —Unverified | 0 | 0 |
| NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models | Jul 15, 2024 | Common Sense Reasoning, Multiple-choice | —Unverified | 0 | 0 |
| Objective quantification of mood states using large language models | Feb 13, 2025 | Multiple-choice | —Unverified | 0 | 0 |
| OCCULT: Evaluating Large Language Models for Offensive Cyber Operation Capabilities | Feb 18, 2025 | Large Language Model, Multiple-choice | —Unverified | 0 | 0 |
| OLMES: A Standard for Language Model Evaluations | Jun 12, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| OmniEval: A Benchmark for Evaluating Omni-modal Models with Visual, Auditory, and Textual Inputs | Jun 26, 2025 | Diversity, Multiple-choice | —Unverified | 0 | 0 |