| Biomedical Entity Linking as Multiple Choice Question Answering | Feb 23, 2024 | Entity LinkingMultiple-choice | CodeCode Available | 0 |
| "My Answer is C": First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models | Feb 22, 2024 | Multiple-choiceText Generation | CodeCode Available | 0 |
| Identifying Multiple Personalities in Large Language Models with External Evaluation | Feb 22, 2024 | Multiple-choice | —Unverified | 0 |
| Beyond Probabilities: Unveiling the Misalignment in Evaluating Large Language Models | Feb 21, 2024 | Multiple-choice | —Unverified | 0 |
| Ranking Large Language Models without Ground Truth | Feb 21, 2024 | Multiple-choiceTriplet | —Unverified | 0 |
| KorNAT: LLM Alignment Benchmark for Korean Social Values and Common Knowledge | Feb 21, 2024 | 4kMultiple-choice | —Unverified | 0 |
| Probabilities of Chat LLMs Are Miscalibrated but Still Predict Correctness on Multiple-Choice Q&A | Feb 20, 2024 | Language ModellingLarge Language Model | CodeCode Available | 0 |
| Digital Comprehensibility Assessment of Simplified Texts among Persons with Intellectual Disabilities | Feb 20, 2024 | Multiple-choiceText Simplification | —Unverified | 0 |
| Artifacts or Abduction: How Do LLMs Answer Multiple-Choice Questions Without the Question? | Feb 19, 2024 | Decision MakingMemorization | CodeCode Available | 0 |
| Stick to your Role! Stability of Personal Values Expressed in Large Language Models | Feb 19, 2024 | Multiple-choice | —Unverified | 0 |