| Digital Comprehensibility Assessment of Simplified Texts among Persons with Intellectual Disabilities | Feb 20, 2024 | Multiple-choice, Text Simplification | Unverified | 0 |
| BiMediX: Bilingual Medical Mixture of Experts LLM | Feb 20, 2024 | Mixture-of-Experts, Multiple-choice | Code Available | 1 |
| ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic | Feb 20, 2024 | ArabicMMLU, Language Model Evaluation | Code Available | 1 |
| Probabilities of Chat LLMs Are Miscalibrated but Still Predict Correctness on Multiple-Choice Q&A | Feb 20, 2024 | Language Modelling, Large Language Model | Code Available | 0 |
| Stick to your Role! Stability of Personal Values Expressed in Large Language Models | Feb 19, 2024 | Multiple-choice | Unverified | 0 |
| Artifacts or Abduction: How Do LLMs Answer Multiple-Choice Questions Without the Question? | Feb 19, 2024 | Decision Making, Memorization | Code Available | 0 |
| Uncertainty quantification in fine-tuned LLMs using LoRA ensembles | Feb 19, 2024 | Multiple-choice, Uncertainty Quantification | Code Available | 0 |
| KMMLU: Measuring Massive Multitask Language Understanding in Korean | Feb 18, 2024 | kmmlu, Language Model Evaluation | Unverified | 0 |
| Question-Instructed Visual Descriptions for Zero-Shot Video Question Answering | Feb 16, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| DE-COP: Detecting Copyrighted Content in Language Models Training Data | Feb 15, 2024 | Language Modeling, Language Modelling | Code Available | 0 |