| Uncertainty quantification in fine-tuned LLMs using LoRA ensembles | Feb 19, 2024 | Multiple-choice, Uncertainty Quantification | Code Available | 0 |
| KMMLU: Measuring Massive Multitask Language Understanding in Korean | Feb 18, 2024 | KMMLU, Language Model Evaluation | Unverified | 0 |
| Question-Instructed Visual Descriptions for Zero-Shot Video Question Answering | Feb 16, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| DE-COP: Detecting Copyrighted Content in Language Models Training Data | Feb 15, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| Prompting Implicit Discourse Relation Annotation | Feb 7, 2024 | Classification, Implicit Discourse Relation Classification | Unverified | 0 |
| SceMQA: A Scientific College Entrance Level Multimodal Question Answering Benchmark | Feb 6, 2024 | Multiple-choice, Question Answering | Unverified | 0 |
| Are Machines Better at Complex Reasoning? Unveiling Human-Machine Inference Gaps in Entailment Verification | Feb 6, 2024 | Benchmarking, Multiple-choice | Unverified | 0 |
| Enhancing textual textbook question answering with large language models and retrieval augmented generation | Feb 5, 2024 | Multiple-choice, Question Answering | Code Available | 0 |
| LLMs May Perform MCQA by Selecting the Least Incorrect Option | Feb 2, 2024 | Multiple-choice, Multiple Choice Question Answering (MCQA) | Unverified | 0 |
| Distractor Generation in Multiple-Choice Tasks: A Survey of Methods, Datasets, and Evaluation | Feb 2, 2024 | Distractor Generation, Multiple-choice | Unverified | 0 |