| Anchored Answers: Unravelling Positional Bias in GPT-2's Multiple-Choice Questions | May 6, 2024 | Decision Making, Multiple-choice | Code Available | 0 |
| WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning | May 6, 2024 | Multiple-choice, Video Understanding | Unverified | 0 |
| Math Multiple Choice Question Generation via Human-Large Language Model Collaboration | May 1, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| FoundaBench: Evaluating Chinese Fundamental Knowledge Capabilities of Large Language Models | Apr 29, 2024 | Common Sense Reasoning, Multiple-choice | Unverified | 0 |
| From Multiple-Choice to Extractive QA: A Case Study for English and Arabic | Apr 26, 2024 | Belebele, Extractive Question-Answering | Code Available | 0 |
| How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites | Apr 25, 2024 | 4k, Language Modeling | Unverified | 0 |
| TAXI: Evaluating Categorical Knowledge Editing for Language Models | Apr 23, 2024 | knowledge editing, Multiple-choice | Code Available | 0 |
| AI and Machine Learning for Next Generation Science Assessments | Apr 23, 2024 | Multiple-choice | Unverified | 0 |
| UnibucLLM: Harnessing LLMs for Automated Prediction of Item Difficulty and Response Time for Multiple-Choice Questions | Apr 20, 2024 | Data Augmentation, Multiple-choice | Code Available | 0 |
| Improving Automated Distractor Generation for Math Multiple-choice Questions with Overgenerate-and-rank | Apr 19, 2024 | Distractor Generation, Math | Unverified | 0 |