| CSEPrompts: A Benchmark of Introductory Computer Science Prompts | Apr 3, 2024 | Multiple-choice | Code Available | 0 |
| Exploring Automated Distractor Generation for Math Multiple-choice Questions via Large Language Models | Apr 2, 2024 | Distractor Generation, In-Context Learning | Code Available | 0 |
| AILS-NTUA at SemEval-2024 Task 9: Cracking Brain Teasers: Transformer Models for Lateral Thinking Puzzles | Apr 1, 2024 | Common Sense Reasoning, Multiple-choice | Code Available | 0 |
| Can multiple-choice questions really be useful in detecting the abilities of LLMs? | Mar 26, 2024 | Multiple-choice, Question Answering | Code Available | 0 |
| Pragmatic Competence Evaluation of Large Language Models for the Korean Language | Mar 19, 2024 | Few-Shot Learning, Multiple-choice | Code Available | 0 |
| LHMKE: A Large-scale Holistic Multi-subject Knowledge Evaluation Benchmark for Chinese Large Language Models | Mar 19, 2024 | Multiple-choice | Unverified | 0 |
| Enhancing Event Causality Identification with Rationale and Structure-Aware Causal Question Answering | Mar 17, 2024 | Event Causality Identification, Multiple-choice | Unverified | 0 |
| Few-Shot Image Classification and Segmentation as Visual Question Answering Using Vision-Language Models | Mar 15, 2024 | Few-Shot Image Classification, image-classification | Unverified | 0 |
| EXAMS-V: A Multi-Discipline Multilingual Multimodal Exam Benchmark for Evaluating Vision Language Models | Mar 15, 2024 | Miscellaneous, Multiple-choice | Code Available | 0 |
| Towards Diverse Perspective Learning with Selection over Multiple Temporal Poolings | Mar 14, 2024 | Multiple-choice, Time Series | Code Available | 0 |