| AILS-NTUA at SemEval-2024 Task 9: Cracking Brain Teasers: Transformer Models for Lateral Thinking Puzzles | Apr 1, 2024 | Common Sense Reasoning, Multiple-choice | Code Available | 0 |
| DyePack: Provably Flagging Test Set Contamination in LLMs Using Backdoors | May 29, 2025 | MMLU, Multiple-choice | Code Available | 0 |
| EAIRA: Establishing a Methodology for Evaluating AI Models as Scientific Research Assistants | Feb 27, 2025 | Multiple-choice | Code Available | 0 |
| MMM: Multi-stage Multi-task Learning for Multi-choice Reading Comprehension | Oct 1, 2019 | Logical Reasoning, Machine Reading Comprehension | Code Available | 0 |
| Anchored Answers: Unravelling Positional Bias in GPT-2's Multiple-Choice Questions | May 6, 2024 | Decision Making, Multiple-choice | Code Available | 0 |
| MM-PoE: Multiple Choice Reasoning via. Process of Elimination using Multi-Modal Models | Dec 10, 2024 | Multiple-choice, Question Answering | Code Available | 0 |
| Pragmatic Competence Evaluation of Large Language Models for the Korean Language | Mar 19, 2024 | Few-Shot Learning, Multiple-choice | Code Available | 0 |
| Which is the Effective Way for Gaokao: Information Retrieval or Neural Networks? | Apr 1, 2017 | Information Retrieval, Multiple-choice | Code Available | 0 |
| Edu-Values: Towards Evaluating the Chinese Education Values of Large Language Models | Sep 19, 2024 | Ethics, Multiple-choice | Code Available | 0 |
| Investigating the Shortcomings of LLMs in Step-by-Step Legal Reasoning | Feb 8, 2025 | Legal Reasoning, Multiple-choice | Code Available | 0 |