| SCoRE: Benchmarking Long-Chain Reasoning in Commonsense Scenarios | Mar 8, 2025 | Benchmarking, Diagnostic | Code Available | 0 | 5 |
| Investigating the Shortcomings of LLMs in Step-by-Step Legal Reasoning | Feb 8, 2025 | Legal Reasoning, Multiple-choice | Code Available | 0 | 5 |
| A large language model-assisted education tool to provide feedback on open-ended responses | Jul 25, 2023 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| Questioning the Survey Responses of Large Language Models | Jun 13, 2023 | Multiple-choice, Survey | Code Available | 0 | 5 |
| Introducing Flexible Monotone Multiple Choice Item Response Theory Models and Bit Scales | Oct 2, 2024 | Multiple-choice | Code Available | 0 | 5 |
| IPEval: A Bilingual Intellectual Property Agency Consultation Evaluation Benchmark for Large Language Models | Jun 18, 2024 | Management, Multiple-choice | Code Available | 0 | 5 |
| Affordably Fine-tuned LLMs Provide Better Answers to Course-specific MCQs | Jan 10, 2025 | Multiple-choice | Code Available | 0 | 5 |
| Introducing a framework to assess newly created questions with Natural Language Processing | Apr 28, 2020 | Multiple-choice | Code Available | 0 | 5 |
| iREL at SemEval-2024 Task 9: Improving Conventional Prompting Methods for Brain Teasers | May 25, 2024 | Common Sense Reasoning, Multiple-choice | Code Available | 0 | 5 |
| Automatic Generation and Evaluation of Reading Comprehension Test Items with Large Language Models | Apr 11, 2024 | Multiple-choice, Reading Comprehension | Code Available | 0 | 5 |