| Precise Task Formalization Matters in Winograd Schema Evaluations | Oct 8, 2020 | Language Modeling, Language Modelling | Code Available | 0 |
| Towards a Unified Multimodal Reasoning Framework | Dec 22, 2023 | Multimodal Reasoning, Multiple-choice | Code Available | 0 |
| IPEval: A Bilingual Intellectual Property Agency Consultation Evaluation Benchmark for Large Language Models | Jun 18, 2024 | Management, Multiple-choice | Code Available | 0 |
| iREL at SemEval-2024 Task 9: Improving Conventional Prompting Methods for Brain Teasers | May 25, 2024 | Common Sense Reasoning, Multiple-choice | Code Available | 0 |
| Eliciting Informative Text Evaluations with Large Language Models | May 23, 2024 | Multiple-choice, Prediction | Code Available | 0 |
| ElimiNet: A Model for Eliminating Options for Reading Comprehension with Multiple Choice Questions | Apr 4, 2019 | Multiple-choice, Reading Comprehension | Code Available | 0 |
| Self-Recognition in Language Models | Jul 9, 2024 | Multiple-choice | Code Available | 0 |
| EMBRACE: Evaluation and Modifications for Boosting RACE | May 15, 2023 | Machine Reading Comprehension, Multiple-choice | Code Available | 0 |
| Can multiple-choice questions really be useful in detecting the abilities of LLMs? | Mar 26, 2024 | Multiple-choice, Question Answering | Code Available | 0 |
| Modular Sentence Encoders: Separating Language Specialization from Cross-Lingual Alignment | Jul 20, 2024 | Contrastive Learning, Multiple-choice | Code Available | 0 |