| Make a Choice! Knowledge Base Question Answering with In-Context Learning | May 23, 2023 | In-Context Learning, Knowledge Base Question Answering | Unverified | 0 |
| Query Rewriting for Retrieval-Augmented Large Language Models | May 23, 2023 | Language Modeling, Language Modelling | Unverified | 0 |
| NarrativeXL: A Large-scale Dataset For Long-Term Memory Models | May 23, 2023 | Multiple-choice, Reading Comprehension | Code Available | 1 |
| Iterative Forward Tuning Boosts In-Context Learning in Language Models | May 22, 2023 | Decision Making, In-Context Learning | Code Available | 0 |
| VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models | May 20, 2023 | Multiple-choice, Question Answering | Code Available | 1 |
| M3KE: A Massive Multi-Level Multi-Subject Knowledge Evaluation Benchmark for Chinese Large Language Models | May 17, 2023 | Instruction Following, Multiple-choice | Code Available | 1 |
| A quantitative study of NLP approaches to question difficulty estimation | May 17, 2023 | Math, Multiple-choice | Code Available | 0 |
| C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models | May 15, 2023 | Multiple-choice | Code Available | 3 |
| EMBRACE: Evaluation and Modifications for Boosting RACE | May 15, 2023 | Machine Reading Comprehension, Multiple-choice | Code Available | 0 |
| Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting | May 7, 2023 | Multiple-choice | Code Available | 1 |
| MindGames: Targeting Theory of Mind in Large Language Models with Dynamic Epistemic Modal Logic | May 5, 2023 | Epistemic Reasoning, Language Modeling | Code Available | 1 |
| Contextual Response Interpretation for Automated Structured Interviews: A Case Study in Market Research | Apr 30, 2023 | Marketing, Multiple-choice | Unverified | 0 |
| Who's the Best Detective? LLMs vs. MLs in Detecting Incoherent Fourth Grade Math Answers | Apr 21, 2023 | Math, Multiple-choice | Unverified | 0 |
| Analyzing the Performance of ChatGPT in Cardiology and Vascular Pathologies | Apr 15, 2023 | Language Modeling, Language Modelling | Unverified | 0 |
| Prompt Engineering and Calibration for Zero-Shot Commonsense Reasoning | Apr 14, 2023 | Multiple-choice, Prompt Engineering | Unverified | 0 |
| DISTO: Evaluating Textual Distractors for Multi-Choice Questions using Negative Sampling based Approach | Apr 10, 2023 | Distractor Generation, Machine Translation | Unverified | 0 |
| FrenchMedMCQA: A French Multiple-Choice Question Answering Dataset for Medical domain | Apr 9, 2023 | Multiple-choice, Multiple Choice Question Answering (MCQA) | Code Available | 0 |
| Bridging the Language Gap: Knowledge Injected Multilingual Question Answering | Apr 6, 2023 | Cross-Lingual Transfer, Extractive Question-Answering | Unverified | 0 |
| GPT-4 to GPT-3.5: 'Hold My Scalpel' -- A Look at the Competency of OpenAI's GPT on the Plastic Surgery In-Service Training Exam | Apr 4, 2023 | Multiple-choice | Unverified | 0 |
| A Multiple Choices Reading Comprehension Corpus for Vietnamese Language Education | Mar 31, 2023 | Articles, Machine Reading Comprehension | Code Available | 0 |
| Evaluating GPT-3.5 and GPT-4 Models on Brazilian University Admission Exams | Mar 29, 2023 | Multiple-choice | Code Available | 1 |
| Explicit Planning Helps Language Models in Logical Reasoning | Mar 28, 2023 | Logical Reasoning, Multiple-choice | Code Available | 1 |
| Automatic Generation of Multiple-Choice Questions | Mar 25, 2023 | Multiple-choice, Part-Of-Speech Tagging | Unverified | 0 |
| A Graph-Guided Reasoning Approach for Open-ended Commonsense Question Answering | Mar 18, 2023 | Multiple-choice, Question Answering | Unverified | 0 |
| Can Generative Pre-trained Transformers (GPT) Pass Assessments in Higher Education Programming Courses? | Mar 16, 2023 | Multiple-choice | Unverified | 0 |