| PADL: Language-Directed Physics-Based Character Control | Jan 31, 2023 | Image Generation, Imitation Learning | Code Available | 1 |
| GPT as Knowledge Worker: A Zero-Shot Evaluation of (AI)CPA Capabilities | Jan 11, 2023 | Multiple-choice | Code Available | 1 |
| Mind Reasoning Manners: Enhancing Type Perception for Generalized Zero-shot Logical Reasoning over Text | Jan 8, 2023 | Contrastive Learning, Logical Reasoning | Code Available | 1 |
| GPT Takes the Bar Exam | Dec 29, 2022 | Hyperparameter Optimization, Multiple-choice | Code Available | 1 |
| Large Language Models Encode Clinical Knowledge | Dec 26, 2022 | Clinical Knowledge, MedQA | Code Available | 1 |
| Training Trajectories of Language Models Across Scales | Dec 19, 2022 | In-Context Learning, Multiple-choice | Code Available | 1 |
| Evaluating the Knowledge Dependency of Questions | Nov 21, 2022 | Multiple-choice | Code Available | 1 |
| Leveraging Large Language Models for Multiple Choice Question Answering | Oct 22, 2022 | Answer Selection, Multiple-choice | Code Available | 1 |
| EduQG: A Multi-format Multiple Choice Dataset for the Educational Domain | Oct 12, 2022 | Distractor Generation, Multiple-choice | Code Available | 1 |
| Variational Open-Domain Question Answering | Sep 23, 2022 | Language Modelling, MedQA | Code Available | 1 |
| Can large language models reason about medical questions? | Jul 17, 2022 | MedQA, Multiple-choice | Code Available | 1 |
| CC-Riddle: A Question Answering Dataset of Chinese Character Riddles | Jun 28, 2022 | General Knowledge, Language Modelling | Code Available | 1 |
| SQuALITY: Building a Long-Document Summarization Dataset the Hard Way | May 23, 2022 | Document Summarization, Multiple-choice | Code Available | 1 |
| FETA: A Benchmark for Few-Sample Task Transfer in Open-Domain Dialogue | May 12, 2022 | Dialogue Understanding, Domain Adaptation | Code Available | 1 |
| Clues Before Answers: Generation-Enhanced Multiple-Choice QA | Apr 30, 2022 | Decoder, Multiple-choice | Code Available | 1 |
| AdaLoGN: Adaptive Logic Graph Network for Reasoning-Based Machine Reading Comprehension | Mar 16, 2022 | Logical Reasoning, Machine Reading Comprehension | Code Available | 1 |
| Leaf: Multiple-Choice Question Generation | Jan 22, 2022 | Multiple-choice, Question Answering | Code Available | 1 |
| Bridging Video-text Retrieval with Multiple Choice Questions | Jan 13, 2022 | Action Recognition, Linear evaluation | Code Available | 1 |
| Multiple Choice Questions based Multi-Interest Policy Learning for Conversational Recommendation | Dec 22, 2021 | Attribute, Conversational Recommendation | Code Available | 1 |
| QuALITY: Question Answering with Long Input Texts, Yes! | Dec 16, 2021 | Multiple-choice, Multiple Choice Question Answering (MCQA) | Code Available | 1 |
| Surface Form Competition: Why the Highest Probability Answer Isn’t Always Right | Nov 1, 2021 | Form, Multiple-choice | Code Available | 1 |
| MixQG: Neural Question Generation with Mixed Answer Types | Oct 15, 2021 | Multiple-choice, Question Answering | Code Available | 1 |
| A Few More Examples May Be Worth Billions of Parameters | Oct 8, 2021 | Extractive Question-Answering, Multiple-choice | Code Available | 1 |
| An MRC Framework for Semantic Role Labeling | Sep 14, 2021 | Computational Efficiency, Machine Reading Comprehension | Code Available | 1 |
| ARMAN: Pre-training with Semantically Selecting and Reordering of Sentences for Persian Abstractive Summarization | Sep 9, 2021 | Abstractive Text Summarization, Decoder | Code Available | 1 |