| PADL: Language-Directed Physics-Based Character Control | Jan 31, 2023 | Image Generation, Imitation Learning | Code Available | 1 |
| GPT as Knowledge Worker: A Zero-Shot Evaluation of (AI)CPA Capabilities | Jan 11, 2023 | Multiple-choice | Code Available | 1 |
| Mind Reasoning Manners: Enhancing Type Perception for Generalized Zero-shot Logical Reasoning over Text | Jan 8, 2023 | Contrastive Learning, Logical Reasoning | Code Available | 1 |
| GPT Takes the Bar Exam | Dec 29, 2022 | Hyperparameter Optimization, Multiple-choice | Code Available | 1 |
| Large Language Models Encode Clinical Knowledge | Dec 26, 2022 | Clinical Knowledge, MedQA | Code Available | 1 |
| Training Trajectories of Language Models Across Scales | Dec 19, 2022 | In-Context Learning, Multiple-choice | Code Available | 1 |
| Evaluating the Knowledge Dependency of Questions | Nov 21, 2022 | Multiple-choice | Code Available | 1 |
| Leveraging Large Language Models for Multiple Choice Question Answering | Oct 22, 2022 | Answer Selection, Multiple-choice | Code Available | 1 |
| EduQG: A Multi-format Multiple Choice Dataset for the Educational Domain | Oct 12, 2022 | Distractor Generation, Multiple-choice | Code Available | 1 |
| Variational Open-Domain Question Answering | Sep 23, 2022 | Language Modelling, MedQA | Code Available | 1 |
| Can large language models reason about medical questions? | Jul 17, 2022 | MedQA, Multiple-choice | Code Available | 1 |
| CC-Riddle: A Question Answering Dataset of Chinese Character Riddles | Jun 28, 2022 | General Knowledge, Language Modelling | Code Available | 1 |
| SQuALITY: Building a Long-Document Summarization Dataset the Hard Way | May 23, 2022 | Document Summarization, Multiple-choice | Code Available | 1 |
| FETA: A Benchmark for Few-Sample Task Transfer in Open-Domain Dialogue | May 12, 2022 | Dialogue Understanding, Domain Adaptation | Code Available | 1 |
| Clues Before Answers: Generation-Enhanced Multiple-Choice QA | Apr 30, 2022 | Decoder, Multiple-choice | Code Available | 1 |
| AdaLoGN: Adaptive Logic Graph Network for Reasoning-Based Machine Reading Comprehension | Mar 16, 2022 | Logical Reasoning, Machine Reading Comprehension | Code Available | 1 |
| Leaf: Multiple-Choice Question Generation | Jan 22, 2022 | Multiple-choice, Question Answering | Code Available | 1 |
| Bridging Video-text Retrieval with Multiple Choice Questions | Jan 13, 2022 | Action Recognition, Linear evaluation | Code Available | 1 |
| Multiple Choice Questions based Multi-Interest Policy Learning for Conversational Recommendation | Dec 22, 2021 | Attribute, Conversational Recommendation | Code Available | 1 |
| QuALITY: Question Answering with Long Input Texts, Yes! | Dec 16, 2021 | Multiple-choice, Multiple Choice Question Answering (MCQA) | Code Available | 1 |
| Surface Form Competition: Why the Highest Probability Answer Isn’t Always Right | Nov 1, 2021 | Form, Multiple-choice | Code Available | 1 |
| MixQG: Neural Question Generation with Mixed Answer Types | Oct 15, 2021 | Multiple-choice, Question Answering | Code Available | 1 |
| A Few More Examples May Be Worth Billions of Parameters | Oct 8, 2021 | Extractive Question-Answering, Multiple-choice | Code Available | 1 |
| An MRC Framework for Semantic Role Labeling | Sep 14, 2021 | Computational Efficiency, Machine Reading Comprehension | Code Available | 1 |
| ARMAN: Pre-training with Semantically Selecting and Reordering of Sentences for Persian Abstractive Summarization | Sep 9, 2021 | Abstractive Text Summarization, Decoder | Code Available | 1 |
| General-Purpose Question-Answering with Macaw | Sep 6, 2021 | Generative Question Answering, Multiple-choice | Code Available | 1 |
| TIMEDIAL: Temporal Commonsense Reasoning in Dialog | Jun 8, 2021 | Multiple-choice, Timedial | Code Available | 1 |
| NeuralWOZ: Learning to Collect Task-Oriented Dialogue via Model-Based Simulation | May 30, 2021 | Dialogue State Tracking, Multiple-choice | Code Available | 1 |
| Option Tracing: Beyond Correctness Analysis in Knowledge Tracing | Apr 19, 2021 | Knowledge Tracing, Multiple-choice | Code Available | 1 |
| When Does Pretraining Help? Assessing Self-Supervised Learning for Law and the CaseHOLD Dataset | Apr 18, 2021 | Multiple-choice, Question Answering | Code Available | 1 |
| Surface Form Competition: Why the Highest Probability Answer Isn't Always Right | Apr 16, 2021 | Form, Multiple-choice | Code Available | 1 |
| What to Pre-Train on? Efficient Intermediate Task Selection | Apr 16, 2021 | Multiple-choice, Question Answering | Code Available | 1 |
| ExplaGraphs: An Explanation Graph Generation Task for Structured Commonsense Reasoning | Apr 15, 2021 | Graph Generation, Multiple-choice | Code Available | 1 |
| Quiz-Style Question Generation for News Stories | Feb 18, 2021 | Answer Generation, Distractor Generation | Code Available | 1 |
| TSQA: Tabular Scenario Based Question Answering | Jan 14, 2021 | Machine Reading Comprehension, Multiple-choice | Code Available | 1 |
| Explaining NLP Models via Minimal Contrastive Editing (MiCE) | Dec 27, 2020 | counterfactual, Multiple-choice | Code Available | 1 |
| Option Tracing: Beyond Binary Knowledge Tracing | Dec 11, 2020 | Knowledge Tracing, Multiple-choice | Code Available | 1 |
| IndicNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages | Nov 8, 2020 | Genre classification, Multiple-choice | Code Available | 1 |
| A BERT-based Distractor Generation Scheme with Multi-tasking and Negative Answer Training Strategies. | Nov 1, 2020 | Distractor Generation, Multiple-choice | Code Available | 1 |
| Trajectory-wise Multiple Choice Learning for Dynamics Generalization in Reinforcement Learning | Oct 26, 2020 | Clustering, Model-based Reinforcement Learning | Code Available | 1 |
| Counterfactual Variable Control for Robust and Interpretable Question Answering | Oct 12, 2020 | Causal Inference, counterfactual | Code Available | 1 |
| A BERT-based Distractor Generation Scheme with Multi-tasking and Negative Answer Training Strategies | Oct 12, 2020 | Distractor Generation, Multiple-choice | Code Available | 1 |
| FarsTail: A Persian Natural Language Inference Dataset | Sep 18, 2020 | Multiple-choice, Natural Language Inference | Code Available | 1 |
| Knowledge Graph-Augmented Abstractive Summarization with Semantic-Driven Cloze Reward | May 3, 2020 | Abstractive Text Summarization, Cloze Test | Code Available | 1 |
| UnifiedQA: Crossing Format Boundaries With a Single QA System | May 2, 2020 | Common Sense Reasoning, Language Modeling | Code Available | 1 |
| LifeQA: A Real-life Dataset for Video Question Answering | May 1, 2020 | Multiple-choice, Question Answering | Code Available | 1 |
| Simulated Annealing Algorithm for the Multiple Choice Multidimensional Knapsack Problem | May 1, 2020 | All, C++ code | Code Available | 1 |
| STARC: Structured Annotations for Reading Comprehension | Apr 30, 2020 | Multiple-choice, Reading Comprehension | Code Available | 1 |
| Logic-Guided Data Augmentation and Regularization for Consistent Question Answering | Apr 21, 2020 | Data Augmentation, Machine Reading Comprehension | Code Available | 1 |
| From Machine Reading Comprehension to Dialogue State Tracking: Bridging the Gap | Apr 13, 2020 | Dialogue State Tracking, Machine Reading Comprehension | Code Available | 1 |