Llama 2: Open Foundation and Fine-Tuned Chat Models | Jul 18, 2023 | Arithmetic Reasoning | Code Available | 8
Training Compute-Optimal Large Language Models | Mar 29, 2022 | Anachronisms Analogical Similarity | Code Available | 6
MEDITRON-70B: Scaling Medical Pretraining for Large Language Models | Nov 27, 2023 | Articles Conditional Text Generation | Code Available | 4
Galactica: A Large Language Model for Science | Nov 16, 2022 | Anachronisms Bias Detection | Code Available | 4
PaLM: Scaling Language Modeling with Pathways | Apr 5, 2022 | Auto Debugging Code Generation | Code Available | 2
MedMCQA : A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering | Mar 27, 2022 | Diversity Multiple-choice | Code Available | 2
Scaling Language Models: Methods, Analysis & Insights from Training Gopher | Dec 8, 2021 | Abstract Algebra Anachronisms | Code Available | 2
AdaMoLE: Fine-Tuning Large Language Models with Adaptive Mixture of Low-Rank Adaptation Experts | May 1, 2024 | Multiple Choice Question Answering (MCQA) | Code Available | 1
Fool Your (Vision and) Language Model With Embarrassingly Simple Permutations | Oct 2, 2023 | In-Context Learning Instruction Following | Code Available | 1
M3KE: A Massive Multi-Level Multi-Subject Knowledge Evaluation Benchmark for Chinese Large Language Models | May 17, 2023 | Instruction Following Multiple-choice | Code Available | 1
Towards Expert-Level Medical Question Answering with Large Language Models | May 16, 2023 | Medical Question Answering MedQA | Code Available | 1
Large Language Models Encode Clinical Knowledge | Dec 26, 2022 | Clinical Knowledge MedQA | Code Available | 1
Leveraging Large Language Models for Multiple Choice Question Answering | Oct 22, 2022 | Answer Selection Multiple-choice | Code Available | 1
Variational Open-Domain Question Answering | Sep 23, 2022 | Language Modelling MedQA | Code Available | 1
Can large language models reason about medical questions? | Jul 17, 2022 | MedQA Multiple-choice | Code Available | 1
Clues Before Answers: Generation-Enhanced Multiple-Choice QA | Apr 30, 2022 | Decoder Multiple-choice | Code Available | 1
QuALITY: Question Answering with Long Input Texts, Yes! | Dec 16, 2021 | Multiple-choice Multiple Choice Question Answering (MCQA) | Code Available | 1
LexGLUE: A Benchmark Dataset for Legal Language Understanding in English | Oct 3, 2021 | Multi-class Classification Multi-Label Classification | Code Available | 1
IndicNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages | Nov 8, 2020 | Genre classification Multiple-choice | Code Available | 1
Counterfactual Variable Control for Robust and Interpretable Question Answering | Oct 12, 2020 | Causal Inference counterfactual | Code Available | 1
CP-Router: An Uncertainty-Aware Router Between LLM and LRM | May 26, 2025 | Conformal Prediction Logical Reasoning | Unverified | 0
Improving LLM First-Token Predictions in Multiple-Choice Question Answering via Prefilling Attack | May 21, 2025 | Multiple-choice Multiple Choice Question Answering (MCQA) | Unverified | 0
Healthy LLMs? Benchmarking LLM Knowledge of UK Government Public Health Information | May 9, 2025 | Benchmarking Form | Unverified | 0
Question-Aware Knowledge Graph Prompting for Enhancing Large Language Models | Mar 30, 2025 | Knowledge Graphs Multiple-choice | Code Available | 0
Correctness Coverage Evaluation for Medical Multiple-Choice Question Answering Based on the Enhanced Conformal Prediction Framework | Mar 7, 2025 | Conformal Prediction Medical Question Answering | Unverified | 0
Med-RLVR: Emerging Medical Reasoning from a 3B base model via reinforcement Learning | Feb 27, 2025 | Math Medical Question Answering | Unverified | 0
Wrong Answers Can Also Be Useful: PlausibleQA -- A Large-Scale QA Dataset with Answer Plausibility Scores | Feb 22, 2025 | Distractor Generation Information Retrieval | Code Available | 0
Which of These Best Describes Multiple Choice Evaluation with LLMs? A) Forced B) Flawed C) Fixable D) All of the Above | Feb 19, 2025 | All Multiple-choice | Unverified | 0
Investigating the Shortcomings of LLMs in Step-by-Step Legal Reasoning | Feb 8, 2025 | Legal Reasoning Multiple-choice | Code Available | 0
First Token Probability Guided RAG for Telecom Question Answering | Jan 11, 2025 | Multiple-choice Multiple Choice Question Answering (MCQA) | Unverified | 0
MedG-KRP: Medical Graph Knowledge Representation Probing | Dec 14, 2024 | Multiple-choice Multiple Choice Question Answering (MCQA) | Code Available | 0
LLM Distillation for Efficient Few-Shot Multiple Choice Question Answering | Dec 13, 2024 | Few-Shot Learning Knowledge Distillation | Unverified | 0
KnowledgePrompts: Exploring the Abilities of Large Language Models to Solve Proportional Analogies via Knowledge-Enhanced Prompting | Dec 1, 2024 | Multiple-choice Multiple Choice Question Answering (MCQA) | Code Available | 0
SandboxAQ's submission to MRL 2024 Shared Task on Multi-lingual Multi-task Information Retrieval | Oct 28, 2024 | Information Retrieval Multilingual Named Entity Recognition | Unverified | 0
Addressing Blind Guessing: Calibration of Selection Bias in Multiple-Choice Question Answering by Video Language Models | Oct 18, 2024 | Fairness Multiple-choice | Unverified | 0
Differentiating Choices via Commonality for Multiple-Choice Question Answering | Aug 21, 2024 | Multiple-choice Multiple Choice Question Answering (MCQA) | Code Available | 0
Answer, Assemble, Ace: Understanding How Transformers Answer Multiple Choice Questions | Jul 21, 2024 | Multiple-choice Multiple Choice Question Answering (MCQA) | Unverified | 0
Long Story Short: Story-level Video Understanding from 20K Short Films | Jun 14, 2024 | Multiple Choice Question Answering (MCQA) Open-Ended Question Answering | Unverified | 0
EconLogicQA: A Question-Answering Benchmark for Evaluating Large Language Models in Economic Sequential Reasoning | May 13, 2024 | Articles | Code Available | 0
From Multiple-Choice to Extractive QA: A Case Study for English and Arabic | Apr 26, 2024 | Belebele Extractive Question-Answering | Code Available | 0
Rethinking Generative Large Language Model Evaluation for Semantic Comprehension | Mar 12, 2024 | Language Model Evaluation Language Modeling | Unverified | 0
KorMedMCQA: Multi-Choice Question Answering Benchmark for Korean Healthcare Professional Licensing Examinations | Mar 3, 2024 | MedQA MMLU | Unverified | 0
Unsupervised multiple choices question answering via universal corpus | Feb 27, 2024 | Form Knowledge Graphs | Unverified | 0
Artifacts or Abduction: How Do LLMs Answer Multiple-Choice Questions Without the Question? | Feb 19, 2024 | Decision Making Memorization | Code Available | 0
LLMs May Perform MCQA by Selecting the Least Incorrect Option | Feb 2, 2024 | Multiple-choice Multiple Choice Question Answering (MCQA) | Unverified | 0
Evaluating the Symbol Binding Ability of Large Language Models for Multiple-Choice Questions in Vietnamese General Education | Oct 18, 2023 | Multiple-choice Multiple Choice Question Answering (MCQA) | Unverified | 0
BioMedGPT: Open Multimodal Generative Pre-trained Transformer for BioMedicine | Aug 18, 2023 | Few-Shot Learning Language Modeling | Code Available | 0
FrenchMedMCQA: A French Multiple-Choice Question Answering Dataset for Medical domain | Apr 9, 2023 | Multiple-choice Multiple Choice Question Answering (MCQA) | Code Available | 0
BloombergGPT: A Large Language Model for Finance | Mar 30, 2023 | Causal Judgment Common Sense Reasoning | Code Available | 0
Generating multiple-choice questions for medical question answering with distractors and cue-masking | Mar 13, 2023 | Language Modeling Language Modelling | Unverified | 0