| Answering questions by learning to rank -- Learning to rank by answering questions | Sep 2, 2019 | ARC, Learning-To-Rank | —Unverified | 0 | 0 |
| Answering questions by learning to rank - Learning to rank by answering questions | Nov 1, 2019 | ARC, Learning-To-Rank | —Unverified | 0 | 0 |
| Answering Questions in Stages: Prompt Chaining for Contract QA | Oct 9, 2024 | Multiple-choice | —Unverified | 0 | 0 |
| Answering Science Exam Questions Using Query Rewriting with Background Knowledge | Sep 15, 2018 | ARC, Information Retrieval | —Unverified | 0 | 0 |
| Answering Science Exam Questions Using Query Reformulation with Background Knowledge | Nov 17, 2018 | ARC, Information Retrieval | —Unverified | 0 | 0 |
| Answer Uncertainty and Unanswerability in Multiple-Choice Machine Reading Comprehension | Jan 16, 2022 | Machine Reading Comprehension, Multiple-choice | —Unverified | 0 | 0 |
| Answer Uncertainty and Unanswerability in Multiple-Choice Machine Reading Comprehension | May 1, 2022 | Machine Reading Comprehension, Multiple-choice | —Unverified | 0 | 0 |
| Enhancing lexical-based approach with external knowledge for Vietnamese multiple-choice machine reading comprehension | Jan 16, 2020 | Machine Reading Comprehension, Multiple-choice | —Unverified | 0 | 0 |
| Applying IRT to Distinguish Between Human and Generative AI Responses to Multiple-Choice Assessments | Nov 28, 2024 | Multiple-choice | —Unverified | 0 | 0 |
| Aquila-Med LLM: Pioneering Full-Process Open-Source Medical Language Models | Jun 18, 2024 | Multiple-choice | —Unverified | 0 | 0 |
| The Achievement of Higher Flexibility in Multiple Choice-based Tests Using Image Classification Techniques | Nov 2, 2017 | BIG-bench Machine Learning, General Classification | —Unverified | 0 | 0 |
| AraSTEM: A Native Arabic Multiple Choice Question Benchmark for Evaluating LLMs Knowledge In STEM Subjects | Dec 31, 2024 | Benchmarking, Multiple-choice | —Unverified | 0 | 0 |
| AraTrust: An Evaluation of Trustworthiness for LLMs in Arabic | Mar 14, 2024 | Ethics, Multiple-choice | —Unverified | 0 | 0 |
| A recent evaluation on the performance of LLMs on radiation oncology physics using questions of randomly shuffled options | Dec 14, 2024 | Multiple-choice | —Unverified | 0 | 0 |
| Are LLM-generated plain language summaries truly understandable? A large-scale crowdsourced evaluation | May 15, 2025 | Informativeness, Multiple-choice | —Unverified | 0 | 0 |
| A review of faithfulness metrics for hallucination assessment in Large Language Models | Dec 31, 2024 | Benchmarking, Hallucination | —Unverified | 0 | 0 |
| Are You Doubtful? Oh, It Might Be Difficult Then! Exploring the Use of Model Uncertainty for Question Difficulty Estimation | Dec 16, 2024 | Multiple-choice | —Unverified | 0 | 0 |
| ARGUS: Hallucination and Omission Evaluation in Video-LLMs | Jun 9, 2025 | DescriptiveForm | —Unverified | 0 | 0 |
| ActionAtlas: A VideoQA Benchmark for Domain-specialized Action Recognition | Oct 8, 2024 | Action Recognition, Multiple-choice | —Unverified | 0 | 0 |
| Aryl: An Elastic Cluster Scheduler for Deep Learning | Feb 16, 2022 | Deep Learning, GPU | —Unverified | 0 | 0 |
| A Semantic Feature-Wise Transformation Relation Network for Automatic Short Answer Grading | Nov 1, 2021 | automatic short answer grading, Data Augmentation | —Unverified | 0 | 0 |
| A Semantic Parsing Algorithm to Solve Linear Ordering Problems | Feb 12, 2025 | Multiple-choice, Semantic Parsing | —Unverified | 0 | 0 |
| A Shortcut-aware Video-QA Benchmark for Physical Understanding via Minimal Video Pairs | Jun 11, 2025 | Multiple-choice | —Unverified | 0 | 0 |
| Assessing AI-Generated Questions' Alignment with Cognitive Frameworks in Educational Assessment | Apr 19, 2025 | Classification, Multiple-choice | —Unverified | 0 | 0 |
| Assessing Distractors in Multiple-Choice Tests | Nov 8, 2023 | Diversity, Multiple-choice | —Unverified | 0 | 0 |
| Assessing Large Language Models in Mechanical Engineering Education: A Study on Mechanics-Focused Conceptual Understanding | Jan 13, 2024 | Multiple-choice, Prompt Engineering | —Unverified | 0 | 0 |
| What Makes Machine Reading Comprehension Questions Difficult? Investigating Variation in Passage Sources and Question Types | Nov 16, 2021 | Logical Reasoning, Machine Reading Comprehension | —Unverified | 0 | 0 |
| A statistical model for aggregating judgments by incorporating peer predictions | Mar 14, 2017 | counterfactual, Multiple-choice | —Unverified | 0 | 0 |
| AstroMLab 1: Who Wins Astronomy Jeopardy!? | Jul 15, 2024 | Astronomy, Benchmarking | —Unverified | 0 | 0 |
| A System for Generating Multiple Choice Questions: With a Novel Approach for Sentence Selection | Jul 1, 2015 | Active Learning, Multiple-choice | —Unverified | 0 | 0 |
| A Theoretically Grounded Benchmark for Evaluating Machine Commonsense | Mar 23, 2022 | Generative Question Answering, Multiple-choice | —Unverified | 0 | 0 |
| Attribution analysis of legal language as used by LLM | Jan 28, 2025 | Binary Classification, Multiple-choice | —Unverified | 0 | 0 |
| Auto-bidding in real-time auctions via Oracle Imitation Learning (OIL) | Dec 16, 2024 | Imitation Learning, Multiple-choice | —Unverified | 0 | 0 |
| AutoDrive-QA- Automated Generation of Multiple-Choice Questions for Autonomous Driving Datasets Using Large Vision-Language Models | Mar 20, 2025 | Autonomous Driving, Multiple-choice | —Unverified | 0 | 0 |
| Auto-Evaluation: A Critical Measure in Driving Improvements in Quality and Safety of AI-Generated Lesson Resources | Jan 23, 2025 | Multiple-choice | —Unverified | 0 | 0 |
| What Large Language Models Know and What People Think They Know | Jan 24, 2024 | Articles, Decision Making | —Unverified | 0 | 0 |
| Automated Answer Validation using Text Similarity | Jan 13, 2024 | Information Retrieval, Multiple-choice | —Unverified | 0 | 0 |
| Answering Chinese Elementary School Social Study Multiple Choice Questions | Jun 26, 2021 | Multiple-choice, Negation | —Unverified | 0 | 0 |
| The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think | May 15, 2025 | Multiple-choice | —Unverified | 0 | 0 |
| Automated Generation of Multiple-Choice Cloze Questions for Assessing English Vocabulary Using GPT-turbo 3.5 | Mar 4, 2024 | Multiple-choice, Part-Of-Speech Tagging | —Unverified | 0 | 0 |
| Automated Prediction of Examinee Proficiency from Short-Answer Questions | Dec 1, 2020 | Multiple-choice, Prediction | —Unverified | 0 | 0 |
| Automatic Distractor Generation for Multiple Choice Questions in Standard Tests | Nov 26, 2020 | Distractor Generation, Multiple-choice | —Unverified | 0 | 0 |
| Automatic Distractor Suggestion for Multiple-Choice Tests Using Concept Embeddings and Information Retrieval | Jun 1, 2018 | Information Retrieval, Multiple-choice | —Unverified | 0 | 0 |
| Automatic Generation of Distractors for Fill-in-the-Blank Exercises with Round-Trip Neural Machine Translation | May 1, 2022 | Machine Translation, Multiple-choice | —Unverified | 0 | 0 |
| Automatic Generation of Multiple-Choice Questions | Mar 25, 2023 | Multiple-choice, Part-Of-Speech Tagging | —Unverified | 0 | 0 |
| Automatic Question Answering for Medical MCQs: Can It go Further than Information Retrieval? | Sep 1, 2019 | Information Retrieval, Multiple-choice | —Unverified | 0 | 0 |
| Automating question generation from educational text | Sep 26, 2023 | Multiple-choice, Question Generation | —Unverified | 0 | 0 |
| AutoMCQ -- Automatically Generate Code Comprehension Questions using GenAI | May 22, 2025 | Multiple-choice | —Unverified | 0 | 0 |
| Auxiliary Class Based Multiple Choice Learning | Aug 6, 2021 | Diversity, Ensemble Learning | —Unverified | 0 | 0 |
| The Earth is Flat? Unveiling Factual Errors in Large Language Models | Jan 1, 2024 | In-Context Learning, Multiple-choice | —Unverified | 0 | 0 |