| Answering questions by learning to rank -- Learning to rank by answering questions | Sep 2, 2019 | ARC, Learning-To-Rank | —Unverified | 0 | 0 |
| Answering questions by learning to rank - Learning to rank by answering questions | Nov 1, 2019 | ARC, Learning-To-Rank | —Unverified | 0 | 0 |
| Answering Questions in Stages: Prompt Chaining for Contract QA | Oct 9, 2024 | Multiple-choice | —Unverified | 0 | 0 |
| Answering Science Exam Questions Using Query Rewriting with Background Knowledge | Sep 15, 2018 | ARC, Information Retrieval | —Unverified | 0 | 0 |
| Answering Science Exam Questions Using Query Reformulation with Background Knowledge | Nov 17, 2018 | ARC, Information Retrieval | —Unverified | 0 | 0 |
| Answer Uncertainty and Unanswerability in Multiple-Choice Machine Reading Comprehension | Jan 16, 2022 | Machine Reading Comprehension, Multiple-choice | —Unverified | 0 | 0 |
| Answer Uncertainty and Unanswerability in Multiple-Choice Machine Reading Comprehension | May 1, 2022 | Machine Reading Comprehension, Multiple-choice | —Unverified | 0 | 0 |
| Enhancing lexical-based approach with external knowledge for Vietnamese multiple-choice machine reading comprehension | Jan 16, 2020 | Machine Reading Comprehension, Multiple-choice | —Unverified | 0 | 0 |
| Applying IRT to Distinguish Between Human and Generative AI Responses to Multiple-Choice Assessments | Nov 28, 2024 | Multiple-choice | —Unverified | 0 | 0 |
| Aqulia-Med LLM: Pioneering Full-Process Open-Source Medical Language Models | Jun 18, 2024 | Multiple-choice | —Unverified | 0 | 0 |
| The Achievement of Higher Flexibility in Multiple Choice-based Tests Using Image Classification Techniques | Nov 2, 2017 | BIG-bench Machine Learning, General Classification | —Unverified | 0 | 0 |
| AraSTEM: A Native Arabic Multiple Choice Question Benchmark for Evaluating LLMs Knowledge In STEM Subjects | Dec 31, 2024 | Benchmarking, Multiple-choice | —Unverified | 0 | 0 |
| AraTrust: An Evaluation of Trustworthiness for LLMs in Arabic | Mar 14, 2024 | Ethics, Multiple-choice | —Unverified | 0 | 0 |
| A recent evaluation on the performance of LLMs on radiation oncology physics using questions of randomly shuffled options | Dec 14, 2024 | Multiple-choice | —Unverified | 0 | 0 |
| Are LLM-generated plain language summaries truly understandable? A large-scale crowdsourced evaluation | May 15, 2025 | Informativeness, Multiple-choice | —Unverified | 0 | 0 |
| A review of faithfulness metrics for hallucination assessment in Large Language Models | Dec 31, 2024 | Benchmarking, Hallucination | —Unverified | 0 | 0 |
| Are You Doubtful? Oh, It Might Be Difficult Then! Exploring the Use of Model Uncertainty for Question Difficulty Estimation | Dec 16, 2024 | Multiple-choice | —Unverified | 0 | 0 |
| ARGUS: Hallucination and Omission Evaluation in Video-LLMs | Jun 9, 2025 | Descriptive, Form | —Unverified | 0 | 0 |
| ActionAtlas: A VideoQA Benchmark for Domain-specialized Action Recognition | Oct 8, 2024 | Action Recognition, Multiple-choice | —Unverified | 0 | 0 |
| Aryl: An Elastic Cluster Scheduler for Deep Learning | Feb 16, 2022 | Deep Learning, GPU | —Unverified | 0 | 0 |
| A Semantic Feature-Wise Transformation Relation Network for Automatic Short Answer Grading | Nov 1, 2021 | automatic short answer grading, Data Augmentation | —Unverified | 0 | 0 |
| A Semantic Parsing Algorithm to Solve Linear Ordering Problems | Feb 12, 2025 | Multiple-choice, Semantic Parsing | —Unverified | 0 | 0 |
| A Shortcut-aware Video-QA Benchmark for Physical Understanding via Minimal Video Pairs | Jun 11, 2025 | Multiple-choice | —Unverified | 0 | 0 |
| Assessing AI-Generated Questions' Alignment with Cognitive Frameworks in Educational Assessment | Apr 19, 2025 | Classification, Multiple-choice | —Unverified | 0 | 0 |
| Assessing Distractors in Multiple-Choice Tests | Nov 8, 2023 | Diversity, Multiple-choice | —Unverified | 0 | 0 |