| ExplanationLP: Abductive Reasoning for Explainable Science Question Answering | Oct 25, 2020 | Answer Selection, ARC | —Unverified | 0 | 0 |
| Towards Mixed-Precision Quantization of Neural Networks via Constrained Optimization | Oct 13, 2021 | Multiple-choice, Quantization | —Unverified | 0 | 0 |
| Explore then Determine: A GNN-LLM Synergy Framework for Reasoning over Knowledge Graph | Jun 3, 2024 | Knowledge Graphs, Multiple-choice | —Unverified | 0 | 0 |
| Exploring syntactic information in sentence embeddings through multilingual subject-verb agreement | Sep 10, 2024 | Multiple-choice, Sentence | —Unverified | 0 | 0 |
| Exploring the Capabilities of Prompted Large Language Models in Educational and Assessment Applications | May 19, 2024 | Multiple-choice | —Unverified | 0 | 0 |
| Exploring the Comprehension of ChatGPT in Traditional Chinese Medicine Knowledge | Mar 14, 2024 | Multiple-choice | —Unverified | 0 | 0 |
| How Additional Knowledge can Improve Natural Language Commonsense Question Answering? | Sep 19, 2019 | Articles, Language Modeling | —Unverified | 0 | 0 |
| Exposing the Limits of Video-Text Models through Contrast Sets | Jan 16, 2022 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Towards Multilingual LLM Evaluation for Baltic and Nordic languages: A study on Lithuanian History | Jan 15, 2025 | Multiple-choice, Question Answering | —Unverified | 0 | 0 |
| FactTest: Factuality Testing in Large Language Models with Finite-Sample and Distribution-Free Guarantees | Nov 4, 2024 | Multiple-choice, Question Answering | —Unverified | 0 | 0 |
| Towards Multistage Design of Modular Systems | Jun 19, 2013 | Multiple-choice | —Unverified | 0 | 0 |
| FAMULUS: Interactive Annotation and Feedback Generation for Teaching Diagnostic Reasoning | Aug 29, 2019 | Diagnostic, Multiple-choice | —Unverified | 0 | 0 |
| FarsEval-PKBETS: A new diverse benchmark for evaluating Persian large language models | Apr 20, 2025 | Descriptive, Ethics | —Unverified | 0 | 0 |
| Town Hall Debate Prompting: Enhancing Logical Reasoning in LLMs through Multi-Persona Interaction | Jan 28, 2025 | Logical Reasoning, Multiple-choice | —Unverified | 0 | 0 |
| FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding | Mar 19, 2025 | Benchmarking, Multiple-choice | —Unverified | 0 | 0 |
| Few-Shot Image Classification and Segmentation as Visual Question Answering Using Vision-Language Models | Mar 15, 2024 | Few-Shot Image Classification, image-classification | —Unverified | 0 | 0 |
| Field-testing items using artificial intelligence: Natural language processing with transformers | Oct 18, 2023 | Multiple-choice | —Unverified | 0 | 0 |
| Fill-in-the-Blank: A Challenging Video Understanding Evaluation Framework | Nov 16, 2021 | Multiple-choice, Question Answering | —Unverified | 0 | 0 |
| Fine-tuning BERT with Focus Words for Explanation Regeneration | Dec 1, 2020 | Explanation Generation, Multiple-choice | —Unverified | 0 | 0 |
| An Automatic Evaluation Framework for Multi-turn Medical Consultations Capabilities of Large Language Models | Sep 5, 2023 | Multiple-choice | —Unverified | 0 | 0 |
| An Automated Multiple-Choice Question Generation Using Natural Language Processing Techniques | Mar 26, 2021 | Multiple-choice, Question Generation | —Unverified | 0 | 0 |
| First Place Solution to the Multiple-choice Video QA Track of The Second Perception Test Challenge | Sep 20, 2024 | Multiple-choice, Question Answering | —Unverified | 0 | 0 |
| First Token Probability Guided RAG for Telecom Question Answering | Jan 11, 2025 | Multiple-choice, Multiple Choice Question Answering (MCQA) | —Unverified | 0 | 0 |
| An Audio-enriched BERT-based Framework for Spoken Multiple-choice Question Answering | May 25, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| Which of These Best Describes Multiple Choice Evaluation with LLMs? A) Forced B) Flawed C) Fixable D) All of the Above | Feb 19, 2025 | All, Multiple-choice | —Unverified | 0 | 0 |