SOTAVerified

Multiple-choice

Papers

Showing 801–825 of 1107 papers

| Title | Status | Hype |
|---|---|---|
| A Weak Supervision Approach for Predicting Difficulty of Technical Interview Questions | | 0 |
| Bayesian Statistical Modeling with Predictors from LLMs | | 0 |
| Being Negative but Constructively: Lessons Learnt from Creating Better Visual Question Answering Datasets | | 0 |
| Benchmarking Bias in Large Language Models during Role-Playing | | 0 |
| The Future of Learning in the Age of Generative AI: Automated Question Generation and Assessment with Large Language Models | | 0 |
| Answer, Assemble, Ace: Understanding How Transformers Answer Multiple Choice Questions | | 0 |
| The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations | | 0 |
| Benchmarking Next-Generation Reasoning-Focused Large Language Models in Ophthalmology: A Head-to-Head Evaluation on 5,888 Items | | 0 |
| Benchmarks for Pirá 2.0, a Reading Comprehension Dataset about the Ocean, the Brazilian Coast, and Climate Change | | 0 |
| Better Distractions: Transformer-based Distractor Generation and Multiple Choice Question Filtering | | 0 |
| Beyond Multiple-Choice Accuracy: Real-World Challenges of Implementing Large Language Models in Healthcare | | 0 |
| Beyond Multiple Choice: Evaluating Steering Vectors for Adaptive Free-Form Summarization | | 0 |
| Beyond Probabilities: Unveiling the Misalignment in Evaluating Large Language Models | | 0 |
| Beyond Profile: From Surface-Level Facts to Deep Persona Simulation in LLMs | | 0 |
| Not All Options Are Created Equal: Textual Option Weighting for Token-Efficient LLM-Based Knowledge Tracing | | 0 |
| The impact of AI and peer feedback on research writing skills: a study using the CGScholar platform among Kazakhstani scholars | | 0 |
| LLMs May Perform MCQA by Selecting the Least Incorrect Option | | 0 |
| Beyond VQA: Generating Multi-word Answer and Rationale to Visual Questions | | 0 |
| ANPMI: Assessing the True Comprehension Capabilities of LLMs for Multiple Choice Questions | | 0 |
| Bilingual Evaluation of Language Models on General Knowledge in University Entrance Exams with Minimal Contamination | | 0 |
| The Impact of Item-Writing Flaws on Difficulty and Discrimination in Item Response Theory | | 0 |
| A Novel Approach for Constrained Optimization in Graphical Models | | 0 |
| BiRdQA: A Bilingual Dataset for Question Answering on Tricky Riddles | | 0 |
| The Lazy Student's Dream: ChatGPT Passing an Engineering Course on Its Own | | 0 |
| BLINK: Multimodal Large Language Models Can See but Not Perceive | | 0 |

No leaderboard results yet.