
Multiple-choice

Papers

Showing 701–725 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| Enhancing Human-like Multi-Modal Reasoning: A New Challenging Dataset and Comprehensive Framework | Code | 1 |
| SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models | Code | 1 |
| Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla | | 0 |
| Assessing the Quality of Multiple-Choice Questions Using GPT-4 and Rule-Based Methods | Code | 0 |
| MMBench: Is Your Multi-modal Model an All-around Player? | Code | 5 |
| Analyzing Multiple-Choice Reading and Listening Comprehension Tests | | 0 |
| Structured Dialogue Discourse Parsing | Code | 0 |
| Chance-Constrained Multiple-Choice Knapsack Problem: Model, Algorithms, and Applications | Code | 0 |
| Analysis of the Cambridge Multiple-Choice Questions Reading Dataset with a Focus on Candidate Response Distribution | | 0 |
| Solving and Generating NPR Sunday Puzzles with Large Language Models | Code | 0 |
| RECAP-KG: Mining Knowledge Graphs from Raw GP Notes for Remote COVID-19 Assessment in Primary Care | | 0 |
| Thrilled by Your Progress! Large Language Models (GPT-4) No Longer Struggle to Pass Assessments in Higher Education Programming Courses | | 0 |
| Can ChatGPT pass the Vietnamese National High School Graduation Examination? | | 0 |
| Questioning the Survey Responses of Large Language Models | Code | 0 |
| Investigating the Effectiveness of ChatGPT in Mathematical Reasoning and Problem Solving: Evidence from the Vietnamese National High School Graduation Examination | | 0 |
| Xiezhi: An Ever-Updating Benchmark for Holistic Domain Knowledge Evaluation | Code | 1 |
| Network-based Representations and Dynamic Discrete Choice Models for Multiple Discrete Choice Analysis | | 0 |
| Benchmarking Large Language Models on CMExam -- A Comprehensive Chinese Medical Exam Dataset | Code | 1 |
| Conformal Prediction with Large Language Models for Multi-Choice Question Answering | Code | 1 |
| Fine-Tuning Language Models with Just Forward Passes | Code | 3 |
| BUCA: A Binary Classification Approach to Unsupervised Commonsense Question Answering | Code | 0 |
| ToMChallenges: A Principle-Guided Dataset and Diverse Evaluation Tasks for Exploring Theory of Mind | Code | 0 |
| Have Large Language Models Developed a Personality?: Applicability of Self-Assessment Tests in Measuring Personality in LLMs | | 0 |
| This Land is Your, My Land: Evaluating Geopolitical Biases in Language Models | Code | 0 |
| Increasing Probability Mass on Answer Choices Does Not Always Improve Accuracy | Code | 0 |

No leaderboard results yet.