SOTAVerified

Multiple-choice

Papers

Showing 701–750 of 1107 papers

Title | Status | Hype
Enhancing Human-like Multi-Modal Reasoning: A New Challenging Dataset and Comprehensive Framework | Code | 1
SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models | Code | 1
Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla | - | 0
Assessing the Quality of Multiple-Choice Questions Using GPT-4 and Rule-Based Methods | Code | 0
MMBench: Is Your Multi-modal Model an All-around Player? | Code | 5
Analyzing Multiple-Choice Reading and Listening Comprehension Tests | - | 0
Structured Dialogue Discourse Parsing | Code | 0
Chance-Constrained Multiple-Choice Knapsack Problem: Model, Algorithms, and Applications | Code | 0
Analysis of the Cambridge Multiple-Choice Questions Reading Dataset with a Focus on Candidate Response Distribution | - | 0
Solving and Generating NPR Sunday Puzzles with Large Language Models | Code | 0
RECAP-KG: Mining Knowledge Graphs from Raw GP Notes for Remote COVID-19 Assessment in Primary Care | - | 0
Thrilled by Your Progress! Large Language Models (GPT-4) No Longer Struggle to Pass Assessments in Higher Education Programming Courses | - | 0
Can ChatGPT pass the Vietnamese National High School Graduation Examination? | - | 0
Questioning the Survey Responses of Large Language Models | Code | 0
Investigating the Effectiveness of ChatGPT in Mathematical Reasoning and Problem Solving: Evidence from the Vietnamese National High School Graduation Examination | - | 0
Xiezhi: An Ever-Updating Benchmark for Holistic Domain Knowledge Evaluation | Code | 1
Network-based Representations and Dynamic Discrete Choice Models for Multiple Discrete Choice Analysis | - | 0
Benchmarking Large Language Models on CMExam -- A Comprehensive Chinese Medical Exam Dataset | Code | 1
Conformal Prediction with Large Language Models for Multi-Choice Question Answering | Code | 1
Fine-Tuning Language Models with Just Forward Passes | Code | 3
BUCA: A Binary Classification Approach to Unsupervised Commonsense Question Answering | Code | 0
ToMChallenges: A Principle-Guided Dataset and Diverse Evaluation Tasks for Exploring Theory of Mind | Code | 0
Have Large Language Models Developed a Personality?: Applicability of Self-Assessment Tests in Measuring Personality in LLMs | - | 0
This Land is {Your, My} Land: Evaluating Geopolitical Biases in Language Models | Code | 0
Increasing Probability Mass on Answer Choices Does Not Always Improve Accuracy | Code | 0
Make a Choice! Knowledge Base Question Answering with In-Context Learning | - | 0
Query Rewriting for Retrieval-Augmented Large Language Models | - | 0
NarrativeXL: A Large-scale Dataset For Long-Term Memory Models | Code | 1
Iterative Forward Tuning Boosts In-Context Learning in Language Models | Code | 0
VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models | Code | 1
M3KE: A Massive Multi-Level Multi-Subject Knowledge Evaluation Benchmark for Chinese Large Language Models | Code | 1
A quantitative study of NLP approaches to question difficulty estimation | Code | 0
C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models | Code | 3
EMBRACE: Evaluation and Modifications for Boosting RACE | Code | 0
Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting | Code | 1
MindGames: Targeting Theory of Mind in Large Language Models with Dynamic Epistemic Modal Logic | Code | 1
Contextual Response Interpretation for Automated Structured Interviews: A Case Study in Market Research | - | 0
Who's the Best Detective? LLMs vs. MLs in Detecting Incoherent Fourth Grade Math Answers | - | 0
Analyzing the Performance of ChatGPT in Cardiology and Vascular Pathologies | - | 0
Prompt Engineering and Calibration for Zero-Shot Commonsense Reasoning | - | 0
DISTO: Evaluating Textual Distractors for Multi-Choice Questions using Negative Sampling based Approach | - | 0
FrenchMedMCQA: A French Multiple-Choice Question Answering Dataset for Medical domain | Code | 0
Bridging the Language Gap: Knowledge Injected Multilingual Question Answering | - | 0
GPT-4 to GPT-3.5: 'Hold My Scalpel' -- A Look at the Competency of OpenAI's GPT on the Plastic Surgery In-Service Training Exam | - | 0
A Multiple Choices Reading Comprehension Corpus for Vietnamese Language Education | Code | 0
Evaluating GPT-3.5 and GPT-4 Models on Brazilian University Admission Exams | Code | 1
Explicit Planning Helps Language Models in Logical Reasoning | Code | 1
Automatic Generation of Multiple-Choice Questions | - | 0
A Graph-Guided Reasoning Approach for Open-ended Commonsense Question Answering | - | 0
Can Generative Pre-trained Transformers (GPT) Pass Assessments in Higher Education Programming Courses? | - | 0

No leaderboard results yet.