SOTAVerified

Multiple-choice

Papers

Showing 1081–1090 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| Instruction Tuning and CoT Prompting for Contextual Medical QA with LLMs | | 0 |
| Instruction Tuning on Public Government and Cultural Data for Low-Resource Language: a Case Study in Kazakh | | 0 |
| Uhura: A Benchmark for Evaluating Scientific Question Answering and Truthfulness in Low-Resource African Languages | | 0 |
| Interpretable Multi-Step Reasoning with Knowledge Extraction on Complex Healthcare Question Answering | | 0 |
| Investigating and Addressing Hallucinations of LLMs in Tasks Involving Negation | | 0 |
| Investigating Data Contamination in Modern Benchmarks for Large Language Models | | 0 |
| Self-Assessment Tests are Unreliable Measures of LLM Personality | | 0 |
| Investigating the Effectiveness of ChatGPT in Mathematical Reasoning and Problem Solving: Evidence from the Vietnamese National High School Graduation Examination | | 0 |
| Investigating Uncertainty Calibration of Aligned Language Models under the Multiple-Choice Setting | | 0 |
| WikiMixQA: A Multimodal Benchmark for Question Answering over Tables and Charts | | 0 |

No leaderboard results yet.