SOTAVerified

Multiple-choice

Papers

Showing 826-850 of 1107 papers

Title | Status | Hype
An MRC Framework for Semantic Role Labeling | | 0
BloomVQA: Assessing Hierarchical Multi-modal Comprehension | | 0
The Order Effect: Investigating Prompt Sensitivity to Input Order in LLMs | | 0
The Role of Large Language Models in Musicology: Are We Ready to Trust the Machines? | | 0
Break the Checkbox: Challenging Closed-Style Evaluations of Cultural Alignment in LLMs | | 0
The Use of Artificial Intelligence Tools in Assessing Content Validity: A Comparative Study with Human Experts | | 0
Bridging Information-Seeking Human Gaze and Machine Reading Comprehension | | 0
Bridging the Language Gap: Knowledge Injected Multilingual Question Answering | | 0
Analysis of the Cambridge Multiple-Choice Questions Reading Dataset with a Focus on Candidate Response Distribution | | 0
Can AI Master Construction Management (CM)? Benchmarking State-of-the-Art Large Language Models on CM Certification Exams | | 0
Can ChatGPT pass the Vietnamese National High School Graduation Examination? | | 0
Can Crowdsourcing be used for Effective Annotation of Arabic? | | 0
Can Generative Pre-trained Transformers (GPT) Pass Assessments in Higher Education Programming Courses? | | 0
The use of large language models to enhance cancer clinical trial educational materials | | 0
Can Multimodal LLMs do Visual Temporal Understanding and Reasoning? The answer is No! | | 0
Can We Trust LLMs? Mitigate Overconfidence Bias in LLMs through Knowledge Transfer | | 0
CBT-Bench: Evaluating Large Language Models on Assisting Cognitive Behavior Therapy | | 0
ACQ: A Unified Framework for Automated Programmatic Creativity in Online Advertising | | 0
CFinBench: A Comprehensive Chinese Financial Benchmark for Large Language Models | | 0
CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding | | 0
Changing Answer Order Can Decrease MMLU Accuracy | | 0
Characterizing Large Language Models as Rationalizers of Knowledge-intensive Tasks | | 0
What Makes Reading Comprehension Questions Difficult? Investigating Variation in Passage Sources and Question Types | | 0
Chat-TS: Enhancing Multi-Modal Reasoning Over Time-Series and Natural Language Data | | 0
An Improved Traditional Chinese Evaluation Suite for Foundation Model | | 0
Page 34 of 45

No leaderboard results yet.