SOTAVerified

Multiple-choice

Papers

Showing 351-375 of 1107 papers

Title | Status | Hype
CodeReviewQA: The Code Review Comprehension Assessment for Large Language Models | - | 0
AutoDrive-QA- Automated Generation of Multiple-Choice Questions for Autonomous Driving Datasets Using Large Vision-Language Models | - | 0
Fùxì: A Benchmark for Evaluating Language Models on Ancient Chinese Text Understanding and Generation | Code | 0
VisNumBench: Evaluating Number Sense of Multimodal Large Language Models | - | 0
FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding | - | 0
How much do LLMs learn from negative examples? | Code | 0
LEAVS: An LLM-based Labeler for Abdominal CT Supervision | Code | 0
Chat-TS: Enhancing Multi-Modal Reasoning Over Time-Series and Natural Language Data | - | 0
The Impact of Item-Writing Flaws on Difficulty and Discrimination in Item Response Theory | - | 0
It is Too Many Options: Pitfalls of Multiple-Choice Questions in Generative AI and Medical Education | - | 0
SeqSAM: Autoregressive Multiple Hypothesis Prediction for Medical Image Segmentation using SAM | Code | 0
Social Bias Benchmark for Generation: A Comparison of Generation and QA-Based Evaluations | - | 0
VisBias: Measuring Explicit and Implicit Social Biases in Vision Language Models | Code | 0
Identity Lock: Locking API Fine-tuned LLMs With Identity-based Wake Words | - | 0
SCoRE: Benchmarking Long-Chain Reasoning in Commonsense Scenarios | Code | 0
UrbanVideo-Bench: Benchmarking Vision-Language Models on Embodied Intelligence with Video Data in Urban Spaces | - | 0
Towards Conversational AI for Disease Management | - | 0
This Is Your Doge, If It Please You: Exploring Deception and Robustness in Mixture of LLMs | Code | 0
Correctness Coverage Evaluation for Medical Multiple-Choice Question Answering Based on the Enhanced Conformal Prediction Framework | - | 0
Analogical Reasoning Inside Large Language Models: Concept Vectors and the Limits of Abstraction | Code | 0
Structured Outputs Enable General-Purpose LLMs to be Medical Experts | - | 0
The impact of AI and peer feedback on research writing skills: a study using the CGScholar platform among Kazakhstani scholars | - | 0
None of the Above, Less of the Right: Parallel Patterns between Humans and LLMs on Multi-Choice Questions Answering | - | 0
When an LLM is apprehensive about its answers -- and when its uncertainty is justified | Code | 0
MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts | - | 0

No leaderboard results yet.