SOTAVerified

Multiple-choice

Papers

Showing 601-625 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| Instruction Fine-Tuning: Does Prompt Loss Matter? | | 0 |
| A Study on Large Language Models' Limitations in Multiple-Choice Question Answering | Code | 0 |
| Towards Efficient Methods in Medical Question Answering using Knowledge Graph Embeddings | Code | 0 |
| Assessing Large Language Models in Mechanical Engineering Education: A Study on Mechanics-Focused Conceptual Understanding | | 0 |
| Automated Answer Validation using Text Similarity | | 0 |
| PUB: A Pragmatics Understanding Benchmark for Assessing LLMs' Pragmatics Capabilities | | 0 |
| A Novel Multi-Stage Prompting Approach for Language Agnostic MCQ Generation using GPT | Code | 0 |
| The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models | Code | 1 |
| A Joint-Reasoning based Disease Q&A System | | 0 |
| SEED-Bench: Benchmarking Multimodal Large Language Models | Code | 3 |
| The Earth is Flat? Unveiling Factual Errors in Large Language Models | | 0 |
| FusionMind -- Improving question and answering with external context fusion | | 0 |
| SecQA: A Concise Question-Answering Dataset for Evaluating Large Language Models in Computer Security | Code | 0 |
| RoleEval: A Bilingual Role Evaluation Benchmark for Large Language Models | Code | 1 |
| HyKGE: A Hypothesis Knowledge Graph Enhanced Framework for Accurate and Reliable Medical LLMs Responses | Code | 1 |
| Towards a Unified Multimodal Reasoning Framework | Code | 0 |
| Perception Test 2023: A Summary of the First Challenge And Outcome | | 0 |
| BloomVQA: Assessing Hierarchical Multi-modal Comprehension | | 0 |
| Multiple Hypothesis Dropout: Estimating the Parameters of Multi-Modal Output Distributions | Code | 0 |
| An In-depth Look at Gemini's Language Abilities | Code | 1 |
| Marathon: A Race Through the Realm of Long Context with Large Language Models | Code | 1 |
| Self-Evaluation Improves Selective Generation in Large Language Models | | 0 |
| A Foundational Multimodal Vision Language AI Assistant for Human Pathology | | 0 |
| Steering Llama 2 via Contrastive Activation Addition | Code | 2 |
| Is Bigger and Deeper Always Better? Probing LLaMA Across Scales and Layers | Code | 1 |
Page 25 of 45
