SOTAVerified

Multiple-choice

Papers

Showing 976–1000 of 1107 papers

Title | Status | Hype
It's Not Easy Being Wrong: Large Language Models Struggle with Process of Elimination Reasoning | Code | 0
Investigating Prior Knowledge for Challenging Chinese Machine Reading Comprehension | Code | 0
Joint Learning of Sentence Embeddings for Relevance and Entailment | Code | 0
Enhancing textual textbook question answering with large language models and retrieval augmented generation | Code | 0
Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation | Code | 0
KGQuiz: Evaluating the Generalization of Encoded Knowledge in Large Language Models | Code | 0
AutoCast++: Enhancing World Event Prediction with Zero-shot Ranking-based Context Retrieval | Code | 0
Moving Beyond Medical Exam Questions: A Clinician-Annotated Dataset of Real-World Tasks and Ambiguity in Mental Healthcare | Code | 0
Uncertainty quantification in fine-tuned LLMs using LoRA ensembles | Code | 0
Evaluating and Mitigating Social Bias for Large Language Models in Open-ended Settings | Code | 0
Evaluating ChatGPT-4 Vision on Brazil's National Undergraduate Computer Science Exam | Code | 0
VCRBench: Exploring Long-form Causal Reasoning Capabilities of Large Video Language Models | Code | 0
Towards Democratizing Multilingual Large Language Models For Medicine Through A Two-Stage Instruction Fine-tuning Approach | Code | 0
Evaluating Large Language Model Biases in Persona-Steered Generation | Code | 0
SeqSAM: Autoregressive Multiple Hypothesis Prediction for Medical Image Segmentation using SAM | Code | 0
SCoRE: Benchmarking Long-Chain Reasoning in Commonsense Scenarios | Code | 0
M-QALM: A Benchmark to Assess Clinical Reading Comprehension and Knowledge Recall in Large Language Models via Question Answering | Code | 0
Order-Independence Without Fine Tuning | Code | 0
Towards Diverse Perspective Learning with Selection over Multiple Temporal Poolings | Code | 0
PROST: Physical Reasoning of Objects through Space and Time | Code | 0
VEGAS: Towards Visually Explainable and Grounded Artificial Social Intelligence | Code | 0
Evaluating Prompts Across Multiple Choice Tasks In a Zero-Shot Setting | Code | 0
This Land is {Your, My} Land: Evaluating Geopolitical Biases in Language Models | Code | 0
Evaluating the Instruction-following Abilities of Language Models using Knowledge Tasks | Code | 0
Multi-class Hierarchical Question Classification for Multiple Choice Science Exams | Code | 0

No leaderboard results yet.