SOTAVerified

Multiple-choice

Papers

Showing 351-375 of 1107 papers

Title | Status | Hype
LLaMa-SciQ: An Educational Chatbot for Answering Science MCQ | | 0
RISCORE: Enhancing In-Context Riddle Solving in Language Models through Context-Reconstructed Example Augmentation | | 0
Boosting Healthcare LLMs Through Retrieved Context | Code | 1
Detect, Describe, Discriminate: Moving Beyond VQA for MLLM Evaluation | | 0
Evaluating the Performance and Robustness of LLMs in Materials Science Q&A and Property Predictions | | 0
QMOS: Enhancing LLMs for Telecommunication with Question Masked loss and Option Shuffling | Code | 0
First Place Solution to the Multiple-choice Video QA Track of The Second Perception Test Challenge | | 0
Bilingual Evaluation of Language Models on General Knowledge in University Entrance Exams with Minimal Contamination | | 0
Efficient Knowledge Distillation: Empowering Small Language Models with Teacher Model Insights | | 0
Edu-Values: Towards Evaluating the Chinese Education Values of Large Language Models | Code | 0
LLM-as-a-Judge & Reward Model: What They Can and Cannot Do | | 0
Annealed Winner-Takes-All for Motion Forecasting | Code | 1
Cracking the Code: Multi-domain LLM Evaluation on Real-World Professional Exams in Indonesia | | 0
Exploring syntactic information in sentence embeddings through multilingual subject-verb agreement | | 0
Towards Democratizing Multilingual Large Language Models For Medicine Through A Two-Stage Instruction Fine-tuning Approach | Code | 0
COLUMBUS: Evaluating COgnitive Lateral Understanding through Multiple-choice reBUSes | Code | 0
MaterialBENCH: Evaluating College-Level Materials Science Problem-Solving Abilities of Large Language Models | | 0
CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models | Code | 2
Training on the Benchmark Is Not All You Need | Code | 1
The Role of Large Language Models in Musicology: Are We Ready to Trust the Machines? | | 0
Novel-WD: Exploring acquisition of Novel World Knowledge in LLMs Using Prefix-Tuning | | 0
Wait, that's not an option: LLMs Robustness with Incorrect Multiple-Choice Options | Code | 0
TourSynbio: A Multi-Modal Large Model and Agent Framework to Bridge Text and Protein Sequences for Protein Engineering | Code | 1
Vision-Language and Large Language Model Performance in Gastroenterology: GPT, Claude, Llama, Phi, Mistral, Gemma, and Quantized Models | Code | 0
Enhancing Knowledge Tracing with Concept Map and Response Disentanglement | Code | 1

No leaderboard results yet.