SOTAVerified

Multiple-choice

Papers

Showing 551–575 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| Detect, Describe, Discriminate: Moving Beyond VQA for MLLM Evaluation | | 0 |
| Evaluating the Performance and Robustness of LLMs in Materials Science Q&A and Property Predictions | | 0 |
| QMOS: Enhancing LLMs for Telecommunication with Question Masked loss and Option Shuffling | Code | 0 |
| First Place Solution to the Multiple-choice Video QA Track of The Second Perception Test Challenge | | 0 |
| Bilingual Evaluation of Language Models on General Knowledge in University Entrance Exams with Minimal Contamination | | 0 |
| Efficient Knowledge Distillation: Empowering Small Language Models with Teacher Model Insights | | 0 |
| Edu-Values: Towards Evaluating the Chinese Education Values of Large Language Models | Code | 0 |
| LLM-as-a-Judge & Reward Model: What They Can and Cannot Do | | 0 |
| Cracking the Code: Multi-domain LLM Evaluation on Real-World Professional Exams in Indonesia | | 0 |
| Exploring syntactic information in sentence embeddings through multilingual subject-verb agreement | | 0 |
| Towards Democratizing Multilingual Large Language Models For Medicine Through A Two-Stage Instruction Fine-tuning Approach | Code | 0 |
| COLUMBUS: Evaluating COgnitive Lateral Understanding through Multiple-choice reBUSes | Code | 0 |
| MaterialBENCH: Evaluating College-Level Materials Science Problem-Solving Abilities of Large Language Models | | 0 |
| The Role of Large Language Models in Musicology: Are We Ready to Trust the Machines? | | 0 |
| Novel-WD: Exploring acquisition of Novel World Knowledge in LLMs Using Prefix-Tuning | | 0 |
| Wait, that's not an option: LLMs Robustness with Incorrect Multiple-Choice Options | Code | 0 |
| Vision-Language and Large Language Model Performance in Gastroenterology: GPT, Claude, Llama, Phi, Mistral, Gemma, and Quantized Models | Code | 0 |
| Large Language Models Are Self-Taught Reasoners: Enhancing LLM Applications via Tailored Problem-Solving Demonstrations | | 0 |
| Differentiating Choices via Commonality for Multiple-Choice Question Answering | Code | 0 |
| How Susceptible are LLMs to Influence in Prompts? | | 0 |
| Measuring Agreeableness Bias in Multimodal Models | Code | 0 |
| Chain-of-Exemplar: Enhancing Distractor Generation for Multimodal Educational Question Generation | Code | 0 |
| Examining the Behavior of LLM Architectures Within the Framework of Standardized National Exams in Brazil | | 0 |
| LLaVA-OneVision: Easy Visual Task Transfer | Code | 0 |
| Winning Amazon KDD Cup'24 | | 0 |

No leaderboard results yet.