SOTAVerified

Multiple-choice

Papers

Showing 351-400 of 1107 papers

Title | Status | Hype
LLaMa-SciQ: An Educational Chatbot for Answering Science MCQ | - | 0
RISCORE: Enhancing In-Context Riddle Solving in Language Models through Context-Reconstructed Example Augmentation | - | 0
Boosting Healthcare LLMs Through Retrieved Context | Code | 1
Detect, Describe, Discriminate: Moving Beyond VQA for MLLM Evaluation | - | 0
Evaluating the Performance and Robustness of LLMs in Materials Science Q&A and Property Predictions | - | 0
QMOS: Enhancing LLMs for Telecommunication with Question Masked loss and Option Shuffling | Code | 0
First Place Solution to the Multiple-choice Video QA Track of The Second Perception Test Challenge | - | 0
Bilingual Evaluation of Language Models on General Knowledge in University Entrance Exams with Minimal Contamination | - | 0
Efficient Knowledge Distillation: Empowering Small Language Models with Teacher Model Insights | - | 0
Edu-Values: Towards Evaluating the Chinese Education Values of Large Language Models | Code | 0
LLM-as-a-Judge & Reward Model: What They Can and Cannot Do | - | 0
Annealed Winner-Takes-All for Motion Forecasting | Code | 1
Cracking the Code: Multi-domain LLM Evaluation on Real-World Professional Exams in Indonesia | - | 0
Exploring syntactic information in sentence embeddings through multilingual subject-verb agreement | - | 0
Towards Democratizing Multilingual Large Language Models For Medicine Through A Two-Stage Instruction Fine-tuning Approach | Code | 0
COLUMBUS: Evaluating COgnitive Lateral Understanding through Multiple-choice reBUSes | Code | 0
MaterialBENCH: Evaluating College-Level Materials Science Problem-Solving Abilities of Large Language Models | - | 0
CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models | Code | 2
Training on the Benchmark Is Not All You Need | Code | 1
The Role of Large Language Models in Musicology: Are We Ready to Trust the Machines? | - | 0
Novel-WD: Exploring acquisition of Novel World Knowledge in LLMs Using Prefix-Tuning | - | 0
Wait, that's not an option: LLMs Robustness with Incorrect Multiple-Choice Options | Code | 0
TourSynbio: A Multi-Modal Large Model and Agent Framework to Bridge Text and Protein Sequences for Protein Engineering | Code | 1
Vision-Language and Large Language Model Performance in Gastroenterology: GPT, Claude, Llama, Phi, Mistral, Gemma, and Quantized Models | Code | 0
Enhancing Knowledge Tracing with Concept Map and Response Disentanglement | Code | 1
Towards Evaluating and Building Versatile Large Language Models for Medicine | Code | 2
Large Language Models Are Self-Taught Reasoners: Enhancing LLM Applications via Tailored Problem-Solving Demonstrations | - | 0
Differentiating Choices via Commonality for Multiple-Choice Question Answering | Code | 0
How Susceptible are LLMs to Influence in Prompts? | - | 0
Measuring Agreeableness Bias in Multimodal Models | Code | 0
Chain-of-Exemplar: Enhancing Distractor Generation for Multimodal Educational Question Generation | Code | 0
LLMs Are Biased Towards Output Formats! Systematically Evaluating and Mitigating Output Format Bias of LLMs | Code | 1
Examining the Behavior of LLM Architectures Within the Framework of Standardized National Exams in Brazil | - | 0
LLaVA-OneVision: Easy Visual Task Transfer | Code | 0
Winning Amazon KDD Cup'24 | - | 0
XMainframe: A Large Language Model for Mainframe Modernization | Code | 2
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models | Code | 2
Recent Advances in Multi-Choice Machine Reading Comprehension: A Survey on Methods and Datasets | - | 0
MiniCPM-V: A GPT-4V Level MLLM on Your Phone | Code | 12
MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models | Code | 3
Improved Few-Shot Image Classification Through Multiple-Choice Questions | - | 0
Do LLMs Know When to NOT Answer? Investigating Abstention Abilities of Large Language Models | - | 0
MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity | Code | 2
Annealed Multiple Choice Learning: Overcoming limitations of Winner-takes-all with annealing | Code | 1
LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding | Code | 2
Answer, Assemble, Ace: Understanding How Transformers Answer Multiple Choice Questions | - | 0
MIBench: Evaluating Multimodal Large Language Models over Multiple Images | - | 0
Modular Sentence Encoders: Separating Language Specialization from Cross-Lingual Alignment | Code | 0
Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data | - | 0
Evaluating language models as risk scores | Code | 1

No leaderboard results yet.