SOTAVerified

Multiple-choice

Papers

Showing 211-220 of 1107 papers

Title | Status | Hype
ToMATO: Verbalizing the Mental States of Role-Playing LLMs for Benchmarking Theory of Mind | Code | 1
Towards Multilingual LLM Evaluation for Baltic and Nordic languages: A study on Lithuanian History | | 0
Rethinking AI Cultural Alignment | | 0
Hierarchical Divide-and-Conquer for Fine-Grained Alignment in LLM-Based Medical Evaluation | | 0
ZNO-Eval: Benchmarking reasoning capabilities of large language models in Ukrainian | Code | 1
First Token Probability Guided RAG for Telecom Question Answering | | 0
Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding | Code | 0
Affordably Fine-tuned LLMs Provide Better Answers to Course-specific MCQs | Code | 0
Knowledge Retrieval Based on Generative AI | | 0
DRIVINGVQA: Analyzing Visual Chain-of-Thought Reasoning of Vision Language Models in Real-World Scenarios with Driving Theory Tests | | 0

No leaderboard results yet.