SOTAVerified

Multiple-choice

Papers

Showing 121–130 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| Annealed Multiple Choice Learning: Overcoming limitations of Winner-takes-all with annealing | Code | 1 |
| Evaluating language models as risk scores | Code | 1 |
| TurkishMMLU: Measuring Massive Multitask Language Understanding in Turkish | Code | 1 |
| Fine-tuning Multimodal Large Language Models for Product Bundling | Code | 1 |
| Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models | Code | 1 |
| ORAN-Bench-13K: An Open Source Benchmark for Assessing LLMs in Open Radio Access Networks | Code | 1 |
| LogicVista: Multimodal LLM Logical Reasoning Benchmark in Visual Contexts | Code | 1 |
| MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation | Code | 1 |
| InfiniBench: A Comprehensive Benchmark for Large Multimodal Models in Very Long Video Understanding | Code | 1 |
| HCQA @ Ego4D EgoSchema Challenge 2024 | Code | 1 |
Page 13 of 111

No leaderboard results yet.