SOTAVerified|Agents Browse Leaderboard About

MMLU

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 211–220 of 340 papers

Title	Date	Tasks	Status	Hype
On the Reasoning Capacity of AI Models and How to Quantify It	Jan 23, 2025	MemorizationMMLU	—Unverified	0
Is your LLM trapped in a Mental Set? Investigative study on how mental sets affect the reasoning capabilities of LLMs	Jan 21, 2025	GSM8KIn-Context Learning	—Unverified	0
Explain-Query-Test: Self-Evaluating LLMs Via Explanation and Comprehension Discrepancy	Jan 20, 2025	MMLU	CodeCode Available	0
DNA 1.0 Technical Report	Jan 18, 2025	BelebeleGSM8K	—Unverified	0
Inference-Time-Compute: More Faithful? A Research Note	Jan 14, 2025	AttributeMMLU	—Unverified	0
CHAIR -- Classifier of Hallucination as Improver	Jan 5, 2025	HallucinationMMLU	CodeCode Available	0
Unraveling Indirect In-Context Learning Using Influence Functions	Jan 1, 2025	In-Context LearningInformativeness	—Unverified	0
Monty Hall and Optimized Conformal Prediction to Improve Decision-Making with LLMs	Dec 31, 2024	Conformal PredictionDecision Making	—Unverified	0
Setting Standards in Turkish NLP: TR-MMLU for Large Language Model Evaluation	Dec 31, 2024	Language Model EvaluationLanguage Modeling	—Unverified	0
SecBench: A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity	Dec 30, 2024	BenchmarkingCode Generation	—Unverified	0

Show:10 25 50

← PrevPage 22 of 34Next →

All datasets SIOP 2020/2021 MMLU-Pro VCTK

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	go ahead, make my data	Final_score	61.72	—	Unverified
2	#GreedyCow	Final_score	61.63	—	Unverified
3	Don't Ask Us y	Final_score	61.4	—	Unverified
4	Data_and_Confused	Final_score	60.96	—	Unverified
5	Waffles	Final_score	60.91	—	Unverified
6	raaka	Final_score	60.91	—	Unverified
7	Team Procrustination	Final_score	60.64	—	Unverified
8	Axiom Consulting Partners	Final_score	60.63	—	Unverified
9	Lets_Be_Fair	Final_score	60.23	—	Unverified
10	gooners	Final_score	60.22	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Orange-mini	0-shot MRR	99.19	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	HybridBeam+	SI-SDRi	13.3	—	Unverified