SOTAVerified|Agents Browse Leaderboard About

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 361–370 of 5548 papers

Title	Date	Tasks	Status	Hype
Benchmarking the Myopic Trap: Positional Bias in Information Retrieval	May 20, 2025	BenchmarkingInformation Retrieval	CodeCode Available	5
NOVA: A Benchmark for Anomaly Localization and Clinical Reasoning in Brain MRI	May 20, 2025	Anomaly LocalizationBenchmarking	—Unverified	0
Benchmarking data encoding methods in Quantum Machine Learning	May 20, 2025	BenchmarkingQuantum Machine Learning	—Unverified	0
SlangDIT: Benchmarking LLMs in Interpretative Slang Translation	May 20, 2025	BenchmarkingSentence	—Unverified	0
SATBench: Benchmarking LLMs' Logical Reasoning via Automated Puzzle Generation from SAT Formulas	May 20, 2025	BenchmarkingLogical Reasoning	—Unverified	0
NavBench: A Unified Robotics Benchmark for Reinforcement Learning-Based Autonomous Navigation	May 20, 2025	Autonomous NavigationBenchmarking	—Unverified	0
SzCORE as a benchmark: report from the seizure detection challenge at the 2025 AI in Epilepsy and Neurological Disorders Conference	May 19, 2025	BenchmarkingEEG	—Unverified	0
HR-VILAGE-3K3M: A Human Respiratory Viral Immunization Longitudinal Gene Expression Dataset for Systems Immunity	May 19, 2025	Benchmarkingfeature selection	CodeCode Available	0
Benchmarking MOEAs for solving continuous multi-objective RL problems	May 19, 2025	BenchmarkingEvolutionary Algorithms	CodeCode Available	0
Ice Cream Doesn't Cause Drowning: Benchmarking LLMs Against Statistical Pitfalls in Causal Inference	May 19, 2025	BenchmarkingCausal Inference	—Unverified	0

Show:10 25 50

← PrevPage 37 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified