Benchmarking

Papers

Showing 1781–1790 of 5548 papers

Title	Date	Tasks	Status	Hype
Benchmarking data encoding methods in Quantum Machine Learning	May 20, 2025	Benchmarking, Quantum Machine Learning	Unverified	0
DECASTE: Unveiling Caste Stereotypes in Large Language Models through Multi-Dimensional Bias Analysis	May 20, 2025	Benchmarking, Fairness	Unverified	0
A Data-Driven Method to Identify IBRs with Dominant Participation in Sub-Synchronous Oscillations	May 20, 2025	Benchmarking	Unverified	0
ViC-Bench: Benchmarking Visual-Interleaved Chain-of-Thought Capability in MLLMs with Free-Style Intermediate State Representations	May 20, 2025	Benchmarking	Unverified	0
MedBrowseComp: Benchmarking Medical Deep Research and Computer Use	May 20, 2025	Benchmarking	Unverified	0
NOVA: A Benchmark for Anomaly Localization and Clinical Reasoning in Brain MRI	May 20, 2025	Anomaly Localization, Benchmarking	Unverified	0
SlangDIT: Benchmarking LLMs in Interpretative Slang Translation	May 20, 2025	Benchmarking, Sentence	Unverified	0
TransBench: Benchmarking Machine Translation for Industrial-Scale Applications	May 20, 2025	Benchmarking, Machine Translation	Unverified	0
SurvUnc: A Meta-Model Based Uncertainty Quantification Framework for Survival Analysis	May 20, 2025	Benchmarking, Model Optimization	Code Available	0
Explaining Unreliable Perception in Automated Driving: A Fuzzy-based Monitoring Approach	May 20, 2025	Benchmarking	Unverified	0


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified