SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 2091–2100 of 5548 papers

Title	Date	Tasks	Status	Hype
Data-driven Power Flow Linearization: Simulation	Jun 10, 2024	BenchmarkingComputational Efficiency	—Unverified	0
Improving Generalization of Neural Vehicle Routing Problem Solvers Through the Lens of Model Architecture	Jun 10, 2024	BenchmarkingDecoder	CodeCode Available	0
INTERSPEECH 2009 Emotion Challenge Revisited: Benchmarking 15 Years of Progress in Speech Emotion Recognition	Jun 10, 2024	BenchmarkingEmotion Recognition	CodeCode Available	0
DISCOVERYWORLD: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents	Jun 10, 2024	Benchmarkingscientific discovery	CodeCode Available	3
Can Language Models Serve as Text-Based World Simulators?	Jun 10, 2024	BenchmarkingDecision Making	—Unverified	0
Multivariate Stochastic Dominance via Optimal Transport and Applications to Models Benchmarking	Jun 10, 2024	BenchmarkingEconometrics	—Unverified	0
TopoBench: A Framework for Benchmarking Topological Deep Learning	Jun 9, 2024	BenchmarkingDeep Learning	CodeCode Available	3
Smiles2Dock: an open large-scale multi-task dataset for ML-based molecular docking	Jun 9, 2024	BenchmarkingDrug Discovery	CodeCode Available	1
QGEval: Benchmarking Multi-dimensional Evaluation for Question Generation	Jun 9, 2024	BenchmarkingQuestion Generation	CodeCode Available	1
ICU-Sepsis: A Benchmark MDP Built from Real Medical Data	Jun 9, 2024	BenchmarkingManagement	CodeCode Available	1

Show:10 25 50

← PrevPage 210 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified