SOTAVerified|Agents Browse Leaderboard About

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 411–420 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Benchmarking Counterfactual Image Generation	Mar 29, 2024	BenchmarkingConditional Image Generation	CodeCode Available	1	5
Benchmarking Constraint Inference in Inverse Reinforcement Learning	Jun 20, 2022	Autonomous DrivingBenchmarking	CodeCode Available	1	5
Benchmarking Data-driven Surrogate Simulators for Artificial Electromagnetic Materials	Nov 6, 2021	BenchmarkingNeural Network simulation	CodeCode Available	1	5
CheX-GPT: Harnessing Large Language Models for Enhanced Chest X-ray Report Labeling	Jan 21, 2024	Benchmarking	CodeCode Available	1	5
Chaos as an interpretable benchmark for forecasting and data-driven modelling	Oct 11, 2021	BenchmarkingSymbolic Regression	CodeCode Available	1	5
CharacterBench: Benchmarking Character Customization of Large Language Models	Dec 16, 2024	Benchmarking	CodeCode Available	1	5
Benchmarking Commonsense Knowledge Base Population with an Effective Evaluation Dataset	Sep 16, 2021	BenchmarkingKnowledge Base Population	CodeCode Available	1	5
Benchmarking Data Science Agents	Feb 27, 2024	BenchmarkingCode Generation	CodeCode Available	1	5
CheXphoto: 10,000+ Photos and Transformations of Chest X-rays for Benchmarking Deep Learning Robustness	Jul 13, 2020	Benchmarking	CodeCode Available	1	5
Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations	Mar 21, 2024	BenchmarkingMemorization	CodeCode Available	1	5

Show:10 25 50

← PrevPage 42 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified