SOTAVerified

Benchmarking

Papers

Showing 3831–3840 of 5548 papers

Title	Date	Tasks	Status	Hype
Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations	Jun 9, 2022	Benchmarking, continuous-control	Code Available	2
SwinCheX: Multi-label classification on chest X-ray images with transformers	Jun 9, 2022	Benchmarking, Multi-Label Classification	Code Available	1
Functional Code Building Genetic Programming	Jun 9, 2022	Benchmarking, Program Synthesis	Unverified	0
Do We Need Another Explainable AI Method? Toward Unifying Post-hoc XAI Evaluation Methods into an Interactive and Multi-dimensional Benchmark	Jun 8, 2022	Benchmarking, Explainable Artificial Intelligence (XAI)	Code Available	1
Benchmarking Bayesian neural networks and evaluation metrics for regression tasks	Jun 8, 2022	Benchmarking, Open-Ended Question Answering	Unverified	0
FedHPO-B: A Benchmark Suite for Federated Hyperparameter Optimization	Jun 8, 2022	Benchmarking, Federated Learning	Unverified	0
Scaling laws in global corporations as a benchmarking approach to assess environmental performance	Jun 7, 2022	Benchmarking, Open-Ended Question Answering	Unverified	0
Revisiting Realistic Test-Time Training: Sequential Inference and Adaptation by Anchored Clustering	Jun 6, 2022	Benchmarking, Clustering	Code Available	1
MorisienMT: A Dataset for Mauritian Creole Machine Translation	Jun 6, 2022	Benchmarking, Machine Translation	Unverified	0
Which models are innately best at uncertainty estimation?	Jun 5, 2022	Benchmarking, Out-of-Distribution Detection	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified