SOTAVerified

Benchmarking

Papers

Showing 3181–3190 of 5548 papers

Title	Date	Tasks	Status	Hype
Detecting critical treatment effect bias in small subgroups	Apr 29, 2024	Benchmarking, Decision Making	Code Available	0
Leak Proof CMap; a framework for training and evaluation of cell line agnostic L1000 similarity methods	Apr 29, 2024	Benchmarking, Drug Discovery	Code Available	0
Efficient Exploration of Image Classifier Failures with Bayesian Optimization and Text-to-Image Models	Apr 26, 2024	Attribute, Bayesian Optimization	Unverified	0
Stochastic Spiking Neural Networks with First-to-Spike Coding	Apr 26, 2024	Benchmarking	Unverified	0
CriSp: Leveraging Tread Depth Maps for Enhanced Crime-Scene Shoeprint Matching	Apr 25, 2024	Benchmarking, Data Augmentation	Code Available	0
Benchmarking Mobile Device Control Agents across Diverse Configurations	Apr 25, 2024	Benchmarking, Imitation Learning	Unverified	0
DPO: A Differential and Pointwise Control Approach to Reinforcement Learning	Apr 24, 2024	Benchmarking, reinforcement-learning	Unverified	0
ApisTox: a new benchmark dataset for the classification of small molecules toxicity on honey bees	Apr 24, 2024	Benchmarking, Molecular Property Prediction	Code Available	0
Empirical Analysis of the Dynamic Binary Value Problem with IOHprofiler	Apr 24, 2024	Benchmarking	Unverified	0
Importance of Disjoint Sampling in Conventional and Transformer Models for Hyperspectral Image Classification	Apr 23, 2024	Benchmarking, Hyperspectral Image Classification	Code Available	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified