SOTAVerified

Holdout Set

Papers

Showing 1–10 of 35 papers

Title	Date	Tasks	Status	Hype
Outcome-based Reinforcement Learning to Predict the Future	May 23, 2025	Holdout Set, Math	Unverified	0
The DCR Delusion: Measuring the Privacy Risk of Synthetic Data	May 2, 2025	Holdout Set	Unverified	0
Parametric Scaling Law of Tuning Bias in Conformal Prediction	Feb 5, 2025	Conformal Prediction, Holdout Set	Code Available	0
Navigating Towards Fairness with Data Selection	Dec 15, 2024	Fairness, Holdout Set	Unverified	0
Who's the (Multi-)Fairest of Them All: Rethinking Interpolation-Based Data Augmentation Through the Lens of Multicalibration	Dec 13, 2024	All, Data Augmentation	Code Available	0
Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts	Oct 11, 2024	Holdout Set, Misconceptions	Unverified	0
STAND: Data-Efficient and Self-Aware Precondition Induction for Interactive Task Learning	Sep 11, 2024	Active Learning, Holdout Set	Unverified	0
Machine Learning for Quantifier Selection in cvc5	Aug 26, 2024	Holdout Set	Unverified	0
Comprehensive dataset of user-submitted articles with ideological and extreme bias from Reddit	Aug 12, 2024	Articles, Holdout Set	Code Available	0
Understanding Transformers via N-gram Statistics	Jun 30, 2024	Holdout Set	Code Available	1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	BloodAxe, 1st place xView3 prize challenge	Aggregate xView3 Score	0.62	—	Unverified
2	selim_sef, 2nd place xView3 prize challenge	Aggregate xView3 Score	0.6	—	Unverified
3	Tumen, 3rd place xView3 prize challenge	Aggregate xView3 Score	0.58	—	Unverified
4	Skylight at AI2, 4th place xView3 prize challenge	Aggregate xView3 Score	0.58	—	Unverified
5	Kohei, 5th place xView3 prize challenge	Aggregate xView3 Score	0.57	—	Unverified