SOTAVerified

Benchmarking

Papers


Showing 4861–4870 of 5548 papers

Title	Date	Tasks	Status	Hype
ALDI++: Automatic and parameter-less discord and outlier detection for building energy load profiles	Mar 13, 2022	Benchmarking, BIG-bench Machine Learning	Code Available	0
Benchmarking Robustness in Object Detection: Autonomous Driving when Winter is Coming	Jul 17, 2019	Autonomous Driving, Benchmarking	Code Available	0
Motley: Benchmarking Heterogeneity and Personalization in Federated Learning	Jun 18, 2022	Benchmarking, Fairness	Code Available	0
ScoNe: Benchmarking Negation Reasoning in Language Models With Fine-Tuning and In-Context Learning	May 30, 2023	Benchmarking, In-Context Learning	Code Available	0
Benchmarking Retinal Blood Vessel Segmentation Models for Cross-Dataset and Cross-Disease Generalization	Jun 21, 2024	Benchmarking, Segmentation	Code Available	0
The Role of Model Architecture and Scale in Predicting Molecular Properties: Insights from Fine-Tuning RoBERTa, BART, and LLaMA	May 2, 2024	Benchmarking, Drug Discovery	Code Available	0
AutoJudger: An Agent-Driven Framework for Efficient Benchmarking of MLLMs	May 27, 2025	Benchmarking, Question Selection	Code Available	0
Benchmarking Representation Learning for Natural World Image Collections	Mar 30, 2021	Benchmarking, Binary Classification	Code Available	0
Benchmarking Reinforcement Learning Algorithms on Real-World Robots	Sep 20, 2018	Benchmarking, continuous-control	Code Available	0
Benchmarking Quantum Reinforcement Learning	Jan 27, 2025	Benchmarking, reinforcement-learning	Code Available	0


Page 487 of 555

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified