SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 2901–2910 of 5548 papers

Title	Date	Tasks	Status	Hype
Profit: Benchmarking Personalization and Robustness Trade-off in Federated Prompt Tuning	Oct 6, 2023	BenchmarkingFederated Learning	—Unverified	0
CIFAR-10-Warehouse: Broad and More Realistic Testbeds in Model Generalization Analysis	Oct 6, 2023	BenchmarkingDomain Generalization	—Unverified	0
Bringing Quantum Algorithms to Automated Machine Learning: A Systematic Review of AutoML Frameworks Regarding Extensibility for QML Algorithms	Oct 6, 2023	AutoMLBenchmarking	—Unverified	0
A Review of Deep Reinforcement Learning in Serverless Computing: Function Scheduling and Resource Auto-Scaling	Oct 5, 2023	BenchmarkingDeep Reinforcement Learning	—Unverified	0
PepMLM: Target Sequence-Conditioned Generation of Therapeutic Peptide Binders via Span Masked Language Modeling	Oct 5, 2023	BenchmarkingLanguage Modeling	CodeCode Available	1
Benchmarking a foundation LLM on its ability to re-label structure names in accordance with the AAPM TG-263 report	Oct 5, 2023	Benchmarking	—Unverified	0
MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation	Oct 5, 2023	BenchmarkingDecision Making	CodeCode Available	2
Deep Reinforcement Learning Algorithms for Hybrid V2X Communication: A Benchmarking Study	Oct 4, 2023	Autonomous VehiclesBenchmarking	—Unverified	0
Can Language Models Employ the Socratic Method? Experiments with Code Debugging	Oct 4, 2023	Benchmarking	CodeCode Available	1
Fully Automatic Segmentation of Gross Target Volume and Organs-at-Risk for Radiotherapy Planning of Nasopharyngeal Carcinoma	Oct 4, 2023	BenchmarkingSegmentation	CodeCode Available	0

Show:10 25 50

← PrevPage 291 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified