Benchmarking

Papers

Showing 3801–3810 of 5548 papers

Title	Date	Tasks	Status	Hype
Share, Collaborate, Benchmark: Advancing Travel Demand Research through rigorous open-source collaboration	Jun 9, 2023	Benchmarking, Time Series	Unverified	0
Reference Matters: Benchmarking Factual Error Correction for Dialogue Summarization with Fine-grained Evaluation Framework	Jun 8, 2023	Benchmarking	Code Available	0
FedSecurity: Benchmarking Attacks and Defenses in Federated Learning and Federated LLMs	Jun 8, 2023	Benchmarking, Federated Learning	Code Available	0
DynamoRep: Trajectory-Based Population Dynamics for Classification of Black-box Optimization Problems	Jun 8, 2023	Benchmarking, Descriptive	Code Available	0
FLEdge: Benchmarking Federated Machine Learning Applications in Edge Computing Systems	Jun 8, 2023	Benchmarking, Edge-computing	Unverified	0
DLAMA: A Framework for Curating Culturally Diverse Facts for Probing the Knowledge of Pretrained Language Models	Jun 8, 2023	Benchmarking, Fairness	Code Available	0
RD-Suite: A Benchmark for Ranking Distillation	Jun 7, 2023	Benchmarking	Unverified	0
Self-Adjusting Weighted Expected Improvement for Bayesian Optimization	Jun 7, 2023	Bayesian Optimization, Benchmarking	Code Available	0
Benchmarking Foundation Models with Language-Model-as-an-Examiner	Jun 7, 2023	Benchmarking, Language Modeling	Unverified	0
ICON^2: Reliably Benchmarking Predictive Inequity in Object Detection	Jun 7, 2023	Attribute, Autonomous Driving	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified