SOTAVerified

Benchmarking

Papers

Showing 3131–3140 of 5548 papers

Title	Date	Tasks	Status	Hype
A Synthetic Benchmarking Pipeline to Compare Camera Calibration Algorithms	Jul 3, 2023	Benchmarking, Camera Calibration	Unverified	0
Conditionally Invariant Representation Learning for Disentangling Cellular Heterogeneity	Jul 2, 2023	Benchmarking, Data Integration	Unverified	0
SysNoise: Exploring and Benchmarking Training-Deployment System Inconsistency	Jul 1, 2023	Benchmarking, Data Augmentation	Unverified	0
InstructEval: Systematic Evaluation of Instruction Selection Methods	Jul 1, 2023	Benchmarking, In-Context Learning	Unverified	0
Learning Environment Models with Continuous Stochastic Dynamics	Jun 29, 2023	Acrobot, Benchmarking	Unverified	0
Benchmarking Large Language Model Capabilities for Conditional Generation	Jun 29, 2023	Benchmarking, Few-Shot Learning	Unverified	0
Principles and Guidelines for Evaluating Social Robot Navigation Algorithms	Jun 29, 2023	Benchmarking, Robot Navigation	Unverified	0
Generative AI for Programming Education: Benchmarking ChatGPT, GPT-4, and Human Tutors	Jun 29, 2023	Benchmarking	Unverified	0
Uncovering the Limits of Machine Learning for Automatic Vulnerability Detection	Jun 28, 2023	Benchmarking, Data Augmentation	Code Available	1
Benchmarking Zero-Shot Recognition with Vision-Language Models: Challenges on Granularity and Specificity	Jun 28, 2023	Benchmarking, Image Captioning	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified