SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 2411–2420 of 5548 papers

Title	Date	Tasks	Status	Hype
Alibaba’s Submission for the WMT 2020 APE Shared Task: Improving Automatic Post-Editing with Pre-trained Conditional Cross-Lingual BERT	Nov 1, 2020	Automatic Post-EditingBenchmarking	—Unverified	0
Benchmarking the Reliability of Post-training Quantization: a Particular Focus on Worst-case Performance	Mar 23, 2023	BenchmarkingData Augmentation	—Unverified	0
Benchmarking the rationality of AI decision making using the transitivity axiom	Feb 14, 2025	BenchmarkingDecision Making	—Unverified	0
Automated 3D Tumor Segmentation using Temporal Cubic PatchGAN (TCuP-GAN)	Nov 23, 2023	BenchmarkingBrain Tumor Segmentation	—Unverified	0
Functional Code Building Genetic Programming	Jun 9, 2022	BenchmarkingProgram Synthesis	—Unverified	0
Benchmarking the Physical-world Adversarial Robustness of Vehicle Detection	Apr 11, 2023	Adversarial AttackAdversarial Robustness	—Unverified	0
AutoLay: Benchmarking amodal layout estimation for autonomous driving	Aug 20, 2021	Amodal Layout EstimationAutonomous Driving	—Unverified	0
Benchmarking the Neural Linear Model for Regression	Dec 18, 2019	Bayesian OptimizationBenchmarking	—Unverified	0
Algorithm Selection with Probing Trajectories: Benchmarking the Choice of Classifier Model	Jan 20, 2025	Benchmarking	—Unverified	0
Efficient Pauli channel estimation with logarithmic quantum memory	Sep 25, 2023	Benchmarking	—Unverified	0

Show:10 25 50

← PrevPage 242 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified