SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 2301–2310 of 5548 papers

Title	Date	Tasks	Status	Hype
The iToBoS dataset: skin region images extracted from 3D total body photographs for lesion detection	Jan 30, 2025	BenchmarkingDiagnostic	CodeCode Available	0
MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding	Jan 30, 2025	BenchmarkingDecision Making	—Unverified	0
Unraveling the Capabilities of Language Models in News Summarization	Jan 30, 2025	BenchmarkingFew-Shot Learning	CodeCode Available	0
Solving Urban Network Security Games: Learning Platform, Benchmark, and Challenge for AI Research	Jan 29, 2025	Benchmarking	—Unverified	0
Benchmarking Quantum Convolutional Neural Networks for Signal Classification in Simulated Gamma-Ray Burst Detection	Jan 28, 2025	Benchmarking	—Unverified	0
Transfer of Knowledge through Reverse Annealing: A Preliminary Analysis of the Benefits and What to Share	Jan 27, 2025	BenchmarkingTransfer Learning	—Unverified	0
A Benchmarking Environment for Worker Flexibility in Flexible Job Shop Scheduling Problems	Jan 27, 2025	BenchmarkingEvolutionary Algorithms	—Unverified	0
Skeleton-Guided-Translation: A Benchmarking Framework for Code Repository Translation with Fine-Grained Quality Evaluation	Jan 27, 2025	BenchmarkingC++ code	—Unverified	0
PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding	Jan 27, 2025	BenchmarkingCommon Sense Reasoning	—Unverified	0
Benchmarking Quantum Reinforcement Learning	Jan 27, 2025	Benchmarkingreinforcement-learning	CodeCode Available	0

Show:10 25 50

← PrevPage 231 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified