SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 3721–3730 of 5548 papers

Title	Date	Tasks	Status	Hype
Procedural Content Generation: Better Benchmarks for Transfer Reinforcement Learning	May 31, 2021	BenchmarkingDeep Learning	—Unverified	0
Procedural Generalization by Planning with Self-Supervised World Models	Nov 2, 2021	BenchmarkingMeta-Learning	—Unverified	0
ProductAgent: Benchmarking Conversational Product Search Agent with Asking Clarification Questions	Jul 1, 2024	BenchmarkingQuestion Generation	—Unverified	0
Profit: Benchmarking Personalization and Robustness Trade-off in Federated Prompt Tuning	Oct 6, 2023	BenchmarkingFederated Learning	—Unverified	0
Progressive Class-level Distillation	May 30, 2025	BenchmarkingKnowledge Distillation	—Unverified	0
Progressive Multi-view Human Mesh Recovery with Self-Supervision	Dec 10, 2022	BenchmarkingDiversity	—Unverified	0
Progressive with Purpose: Guiding Progressive Inpainting DNNs through Context and Structure	Sep 21, 2022	BenchmarkingImage Inpainting	—Unverified	0
Projective simulation applied to the grid-world and the mountain-car problem	May 21, 2014	Benchmarkingreinforcement-learning	—Unverified	0
Project MPG: towards a generalized performance benchmark for LLM capabilities	Oct 28, 2024	BenchmarkingChatbot	—Unverified	0
Prompting ChatGPT for Chinese Learning as L2: A CEFR and EBCL Level Study	Jan 25, 2025	Benchmarking	—Unverified	0

Show:10 25 50

← PrevPage 373 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified