SOTAVerified

Benchmarking

Papers

Showing 3721–3730 of 5548 papers

Title | Status | Hype
Procedural Content Generation: Better Benchmarks for Transfer Reinforcement Learning | | 0
Procedural Generalization by Planning with Self-Supervised World Models | | 0
ProductAgent: Benchmarking Conversational Product Search Agent with Asking Clarification Questions | | 0
Profit: Benchmarking Personalization and Robustness Trade-off in Federated Prompt Tuning | | 0
Progressive Class-level Distillation | | 0
Progressive Multi-view Human Mesh Recovery with Self-Supervision | | 0
Progressive with Purpose: Guiding Progressive Inpainting DNNs through Context and Structure | | 0
Projective simulation applied to the grid-world and the mountain-car problem | | 0
Project MPG: towards a generalized performance benchmark for LLM capabilities | | 0
Prompting ChatGPT for Chinese Learning as L2: A CEFR and EBCL Level Study | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified