SOTAVerified|Agents Browse Leaderboard About Blog

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 3731–3740 of 5548 papers

Title	Date	Tasks	Status	Hype
Prompting Scientific Names for Zero-Shot Species Recognition	Oct 15, 2023	BenchmarkingZero-Shot Learning	—Unverified	0
Prompt Sketching for Large Language Models	Nov 8, 2023	Arithmetic ReasoningBenchmarking	—Unverified	0
Proof of Humanity: A Multi-Layer Network Framework for Certifying Human-Originated Content in an AI-Dominated Internet	Apr 2, 2025	Benchmarking	—Unverified	0
Proof of Thought : Neurosymbolic Program Synthesis allows Robust and Interpretable Reasoning	Sep 25, 2024	BenchmarkingFormal Logic	—Unverified	0
ProtIR: Iterative Refinement between Retrievers and Predictors for Protein Function Annotation	Feb 10, 2024	BenchmarkingLanguage Modeling	—Unverified	0
Protocol for Executing and Benchmarking Eight Computational Doublet-Detection Methods in Single-Cell RNA Sequencing Data Analysis	Jan 21, 2021	Benchmarking	—Unverified	0
Provably Safe Reinforcement Learning: Conceptual Analysis, Survey, and Benchmarking	May 13, 2022	Benchmarkingreinforcement-learning	—Unverified	0
ProverbEval: Exploring LLM Evaluation Challenges for Low-resource Language Understanding	Nov 7, 2024	BenchmarkingMultiple-choice	—Unverified	0
PsychBench: A comprehensive and professional benchmark for evaluating the performance of LLM-assisted psychiatric clinical practice	Feb 28, 2025	BenchmarkingDiagnostic	—Unverified	0
PSYCHE: A Multi-faceted Patient Simulation Framework for Evaluation of Psychiatric Assessment Conversational Agents	Jan 3, 2025	Benchmarking	—Unverified	0

Show:10 25 50

← PrevPage 374 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified