SOTAVerified

Benchmarking

Papers

Showing 3011-3020 of 5548 papers

Title | Status | Hype
Beyond Optimism: Exploration With Partially Observable Rewards | Code | 0
FairX: A comprehensive benchmarking tool for model analysis using fairness, utility, and explainability | Code | 0
CEBench: A Benchmarking Toolkit for the Cost-Effectiveness of LLM Pipelines | Code | 0
PoseBench: Benchmarking the Robustness of Pose Estimation Models under Corruptions | - | 0
DASB -- Discrete Audio and Speech Benchmark | - | 0
Selected Languages are All You Need for Cross-lingual Truthfulness Transfer | Code | 0
Improving Expert Radiology Report Summarization by Prompting Large Language Models with a Layperson Summary | - | 0
Benchmarking Monocular 3D Dog Pose Estimation Using In-The-Wild Motion Capture Data | - | 0
Resource-efficient Medical Image Analysis with Self-adapting Forward-Forward Networks | - | 0
QeMFi: A Multifidelity Dataset of Quantum Chemical Properties of Diverse Molecules | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified