SOTAVerified

Benchmarking

Papers

Showing 1981-1990 of 5548 papers

Title | Status | Hype
Benchmarking Retinal Blood Vessel Segmentation Models for Cross-Dataset and Cross-Disease Generalization | Code | 0
GenoTEX: An LLM Agent Benchmark for Automated Gene Expression Data Analysis | Code | 2
Deciphering the Definition of Adversarial Robustness for post-hoc OOD Detectors | | 0
Sports Intelligence: Assessing the Sports Understanding Capabilities of Language Models through Question Answering from Text to Video | | 0
NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking | Code | 7
CEBench: A Benchmarking Toolkit for the Cost-Effectiveness of LLM Pipelines | Code | 0
Improving Expert Radiology Report Summarization by Prompting Large Language Models with a Layperson Summary | | 0
QeMFi: A Multifidelity Dataset of Quantum Chemical Properties of Diverse Molecules | Code | 0
Selected Languages are All You Need for Cross-lingual Truthfulness Transfer | Code | 0
Beyond Optimism: Exploration With Partially Observable Rewards | Code | 0
Page 199 of 555

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified