SOTAVerified

Benchmarking

Papers

Showing 811–820 of 5548 papers

Title | Status | Hype
Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases | Code | 0
InfoSEM: A Deep Generative Model with Informative Priors for Gene Regulatory Network Inference | | 0
Dynamic Benchmarking of Reasoning Capabilities in Code Large Language Models Under Data Contamination | | 0
ThrowBench: Benchmarking LLMs by Predicting Runtime Exceptions | Code | 0
Know Thy Judge: On the Robustness Meta-Evaluation of LLM Safety Judges | | 0
Eventprop training for efficient neuromorphic applications | | 0
Towards Universal Learning-based Model for Cardiac Image Reconstruction: Summary of the CMRxRecon2024 Challenge | | 0
UnPuzzle: A Unified Framework for Pathology Image Analysis | Code | 1
GNNMerge: Merging of GNN Models Without Accessing Training Data | Code | 0
AttackSeqBench: Benchmarking Large Language Models' Understanding of Sequential Patterns in Cyber Attacks | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified