SOTAVerified

Benchmarking

Papers

Showing 1671-1680 of 5548 papers

Title | Status | Hype
PathGene: Benchmarking Driver Gene Mutations and Exon Prediction Using Multicenter Lung Cancer Histopathology Image Dataset | Code | 0
LLM Performance for Code Generation on Noisy Tasks | Code | 0
R2I-Bench: Benchmarking Reasoning-Driven Text-to-Image Generation | - | 0
Joint Phase Shift Optimization and Precoder Selection for RIS-Assisted 5G NR MIMO Systems | - | 0
Socratic-PRMBench: Benchmarking Process Reward Models with Systematic Reasoning Patterns | - | 0
Diagnosing and Addressing Pitfalls in KG-RAG Datasets: Toward More Reliable Benchmarking | - | 0
MSQA: Benchmarking LLMs on Graduate-Level Materials Science Reasoning and Knowledge | - | 0
SNS-Bench-VL: Benchmarking Multimodal Large Language Models in Social Networking Services | Code | 0
Is Your Model Fairly Certain? Uncertainty-Aware Fairness Evaluation for LLMs | Code | 0
Jailbreak Distillation: Renewable Safety Benchmarking | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified