SOTAVerified

Benchmarking

Papers

Showing 1701–1710 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Benchmarking and Enhancing LLM Agents in Localizing Linux Kernel Bugs | Code | 0 |
| PathBench: A comprehensive comparison benchmark for pathology foundation models towards precision oncology | | 0 |
| Calibrating Pre-trained Language Classifiers on LLM-generated Noisy Labels via Iterative Refinement | Code | 0 |
| Transformers in Protein: A Survey | | 0 |
| StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs | | 0 |
| AMQA: An Adversarial Dataset for Benchmarking Bias of LLMs in Medicine and Healthcare | Code | 0 |
| Beyond Specialization: Benchmarking LLMs for Transliteration of Indian Languages | | 0 |
| Automated Text-to-Table for Reasoning-Intensive Table QA: Pipeline Design and Benchmarking Insights | Code | 0 |
| A Unified Solution to Video Fusion: From Multi-Frame Learning to Benchmarking | | 0 |
| FinLoRA: Benchmarking LoRA Methods for Fine-Tuning LLMs on Financial Datasets | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |