SOTAVerified

Benchmarking

Papers

Showing 1701–1710 of 5548 papers

Title	Date	Tasks	Status	Hype
Benchmarking and Enhancing LLM Agents in Localizing Linux Kernel Bugs	May 26, 2025	Benchmarking, Fault localization	Code Available	0
PathBench: A comprehensive comparison benchmark for pathology foundation models towards precision oncology	May 26, 2025	Benchmarking, Prognosis	Unverified	0
Calibrating Pre-trained Language Classifiers on LLM-generated Noisy Labels via Iterative Refinement	May 26, 2025	Benchmarking	Code Available	0
Transformers in Protein: A Survey	May 26, 2025	Benchmarking, Drug Discovery	Unverified	0
StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs	May 26, 2025	Benchmarking	Unverified	0
AMQA: An Adversarial Dataset for Benchmarking Bias of LLMs in Medicine and Healthcare	May 26, 2025	Benchmarking, Medical Diagnosis	Code Available	0
Beyond Specialization: Benchmarking LLMs for Transliteration of Indian Languages	May 26, 2025	Benchmarking, Transliteration	Unverified	0
Automated Text-to-Table for Reasoning-Intensive Table QA: Pipeline Design and Benchmarking Insights	May 26, 2025	Benchmarking, Question Answering	Code Available	0
A Unified Solution to Video Fusion: From Multi-Frame Learning to Benchmarking	May 26, 2025	Benchmarking, Optical Flow Estimation	Unverified	0
FinLoRA: Benchmarking LoRA Methods for Fine-Tuning LLMs on Financial Datasets	May 26, 2025	Benchmarking, GPU	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified