
Benchmarking

Papers

Showing 811–820 of 5548 papers

Title	Date	Tasks	Status	Hype
Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases	Mar 6, 2025	Benchmarking, Diagnostic	Code Available	0
InfoSEM: A Deep Generative Model with Informative Priors for Gene Regulatory Network Inference	Mar 6, 2025	Benchmarking	Unverified	0
Dynamic Benchmarking of Reasoning Capabilities in Code Large Language Models Under Data Contamination	Mar 6, 2025	Benchmarking	Unverified	0
ThrowBench: Benchmarking LLMs by Predicting Runtime Exceptions	Mar 6, 2025	Benchmarking, HumanEval	Code Available	0
Know Thy Judge: On the Robustness Meta-Evaluation of LLM Safety Judges	Mar 6, 2025	Benchmarking, Language Modeling	Unverified	0
Eventprop training for efficient neuromorphic applications	Mar 6, 2025	Benchmarking, GPU	Unverified	0
Towards Universal Learning-based Model for Cardiac Image Reconstruction: Summary of the CMRxRecon2024 Challenge	Mar 5, 2025	Benchmarking, Image Reconstruction	Unverified	0
UnPuzzle: A Unified Framework for Pathology Image Analysis	Mar 5, 2025	Benchmarking, Diagnostic	Code Available	1
GNNMerge: Merging of GNN Models Without Accessing Training Data	Mar 5, 2025	Benchmarking, Computational Efficiency	Code Available	0
AttackSeqBench: Benchmarking Large Language Models' Understanding of Sequential Patterns in Cyber Attacks	Mar 5, 2025	Benchmarking, graph construction	Code Available	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified