SOTAVerified|Agents Browse Leaderboard About

Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1461–1470 of 5548 papers

Title	Date	Tasks	Status	Hype
Advancing Histopathology with Deep Learning Under Data Scarcity: A Decade in Review	Oct 18, 2024	BenchmarkingDeep Learning	—Unverified	0
LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs	Oct 18, 2024	BenchmarkingFairness	—Unverified	0
Benchmarking Deep Reinforcement Learning for Navigation in Denied Sensor Environments	Oct 18, 2024	Autonomous NavigationBenchmarking	CodeCode Available	1
MultiChartQA: Benchmarking Vision-Language Models on Multi-Chart Problems	Oct 18, 2024	BenchmarkingQuestion Answering	CodeCode Available	1
Benchmarking Transcriptomics Foundation Models for Perturbation Analysis : one PCA still rules them all	Oct 17, 2024	AllBenchmarking	CodeCode Available	1
UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models	Oct 17, 2024	Benchmarking	CodeCode Available	0
Sum Secrecy Rate Maximization for Full Duplex ISAC Systems	Oct 17, 2024	BenchmarkingIntegrated sensing and communication	—Unverified	0
Ab Initio Nonparametric Variable Selection for Scalable Symbolic Regression with Large p	Oct 17, 2024	Benchmarkingregression	CodeCode Available	0
debiaSAE: Benchmarking and Mitigating Vision-Language Model Bias	Oct 17, 2024	BenchmarkingBias Detection	CodeCode Available	0
Trust but Verify: Programmatic VLM Evaluation in the Wild	Oct 17, 2024	BenchmarkingLanguage Modelling	—Unverified	0

Show:10 25 50

← PrevPage 147 of 555Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified