Benchmarking

Papers

Showing 2811–2820 of 5548 papers

Title	Date	Tasks	Status	Hype
A Two-Step Framework for Multi-Material Decomposition of Dual Energy Computed Tomography from Projection Domain	Oct 31, 2023	Benchmarking, Diagnostic	Unverified	0
Next-generation MRD assays: do we have the tools to evaluate them properly?	Oct 31, 2023	Benchmarking, Sensitivity	Unverified	0
In Search of Lost Online Test-time Adaptation: A Survey	Oct 31, 2023	Benchmarking, GPU	Code Available	1
What's In My Big Data?	Oct 31, 2023	Benchmarking	Code Available	2
Theory of Mind in Large Language Models: Examining Performance of 11 State-of-the-Art models vs. Children Aged 7-10 on Advanced Tests	Oct 31, 2023	Benchmarking	Unverified	0
Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks	Oct 30, 2023	Benchmarking, object-detection	Code Available	2
Domain Generalization in Computational Pathology: Survey and Guidelines	Oct 30, 2023	Benchmarking, Diagnostic	Unverified	0
A Metadata-Driven Approach to Understand Graph Neural Networks	Oct 30, 2023	Benchmarking, Graph Learning	Unverified	0
Re-evaluating Retrosynthesis Algorithms with Syntheseus	Oct 30, 2023	Benchmarking, Multi-step retrosynthesis	Code Available	1
LLMs and Finetuning: Benchmarking cross-domain performance for hate speech detection	Oct 29, 2023	Benchmarking, Diversity	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified