Benchmarking

Papers

Showing 2741–2750 of 5548 papers

Title	Date	Tasks	Status	Hype
Benchmarking Domain Generalization Algorithms in Computational Pathology	Sep 25, 2024	Benchmarking, Data Augmentation	Code Available	0
Benchmarking Deep Learning Models for Object Detection on Edge Computing Devices	Sep 25, 2024	Autonomous Vehicles, Benchmarking	Unverified	0
Proof of Thought : Neurosymbolic Program Synthesis allows Robust and Interpretable Reasoning	Sep 25, 2024	Benchmarking, Formal Logic	Unverified	0
Omnibenchmark (alpha) for continuous and open benchmarking in bioinformatics	Sep 25, 2024	Benchmarking	Unverified	0
SEN12-WATER: A New Dataset for Hydrological Applications and its Benchmarking	Sep 25, 2024	Benchmarking, Management	Unverified	0
Controlling Risk of Retrieval-augmented Generation: A Counterfactual Prompting Framework	Sep 24, 2024	Benchmarking, counterfactual	Code Available	0
HLB: Benchmarking LLMs' Humanlikeness in Language Use	Sep 24, 2024	Benchmarking	Unverified	0
Benchmarking Robustness of Endoscopic Depth Estimation with Synthetically Corrupted Data	Sep 24, 2024	Benchmarking, Depth Estimation	Code Available	0
Qualitative Insights Tool (QualIT): LLM Enhanced Topic Modeling	Sep 24, 2024	Articles, Benchmarking	Unverified	0
Ducho meets Elliot: Large-scale Benchmarks for Multimodal Recommendation	Sep 24, 2024	Benchmarking, Movie Recommendation	Code Available	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified