Benchmarking

Papers

Showing 2211–2220 of 5548 papers

Title	Date	Tasks	Status	Hype
Agent-oriented Joint Decision Support for Data Owners in Auction-based Federated Learning	May 9, 2024	Benchmarking, Federated Learning	Unverified	0
Aequitas Flow: Streamlining Fair ML Experimentation	May 9, 2024	Benchmarking, Fairness	Code Available	4
OpenFactCheck: Building, Benchmarking Customized Fact-Checking Systems and Evaluating the Factuality of Claims and LLMs	May 9, 2024	Benchmarking, Fact Checking	Code Available	2
Benchmarking Educational Program Repair	May 8, 2024	Benchmarking, Program Repair	Code Available	0
Bridging the Bosphorus: Advancing Turkish Large Language Models through Strategies for Low-Resource Language Adaptation and Benchmarking	May 7, 2024	Benchmarking, Model Selection	Unverified	0
AI in Lung Health: Benchmarking Detection and Diagnostic Models Across Multiple CT Scan Datasets	May 7, 2024	Benchmarking, Cancer Classification	Code Available	1
Refining Joint Text and Source Code Embeddings for Retrieval Task with Parameter-Efficient Fine-Tuning	May 7, 2024	Benchmarking, Contrastive Learning	Code Available	0
ACEGEN: Reinforcement learning of generative chemical agents for drug discovery	May 7, 2024	Benchmarking, Decision Making	Code Available	3
UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images	May 6, 2024	Benchmarking	Unverified	0
ATG: Benchmarking Automated Theorem Generation for Generative Language Models	May 5, 2024	Automated Theorem Proving, Benchmarking	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified