SOTAVerified

Benchmarking

Papers

Showing 221–230 of 5548 papers

Title	Date	Tasks	Status	Hype
Advances in APPFL: A Comprehensive and Extensible Federated Learning Framework	Sep 17, 2024	Benchmarking, Federated Learning	Code Available	2
Assessing SPARQL capabilities of Large Language Models	Sep 9, 2024	Benchmarking, Knowledge Graphs	Code Available	2
PlantSeg: A Large-Scale In-the-wild Dataset for Plant Disease Segmentation	Sep 6, 2024	Benchmarking, image-classification	Code Available	2
Interactive Agents: Simulating Counselor-Client Psychological Counseling via Role-Playing LLM-to-LLM Interactions	Aug 28, 2024	Benchmarking	Code Available	2
PerturBench: Benchmarking Machine Learning Models for Cellular Perturbation Analysis	Aug 20, 2024	Benchmarking	Code Available	2
SustainDC: Benchmarking for Sustainable Data Center Control	Aug 14, 2024	Benchmarking, Management	Code Available	2
MOMAland: A Set of Benchmarks for Multi-Objective Multi-Agent Reinforcement Learning	Jul 23, 2024	Benchmarking, Decision Making	Code Available	2
COALA: A Practical and Vision-Centric Federated Learning Platform	Jul 23, 2024	Benchmarking, Continual Learning	Code Available	2
Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models	Jul 17, 2024	Benchmarking, Red Teaming	Code Available	2
GV-Bench: Benchmarking Local Feature Matching for Geometric Verification of Long-term Loop Closure Detection	Jul 16, 2024	Benchmarking, Loop Closure Detection	Code Available	2


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified