SOTAVerified

Benchmarking

Papers

Showing 221-230 of 5548 papers

Title | Status | Hype
Advances in APPFL: A Comprehensive and Extensible Federated Learning Framework | Code | 2
Assessing SPARQL capabilities of Large Language Models | Code | 2
PlantSeg: A Large-Scale In-the-wild Dataset for Plant Disease Segmentation | Code | 2
Interactive Agents: Simulating Counselor-Client Psychological Counseling via Role-Playing LLM-to-LLM Interactions | Code | 2
PerturBench: Benchmarking Machine Learning Models for Cellular Perturbation Analysis | Code | 2
SustainDC: Benchmarking for Sustainable Data Center Control | Code | 2
MOMAland: A Set of Benchmarks for Multi-Objective Multi-Agent Reinforcement Learning | Code | 2
COALA: A Practical and Vision-Centric Federated Learning Platform | Code | 2
Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models | Code | 2
GV-Bench: Benchmarking Local Feature Matching for Geometric Verification of Long-term Loop Closure Detection | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified