SOTAVerified

Benchmarking

Papers

Showing 651-660 of 5548 papers

Title | Status | Hype
LCA-on-the-Line: Benchmarking Out-of-Distribution Generalization with Class Taxonomies | Code | 1
POGEMA: A Benchmark Platform for Cooperative Multi-Agent Pathfinding | Code | 1
Thinking Racial Bias in Fair Forgery Detection: Models, Datasets and Evaluations | Code | 1
Restore Anything Model via Efficient Degradation Adaptation | Code | 1
SKADA-Bench: Benchmarking Unsupervised Domain Adaptation Methods with Realistic Validation On Diverse Modalities | Code | 1
Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models | Code | 1
Separable Operator Networks | Code | 1
When Heterophily Meets Heterogeneity: Challenges and a New Large-Scale Graph Benchmark | Code | 1
CIBench: Evaluating Your LLMs with a Code Interpreter Plugin | Code | 1
OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified