SOTAVerified

Benchmarking

Papers

Showing 1301-1310 of 5548 papers

Title | Status | Hype
CompanyKG: A Large-Scale Heterogeneous Graph for Company Similarity Quantification | Code | 1
Contemporary Symbolic Regression Methods and their Relative Performance | Code | 1
COVID-19 event extraction from Twitter via extractive question answering with continuous prompts | Code | 1
Benchmarking Robustness of Machine Reading Comprehension Models | Code | 1
Data Splits and Metrics for Method Benchmarking on Surgical Action Triplet Datasets | Code | 1
Benchmarking Robustness to Adversarial Image Obfuscations | Code | 1
Benchmarks for Deep Off-Policy Evaluation | Code | 1
CoDEx: A Comprehensive Knowledge Graph Completion Benchmark | Code | 1
Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents | Code | 1
CodeS: Natural Language to Code Repository via Multi-Layer Sketch | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified