SOTAVerified

Benchmarking

Papers

Showing 1271-1280 of 5548 papers

Title | Status | Hype
Contemporary Symbolic Regression Methods and their Relative Performance | Code | 1
Collective Knowledge: organizing research projects as a database of reusable components and portable workflows with common APIs | Code | 1
New Protocols and Negative Results for Textual Entailment Data Collection | Code | 1
CombiBench: Benchmarking LLM Capability for Combinatorial Mathematics | Code | 1
A framework for benchmarking clustering algorithms | Code | 1
Hopfield-Enhanced Deep Neural Networks for Artifact-Resilient Brain State Decoding | Code | 1
Benchmarking Recommendation, Classification, and Tracing Based on Hugging Face Knowledge Graph | Code | 1
Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents | Code | 1
Combinatorial Optimization with Policy Adaptation using Latent Space Search | Code | 1
CodeS: Natural Language to Code Repository via Multi-Layer Sketch | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified