SOTAVerified

Benchmarking

Papers

Showing 3981–3990 of 5548 papers

Title | Status | Hype
Solving the chemical master equation for monomolecular reaction systems analytically: a Doi-Peliti path integral view | | 0
Solving Urban Network Security Games: Learning Platform, Benchmark, and Challenge for AI Research | | 0
SOMPT22: A Surveillance Oriented Multi-Pedestrian Tracking Dataset | | 0
SOP-Bench: Complex Industrial SOPs for Evaluating LLM Agents | | 0
SortBench: Benchmarking LLMs based on their ability to sort lists | | 0
SOSBENCH: Benchmarking Safety Alignment on Scientific Knowledge | | 0
So you think you can track? | | 0
SpaceTx: A Roadmap for Benchmarking Spatial Transcriptomics Exploration of the Brain | | 0
Sparse Deep Nonnegative Matrix Factorization | | 0
Sparse Representation-Based Classification: Orthogonal Least Squares or Orthogonal Matching Pursuit? | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified