SOTAVerified

Benchmarking

Papers

Showing 3976–4000 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Software Development Life Cycle Perspective: A Survey of Benchmarks for Code Large Language Models and Agents | | 0 |
| SoK: Systematization and Benchmarking of Deepfake Detectors in a Unified Framework | | 0 |
| SOLAR: A Highly Optimized Data Loading Framework for Distributed Training of CNN-based Scientific Surrogates | | 0 |
| Solar Multimodal Transformer: Intraday Solar Irradiance Predictor using Public Cameras and Time Series | | 0 |
| Solver Scheduling via Answer Set Programming | | 0 |
| Solving the chemical master equation for monomolecular reaction systems analytically: a Doi-Peliti path integral view | | 0 |
| Solving Urban Network Security Games: Learning Platform, Benchmark, and Challenge for AI Research | | 0 |
| SOMPT22: A Surveillance Oriented Multi-Pedestrian Tracking Dataset | | 0 |
| SOP-Bench: Complex Industrial SOPs for Evaluating LLM Agents | | 0 |
| SortBench: Benchmarking LLMs based on their ability to sort lists | | 0 |
| SOSBENCH: Benchmarking Safety Alignment on Scientific Knowledge | | 0 |
| So you think you can track? | | 0 |
| SpaceTx: A Roadmap for Benchmarking Spatial Transcriptomics Exploration of the Brain | | 0 |
| Sparse Deep Nonnegative Matrix Factorization | | 0 |
| Sparse Representation-Based Classification: Orthogonal Least Squares or Orthogonal Matching Pursuit? | | 0 |
| Spatially Binned ROC: A Comprehensive Saliency Metric | | 0 |
| Spatially Correlated Patterns in Adversarial Images | | 0 |
| Spatio-Temporal Latent Graph Structure Learning for Traffic Forecasting | | 0 |
| Speaker Fuzzy Fingerprints: Benchmarking Text-Based Identification in Multiparty Dialogues | | 0 |
| SPEAL: Skeletal Prior Embedded Attention Learning for Cross-Source Point Cloud Registration | | 0 |
| Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads | | 0 |
| SpeechVerse: A Large-scale Generalizable Audio Language Model | | 0 |
| Speed Benchmarking of Genetic Programming Frameworks | | 0 |
| SPINEX-Clustering: Similarity-based Predictions with Explainable Neighbors Exploration for Clustering Problems | | 0 |
| SPINEX_ Symbolic Regression: Similarity-based Symbolic Regression with Explainable Neighbors Exploration | | 0 |
Page 160 of 222

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |