Benchmarking

Papers

Showing 3976–4000 of 5548 papers

Title	Date	Tasks	Status
Software Development Life Cycle Perspective: A Survey of Benchmarks for Code Large Language Models and Agents	May 8, 2025	Benchmarking	Unverified
SoK: Systematization and Benchmarking of Deepfake Detectors in a Unified Framework	Jan 9, 2024	Benchmarking, DeepFake Detection	Unverified
SOLAR: A Highly Optimized Data Loading Framework for Distributed Training of CNN-based Scientific Surrogates	Nov 1, 2022	Benchmarking	Unverified
Solar Multimodal Transformer: Intraday Solar Irradiance Predictor using Public Cameras and Time Series	Feb 28, 2025	Benchmarking, Solar Irradiance Forecasting	Unverified
Solver Scheduling via Answer Set Programming	Jan 6, 2014	Benchmarking, Scheduling	Unverified
Solving the chemical master equation for monomolecular reaction systems analytically: a Doi-Peliti path integral view	Nov 3, 2019	Benchmarking	Unverified
Solving Urban Network Security Games: Learning Platform, Benchmark, and Challenge for AI Research	Jan 29, 2025	Benchmarking	Unverified
SOMPT22: A Surveillance Oriented Multi-Pedestrian Tracking Dataset	Aug 4, 2022	Benchmarking, Multi-Object Tracking	Unverified
SOP-Bench: Complex Industrial SOPs for Evaluating LLM Agents	Jun 9, 2025	Benchmarking, Synthetic Data Generation	Unverified
SortBench: Benchmarking LLMs based on their ability to sort lists	Apr 11, 2025	Benchmarking	Unverified
SOSBENCH: Benchmarking Safety Alignment on Scientific Knowledge	May 27, 2025	Benchmarking, Multiple-choice	Unverified
So you think you can track?	Sep 13, 2023	Benchmarking, Object	Unverified
SpaceTx: A Roadmap for Benchmarking Spatial Transcriptomics Exploration of the Brain	Jan 20, 2023	Benchmarking, Cell Segmentation	Unverified
Sparse Deep Nonnegative Matrix Factorization	Jul 28, 2017	Benchmarking, Dimensionality Reduction	Unverified
Sparse Representation-Based Classification: Orthogonal Least Squares or Orthogonal Matching Pursuit?	Jul 18, 2016	Benchmarking, Classification	Unverified
Spatially Binned ROC: A Comprehensive Saliency Metric	Jun 1, 2016	Benchmarking	Unverified
Spatially Correlated Patterns in Adversarial Images	Nov 21, 2020	Benchmarking, Blocking	Unverified
Spatio-Temporal Latent Graph Structure Learning for Traffic Forecasting	Feb 25, 2022	Benchmarking, Graph Neural Network	Unverified
Speaker Fuzzy Fingerprints: Benchmarking Text-Based Identification in Multiparty Dialogues	Apr 21, 2025	Benchmarking, Speaker Identification	Unverified
SPEAL: Skeletal Prior Embedded Attention Learning for Cross-Source Point Cloud Registration	Dec 14, 2023	Benchmarking, Point Cloud Registration	Unverified
Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads	Aug 28, 2023	Benchmarking, Self-Supervised Learning	Unverified
SpeechVerse: A Large-scale Generalizable Audio Language Model	May 14, 2024	Automatic Speech Recognition, Benchmarking	Unverified
Speed Benchmarking of Genetic Programming Frameworks	May 25, 2021	Benchmarking	Unverified
SPINEX-Clustering: Similarity-based Predictions with Explainable Neighbors Exploration for Clustering Problems	Jul 9, 2024	Benchmarking, Clustering	Unverified
SPINEX_ Symbolic Regression: Similarity-based Symbolic Regression with Explainable Neighbors Exploration	Nov 5, 2024	Benchmarking, regression	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified