Benchmarking

Papers

Showing 4751–4800 of 5548 papers

Title	Date	Tasks	Status
So-Fake: Benchmarking and Explaining Social Media Image Forgery Detection	May 24, 2025	Benchmarking, Image Forgery Detection	Unverified
An approach for benchmarking the numerical solutions of stochastic compartmental models	Nov 4, 2022	Benchmarking	Unverified
Soft-Hard Attention U-Net Model and Benchmark Dataset for Multiscale Image Shadow Removal	Aug 7, 2024	Benchmarking, Hard Attention	Unverified
ViC-Bench: Benchmarking Visual-Interleaved Chain-of-Thought Capability in MLLMs with Free-Style Intermediate State Representations	May 20, 2025	Benchmarking	Unverified
Software Development Life Cycle Perspective: A Survey of Benchmarks for Code Large Language Models and Agents	May 8, 2025	Benchmarking	Unverified
An Analysis of Quality Indicators Using Approximated Optimal Distributions in a Three-dimensional Objective Space	Sep 27, 2020	Benchmarking	Unverified
SoK: Systematization and Benchmarking of Deepfake Detectors in a Unified Framework	Jan 9, 2024	Benchmarking, DeepFake Detection	Unverified
SOLAR: A Highly Optimized Data Loading Framework for Distributed Training of CNN-based Scientific Surrogates	Nov 1, 2022	Benchmarking	Unverified
Solar Multimodal Transformer: Intraday Solar Irradiance Predictor using Public Cameras and Time Series	Feb 28, 2025	Benchmarking, Solar Irradiance Forecasting	Unverified
Solver Scheduling via Answer Set Programming	Jan 6, 2014	Benchmarking, Scheduling	Unverified
Solving the chemical master equation for monomolecular reaction systems analytically: a Doi-Peliti path integral view	Nov 3, 2019	Benchmarking	Unverified
Solving Urban Network Security Games: Learning Platform, Benchmark, and Challenge for AI Research	Jan 29, 2025	Benchmarking	Unverified
SOMPT22: A Surveillance Oriented Multi-Pedestrian Tracking Dataset	Aug 4, 2022	Benchmarking, Multi-Object Tracking	Unverified
SOP-Bench: Complex Industrial SOPs for Evaluating LLM Agents	Jun 9, 2025	Benchmarking, Synthetic Data Generation	Unverified
Benchmarking Badminton Action Recognition with a New Fine-Grained Dataset	Mar 19, 2024	Action Recognition, Benchmarking	Unverified
SortBench: Benchmarking LLMs based on their ability to sort lists	Apr 11, 2025	Benchmarking	Unverified
SOSBENCH: Benchmarking Safety Alignment on Scientific Knowledge	May 27, 2025	Benchmarking, Multiple-choice	Unverified
WiSoSuper: Benchmarking Super-Resolution Methods on Wind and Solar Data	Sep 17, 2021	Benchmarking, BIG-bench Machine Learning	Unverified
So you think you can track?	Sep 13, 2023	Benchmarking, Object	Unverified
An Analysis of Model Robustness across Concurrent Distribution Shifts	Jan 8, 2025	Benchmarking	Unverified
SpaceTx: A Roadmap for Benchmarking Spatial Transcriptomics Exploration of the Brain	Jan 20, 2023	Benchmarking, Cell Segmentation	Unverified
An Analysis of Control Parameters of MOEA/D Under Two Different Optimization Scenarios	Oct 2, 2020	Benchmarking, Evolutionary Algorithms	Unverified
Sparse Deep Nonnegative Matrix Factorization	Jul 28, 2017	Benchmarking, Dimensionality Reduction	Unverified
Sparse Representation-Based Classification: Orthogonal Least Squares or Orthogonal Matching Pursuit?	Jul 18, 2016	Benchmarking, Classification	Unverified
Spatially Binned ROC: A Comprehensive Saliency Metric	Jun 1, 2016	Benchmarking	Unverified
Spatially Correlated Patterns in Adversarial Images	Nov 21, 2020	Benchmarking, Blocking	Unverified
Analyzing the Impact of Fake News on the Anticipated Outcome of the 2024 Election Ahead of Time	Dec 1, 2023	Articles, Benchmarking	Unverified
Analyzing the Impact of Undersampling on the Benchmarking and Configuration of Evolutionary Algorithms	Apr 20, 2022	Benchmarking, Evolutionary Algorithms	Unverified
Spatio-Temporal Latent Graph Structure Learning for Traffic Forecasting	Feb 25, 2022	Benchmarking, Graph Neural Network	Unverified
VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos	Jun 5, 2025	Benchmarking, Mathematical Reasoning	Unverified
Speaker Fuzzy Fingerprints: Benchmarking Text-Based Identification in Multiparty Dialogues	Apr 21, 2025	Benchmarking, Speaker Identification	Unverified
SPEAL: Skeletal Prior Embedded Attention Learning for Cross-Source Point Cloud Registration	Dec 14, 2023	Benchmarking, Point Cloud Registration	Unverified
Analyzing the Effectiveness of Listwise Reranking with Positional Invariance on Temporal Generalizability	Jul 9, 2024	Benchmarking, Decoder	Unverified
Analyzing the behaviour of D'WAVE quantum annealer: fine-tuning parameterization and tests with restrictive Hamiltonian formulations	Jul 1, 2022	Benchmarking, Combinatorial Optimization	Unverified
ABSA-Bench: Towards the Unified Evaluation of Aspect-based Sentiment Analysis Research	Dec 1, 2020	Aspect-Based Sentiment Analysis, Aspect-Based Sentiment Analysis (ABSA)	Unverified
Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads	Aug 28, 2023	Benchmarking, Self-Supervised Learning	Unverified
SpeechVerse: A Large-scale Generalizable Audio Language Model	May 14, 2024	Automatic Speech Recognition, Benchmarking	Unverified
Speed Benchmarking of Genetic Programming Frameworks	May 25, 2021	Benchmarking	Unverified
Analyzing Multilingual Competency of LLMs in Multi-Turn Instruction Following: A Case Study of Arabic	Oct 23, 2023	Benchmarking, Instruction Following	Unverified
Analyzing Hong Kong's Legal Judgments from a Computational Linguistics point-of-view	May 4, 2023	Benchmarking, Graph Generation	Unverified
Analysis of modular CMA-ES on strict box-constrained problems in the SBOX-COST benchmarking suite	May 24, 2023	Benchmarking	Unverified
SPINEX-Clustering: Similarity-based Predictions with Explainable Neighbors Exploration for Clustering Problems	Jul 9, 2024	Benchmarking, Clustering	Unverified
SPINEX_ Symbolic Regression: Similarity-based Symbolic Regression with Explainable Neighbors Exploration	Nov 5, 2024	Benchmarking, regression	Unverified
SPIn-NeRF: Multiview Segmentation and Perceptual Inpainting with Neural Radiance Fields	Nov 22, 2022	3D Inpainting, 3D Instance Segmentation	Unverified
Analysis of different disparity estimation techniques on aerial stereo image datasets	Oct 9, 2024	Benchmarking, Depth Estimation	Unverified
Spintronics for image recognition: performance benchmarking via ultrafast data-driven simulations	Aug 10, 2023	Benchmarking, Classification	Unverified
SpiralMLP: A Lightweight Vision MLP Architecture	Mar 31, 2024	Benchmarking	Unverified
ABOUT ML: Annotation and Benchmarking on Understanding and Transparency of Machine Learning Lifecycles	Dec 12, 2019	Benchmarking, BIG-bench Machine Learning	Unverified
SpokenNativQA: Multilingual Everyday Spoken Queries for LLMs	May 25, 2025	Benchmarking, Diversity	Unverified
Sports Intelligence: Assessing the Sports Understanding Capabilities of Language Models through Question Answering from Text to Video	Jun 21, 2024	Benchmarking, Few-Shot Learning	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified