Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 2451–2500 of 5548 papers

Title	Date	Tasks	Status
Benchmarking Systematic Relational Reasoning with Large Language and Reasoning Models	Mar 30, 2025	BenchmarkingRelational Reasoning	—Unverified
Benchmarking symbolic regression constant optimization schemes	Dec 3, 2024	Benchmarkingregression	—Unverified
Benchmarking Surrogate-Assisted Genetic Recommender Systems	Aug 8, 2019	BenchmarkingEvolutionary Algorithms	—Unverified
A Unified Framework for Provably Efficient Algorithms to Estimate Shapley Values	Jun 5, 2025	Benchmarking	—Unverified
A large-scale, physically-based synthetic dataset for satellite pose estimation	Jun 15, 2025	BenchmarkingDataset Generation	—Unverified
Benchmarking Super-Resolution Algorithms on Real Data	Sep 8, 2017	BenchmarkingSuper-Resolution	—Unverified
A Unified Framework and Dataset for Assessing Societal Bias in Vision-Language Models	Feb 21, 2024	BenchmarkingImage to text	—Unverified
A Comprehensive Survey on Video Scene Parsing:Advances, Challenges, and Prospects	Jun 16, 2025	BenchmarkingInstance Segmentation	—Unverified
Stereotype Detection in LLMs: A Multiclass, Explainable, and Benchmark-Driven Approach	Apr 2, 2024	BenchmarkingCommon Sense Reasoning	—Unverified
Benchmarking Sub-Genre Classification For Mainstage Dance Music	Sep 10, 2024	BenchmarkingClassification	—Unverified
A large-scale heterogeneous 3D magnetic resonance brain imaging dataset for self-supervised learning	Jun 17, 2025	BenchmarkingSelf-Supervised Learning	—Unverified
Deep Reinforcement Learning for Dynamic Order Picking in Warehouse Operations	Aug 3, 2024	BenchmarkingDeep Reinforcement Learning	—Unverified
Geometry-Based Next Frame Prediction from Monocular Video	Sep 20, 2016	Autonomous DrivingBenchmarking	—Unverified
Geospatial Foundation Models to Enable Progress on Sustainable Development Goals	May 30, 2025	BenchmarkingEarth Observation	—Unverified
Global Rice Multi-Class Segmentation Dataset (RiceSEG): A Comprehensive and Diverse High-Resolution RGB-Annotated Images for the Development and Benchmarking of Rice Segmentation Algorithms	Apr 2, 2025	BenchmarkingSemantic Segmentation	—Unverified
Variational Laplace for Bayesian neural networks	Nov 20, 2020	BenchmarkingVariational Inference	—Unverified
Benchmarking state-of-the-art gradient boosting algorithms for classification	May 26, 2023	Bayesian OptimizationBenchmarking	—Unverified
Audio-Visual Class-Incremental Learning for Fish Feeding intensity Assessment in Aquaculture	Apr 21, 2025	Benchmarkingclass-incremental learning	—Unverified
Benchmarking State-of-the-Art Deep Learning Software Tools	Aug 25, 2016	BenchmarkingCPU	—Unverified
A Large-Scale Evaluation of Speech Foundation Models	Apr 15, 2024	Benchmarking	—Unverified
Benchmarking Spiking Neural Network Learning Methods with Varying Locality	Feb 1, 2024	Benchmarking	—Unverified
A Large-scale Evaluation of Pretraining Paradigms for the Detection of Defects in Electroluminescence Solar Cell Images	Feb 27, 2024	BenchmarkingDefect Detection	—Unverified
A2Perf: Real-World Autonomous Agents Benchmark	Mar 4, 2025	BenchmarkingCombinatorial Optimization	—Unverified
A 28-nm Convolutional Neuromorphic Processor Enabling Online Learning with Spike-Based Retinas	May 13, 2020	BenchmarkingEdge-computing	—Unverified
Benchmarking sparse system identification with low-dimensional chaos	Feb 4, 2023	Benchmarking	—Unverified
Benchmarking SMT Performance for Farsi Using the TEP++ Corpus	May 1, 2015	BenchmarkingMachine Translation	—Unverified
A Two-Step Framework for Multi-Material Decomposition of Dual Energy Computed Tomography from Projection Domain	Oct 31, 2023	BenchmarkingDiagnostic	—Unverified
Benchmarking Smoothness and Reducing High-Frequency Oscillations in Continuous Control Policies	Oct 22, 2024	Benchmarkingcontinuous-control	—Unverified
A Two-Stage Neural-Filter Pareto Front Extractor and the need for Benchmarking	Sep 29, 2021	BenchmarkingMulti-Task Learning	—Unverified
Benchmarking Single-Image Reflection Removal Algorithms	Oct 1, 2017	BenchmarkingReflection Removal	—Unverified
A tutorial on multi-view autoencoders using the multi-view-AE library	Mar 12, 2024	Benchmarking	—Unverified
Attention versus Contrastive Learning of Tabular Data -- A Data-centric Benchmarking	Jan 8, 2024	BenchmarkingContrastive Learning	—Unverified
Benchmarking simulated and physical quantum processing units using quantum and hybrid algorithms	Nov 28, 2022	Benchmarking	—Unverified
A Comprehensive Study on the Robustness of Image Classification and Object Detection in Remote Sensing: Surveying and Benchmarking	Jun 21, 2023	Adversarial RobustnessBenchmarking	—Unverified
Benchmarking Shadow Removal for Facial Landmark Detection and Beyond	Nov 27, 2021	BenchmarkingBlocking	—Unverified
A Large-scale Class-level Benchmark Dataset for Code Generation with LLMs	Apr 22, 2025	BenchmarkingClass-level Code Generation	—Unverified
Benchmarking Sensitivity of Continual Graph Learning for Skeleton-Based Action Recognition	Jan 31, 2024	Action RecognitionBenchmarking	—Unverified
GenSpace: Benchmarking Spatially-Aware Image Generation	May 30, 2025	BenchmarkingImage Generation	—Unverified
A Large-Scale Analysis on Self-Supervised Video Representation Learning	Jun 9, 2023	BenchmarkingRepresentation Learning	—Unverified
A Large-scale Benchmark on Geological Fault Delineation Models: Domain Shift, Training Dynamics, Generalizability, Evaluation and Inferential Behavior	May 13, 2025	BenchmarkingSeismic Interpretation	—Unverified
On the Evaluation of Engineering Artificial General Intelligence	May 15, 2025	Benchmarking	—Unverified
Genicious: Contextual Few-shot Prompting for Insights Discovery	Mar 15, 2025	BenchmarkingDecision Making	—Unverified
GenTel-Safe: A Unified Benchmark and Shielding Framework for Defending Against Prompt Injection Attacks	Sep 29, 2024	Benchmarking	—Unverified
Benchmarking Scientific Image Forgery Detectors	May 26, 2021	Benchmarking	—Unverified
Benchmarking Scene Text Recognition in Devanagari, Telugu and Malayalam	Apr 9, 2021	BenchmarkingScene Text Recognition	—Unverified
Benchmarking Sample Selection Strategies for Batch Reinforcement Learning	Sep 29, 2021	BenchmarkingImitation Learning	—Unverified
A Comprehensive Study on Robustness of Image Classification Models: Benchmarking and Rethinking	Feb 28, 2023	Adversarial RobustnessBenchmarking	—Unverified
Generative Psycho-Lexical Approach for Constructing Value Systems in Large Language Models	Feb 4, 2025	BenchmarkingDecision Making	—Unverified
GenzIQA: Generalized Image Quality Assessment using Prompt-Guided Latent Diffusion Models	Jun 7, 2024	BenchmarkingDenoising	—Unverified
GeoGebra Tools with Proof Capabilities	Mar 3, 2016	Automated Theorem ProvingBenchmarking	—Unverified

Show:10 25 50

← PrevPage 50 of 111Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified