Benchmarking

Papers

Showing 4801–4850 of 5548 papers

Title	Date	Tasks	Status
FR-MRInet: A Deep Convolutional Encoder-Decoder for Brain Tumor Segmentation with Relu-RGB and Sliding-window	Jul 26, 2018	Benchmarking, Brain Tumor Segmentation	Code Available
AdamZ: An Enhanced Optimisation Method for Neural Network Training	Nov 22, 2024	Benchmarking	Code Available
MLPerf Training Benchmark	Oct 2, 2019	Benchmarking, BIG-bench Machine Learning	Code Available
Theory-inspired Parameter Control Benchmarks for Dynamic Algorithm Configuration	Feb 7, 2022	Benchmarking, Evolutionary Algorithms	Code Available
Benchmarking Spurious Bias in Few-Shot Image Classifiers	Sep 4, 2024	Attribute, Benchmarking	Code Available
FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal Shifts in Visual Question Answering	May 27, 2025	Benchmarking, Question Answering	Code Available
FORLORN: A Framework for Comparing Offline Methods and Reinforcement Learning for Optimization of RAN Parameters	Sep 8, 2022	Benchmarking, continuous-control	Code Available
MMCoQA: Conversational Question Answering over Text, Tables, and Images	May 1, 2022	Benchmarking, Conversational Question Answering	Code Available
Forecasting time series with constraints	Feb 14, 2025	Additive models, Benchmarking	Code Available
Action-conditioned Benchmarking of Robotic Video Prediction Models: a Comparative Study	Oct 7, 2019	Benchmarking, Prediction	Code Available
Benchmarking Spatiotemporal Reasoning in LLMs and Reasoning Models: Capabilities and Challenges	May 16, 2025	Benchmarking, State Estimation	Code Available
Forecasting Future International Events: A Reliable Dataset for Text-Based Event Modeling	Nov 21, 2024	Articles, Benchmarking	Code Available
Benchmarking Single Image Dehazing and Beyond	Dec 12, 2017	Benchmarking, Image Dehazing	Code Available
VRKitchen2.0-IndoorKit: A Tutorial for Augmented Indoor Scene Building in Omniverse	Jun 23, 2022	Benchmarking, Indoor Scene Synthesis	Code Available
One Law, Many Languages: Benchmarking Multilingual Legal Reasoning for Judicial Support	Jun 15, 2023	Benchmarking, Information Retrieval	Code Available
Forecasting Across Time Series Databases using Recurrent Neural Networks on Groups of Similar Series: A Clustering Approach	Oct 9, 2017	Benchmarking, Clustering	Code Available
fMRI-S4: learning short- and long-range dynamic fMRI dependencies using 1D Convolutions and State Space Models	Aug 8, 2022	Benchmarking, State Space Models	Code Available
Scaling and Benchmarking Self-Supervised Visual Representation Learning	May 3, 2019	Benchmarking, object-detection	Code Available
Scaling Compute Is Not All You Need for Adversarial Robustness	Dec 20, 2023	Adversarial Robustness, All	Code Available
Scaling Up Resonate-and-Fire Networks for Fast Deep Learning	Apr 1, 2025	Benchmarking, Deep Learning	Code Available
Universal Music Representations? Evaluating Foundation Models on World Music Corpora	Jun 20, 2025	Benchmarking, Few-Shot Learning	Code Available
MM-Soc: Benchmarking Multimodal Large Language Models in Social Media Platforms	Feb 21, 2024	Benchmarking, Hate Speech Detection	Code Available
Fluorescence Reference Target Quantitative Analysis Library	Apr 22, 2025	Benchmarking	Code Available
FLsim: A Modular and Library-Agnostic Simulation Framework for Federated Learning	Jul 15, 2025	Benchmarking, Federated Learning	Code Available
FlowCyt: A Comparative Study of Deep Learning Approaches for Multi-Class Classification in Flow Cytometry Benchmarking	Feb 28, 2024	Benchmarking, Inductive Learning	Code Available
SCAM: A Real-World Typographic Robustness Evaluation for Multimodal Foundation Models	Apr 7, 2025	Benchmarking	Code Available
Benchmarking Sequential Visual Input Reasoning and Prediction in Multimodal Large Language Models	Oct 20, 2023	Activity Prediction, Benchmarking	Code Available
FlexMol: A Flexible Toolkit for Benchmarking Molecular Relational Learning	Oct 19, 2024	Benchmarking, Drug Discovery	Code Available
ZNN - A Fast and Scalable Algorithm for Training 3D Convolutional Networks on Multi-Core and Many-Core Shared Memory Machines	Oct 22, 2015	Benchmarking, CPU	Code Available
Wildfire spread forecasting with Deep Learning	May 23, 2025	Benchmarking, Deep Learning	Code Available
Benchmarking sentiment analysis methods for large-scale texts: A case for using continuum-scored words and word shift graphs	Dec 2, 2015	Benchmarking, Sentiment Analysis	Code Available
FIVR: Fine-grained Incident Video Retrieval	Sep 11, 2018	Benchmarking, Retrieval	Code Available
SCEHR: Supervised Contrastive Learning for Clinical Risk Prediction using Electronic Health Records	Oct 11, 2021	Benchmarking, Binary Classification	Code Available
Automated Detection of Label Errors in Semantic Segmentation Datasets via Deep Learning and Uncertainty Quantification	Jul 13, 2022	Benchmarking, Label Error Detection	Code Available
Benchmarking Self-Supervised Learning Methods for Accelerated MRI Reconstruction	Feb 19, 2025	Benchmarking, MRI Reconstruction	Code Available
Benchmarking Self-Supervised Contrastive Learning Methods for Image-Based Plant Phenotyping	Mar 1, 2023	Benchmarking, Contrastive Learning	Code Available
A Manually Annotated Image-Caption Dataset for Detecting Children in the Wild	Jun 11, 2025	Age Estimation, Benchmarking	Code Available
Schroedinger's Threshold: When the AUC doesn't predict Accuracy	Apr 4, 2024	Benchmarking	Code Available
Benchmarking Scalable Methods for Streaming Cross Document Entity Coreference	Aug 1, 2021	Benchmarking, Clustering	Code Available
Benchmarking Scalable Epistemic Uncertainty Quantification in Organ Segmentation	Aug 15, 2023	Benchmarking, Medical Image Analysis	Code Available
Automated deep learning segmentation of high-resolution 7 T postmortem MRI for quantitative analysis of structure-pathology correlations in neurodegenerative diseases	Mar 21, 2023	Anatomy, Benchmarking	Code Available
Unmasking Societal Biases in Respiratory Support for ICU Patients through Social Determinants of Health	Feb 23, 2025	Benchmarking, Fairness	Code Available
There's No Comparison: Reference-less Evaluation Metrics in Grammatical Error Correction	Oct 7, 2016	Benchmarking, Grammatical Error Correction	Code Available
SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic Grading	Jun 14, 2024	Benchmarking, Mathematical Proofs	Code Available
SciFaultyQA: Benchmarking LLMs on Faulty Science Question Detection with a GAN-Inspired Approach to Synthetic Dataset Generation	Dec 16, 2024	Benchmarking, Dataset Generation	Code Available
Benchmarking Safety Monitors for Image Classifiers with Machine Learning	Oct 4, 2021	Autonomous Vehicles, Benchmarking	Code Available
First-frame Supervised Video Polyp Segmentation via Propagative and Semantic Dual-teacher Network	Dec 21, 2024	Benchmarking, Transfer Learning	Code Available
Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models	Nov 29, 2018	Benchmarking, Diversity	Code Available
MOLE: Digging Tunnels Through Multimodal Multi-Objective Landscapes	Apr 22, 2022	Benchmarking	Code Available
A Linear Constrained Optimization Benchmark For Probabilistic Search Algorithms: The Rotated Klee-Minty Problem	Jul 26, 2018	Benchmarking, Evolutionary Algorithms	Code Available

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified