Benchmarking

Papers


Showing 4826–4850 of 5548 papers

Title	Date	Tasks	Status
SCAM: A Real-World Typographic Robustness Evaluation for Multimodal Foundation Models	Apr 7, 2025	Benchmarking	Code Available
Benchmarking Sequential Visual Input Reasoning and Prediction in Multimodal Large Language Models	Oct 20, 2023	Activity Prediction, Benchmarking	Code Available
FlexMol: A Flexible Toolkit for Benchmarking Molecular Relational Learning	Oct 19, 2024	Benchmarking, Drug Discovery	Code Available
ZNN - A Fast and Scalable Algorithm for Training 3D Convolutional Networks on Multi-Core and Many-Core Shared Memory Machines	Oct 22, 2015	Benchmarking, CPU	Code Available
Wildfire spread forecasting with Deep Learning	May 23, 2025	Benchmarking, Deep Learning	Code Available
Benchmarking sentiment analysis methods for large-scale texts: A case for using continuum-scored words and word shift graphs	Dec 2, 2015	Benchmarking, Sentiment Analysis	Code Available
FIVR: Fine-grained Incident Video Retrieval	Sep 11, 2018	Benchmarking, Retrieval	Code Available
SCEHR: Supervised Contrastive Learning for Clinical Risk Prediction using Electronic Health Records	Oct 11, 2021	Benchmarking, Binary Classification	Code Available
Automated Detection of Label Errors in Semantic Segmentation Datasets via Deep Learning and Uncertainty Quantification	Jul 13, 2022	Benchmarking, Label Error Detection	Code Available
Benchmarking Self-Supervised Learning Methods for Accelerated MRI Reconstruction	Feb 19, 2025	Benchmarking, MRI Reconstruction	Code Available
Benchmarking Self-Supervised Contrastive Learning Methods for Image-Based Plant Phenotyping	Mar 1, 2023	Benchmarking, Contrastive Learning	Code Available
A Manually Annotated Image-Caption Dataset for Detecting Children in the Wild	Jun 11, 2025	Age Estimation, Benchmarking	Code Available
Schroedinger's Threshold: When the AUC doesn't predict Accuracy	Apr 4, 2024	Benchmarking	Code Available
Benchmarking Scalable Methods for Streaming Cross Document Entity Coreference	Aug 1, 2021	Benchmarking, Clustering	Code Available
Benchmarking Scalable Epistemic Uncertainty Quantification in Organ Segmentation	Aug 15, 2023	Benchmarking, Medical Image Analysis	Code Available
Automated deep learning segmentation of high-resolution 7 T postmortem MRI for quantitative analysis of structure-pathology correlations in neurodegenerative diseases	Mar 21, 2023	Anatomy, Benchmarking	Code Available
Unmasking Societal Biases in Respiratory Support for ICU Patients through Social Determinants of Health	Feb 23, 2025	Benchmarking, Fairness	Code Available
There's No Comparison: Reference-less Evaluation Metrics in Grammatical Error Correction	Oct 7, 2016	Benchmarking, Grammatical Error Correction	Code Available
SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic Grading	Jun 14, 2024	Benchmarking, Mathematical Proofs	Code Available
SciFaultyQA: Benchmarking LLMs on Faulty Science Question Detection with a GAN-Inspired Approach to Synthetic Dataset Generation	Dec 16, 2024	Benchmarking, Dataset Generation	Code Available
Benchmarking Safety Monitors for Image Classifiers with Machine Learning	Oct 4, 2021	Autonomous Vehicles, Benchmarking	Code Available
First-frame Supervised Video Polyp Segmentation via Propagative and Semantic Dual-teacher Network	Dec 21, 2024	Benchmarking, Transfer Learning	Code Available
Molecular Sets (MOSES): A Benchmarking Platform for Molecular Generation Models	Nov 29, 2018	Benchmarking, Diversity	Code Available
MOLE: Digging Tunnels Through Multimodal Multi-Objective Landscapes	Apr 22, 2022	Benchmarking	Code Available
A Linear Constrained Optimization Benchmark For Probabilistic Search Algorithms: The Rotated Klee-Minty Problem	Jul 26, 2018	Benchmarking, Evolutionary Algorithms	Code Available


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified