Benchmarking

Papers

Showing 5226–5250 of 5548 papers

Title	Date	Tasks	Status
Out of Distribution Detection on ImageNet-O	Jan 23, 2022	Benchmarking, Out-of-Distribution Detection	Code Available
Benchmarking histopathology foundation models in a multi-center dataset for skin cancer subtyping	Jun 23, 2025	Benchmarking, Diversity	Code Available
Deep Affinity Network for Multiple Object Tracking	Oct 28, 2018	Benchmarking, Multiple Object Tracking	Code Available
Benchmarking HillVallEA for the GECCO 2019 Competition on Multimodal Optimization	Jul 25, 2019	Benchmarking	Code Available
Benchmarking Hierarchical Script Knowledge	Jun 1, 2019	Benchmarking	Code Available
Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark	Feb 14, 2022	Benchmarking, Contrastive Learning	Code Available
Deciphering the Underserved: Benchmarking LLM OCR for Low-Resource Scripts	Dec 20, 2024	Benchmarking, Optical Character Recognition	Code Available
Towards IID representation learning and its application on biomedical data	Mar 1, 2022	Benchmarking, Representation Learning	Code Available
A projected nonlinear state-space model for forecasting time series signals	Nov 22, 2023	Benchmarking, Computational Efficiency	Code Available
Debatable Intelligence: Benchmarking LLM Judges via Debate Speech Evaluation	Jun 5, 2025	Benchmarking	Code Available
Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem	Mar 6, 2024	Benchmarking, Hallucination	Code Available
Dealing with missing data using attention and latent space regularization	Nov 14, 2022	Benchmarking, Imputation	Code Available
DCR: Quantifying Data Contamination in LLMs Evaluation	Jul 15, 2025	Arithmetic Reasoning, Benchmarking	Code Available
DateLogicQA: Benchmarking Temporal Biases in Large Language Models	Dec 17, 2024	Benchmarking	Code Available
Towards Intersectionality in Machine Learning: Including More Identities, Handling Underrepresentation, and Performing Evaluation	May 10, 2022	Attribute, Benchmarking	Code Available
A Biologically Plausible Benchmark for Contextual Bandit Algorithms in Precision Oncology Using in vitro Data	Nov 11, 2019	Benchmarking, Decision Making	Code Available
Data-Efficient Training of CNNs and Transformers with Coresets: A Stability Perspective	Mar 3, 2023	Benchmarking, Image Classification	Code Available
Parameterized Argumentation-based Reasoning Tasks for Benchmarking Generative Language Models	May 2, 2025	Benchmarking	Code Available
PARAPHRASUS : A Comprehensive Benchmark for Evaluating Paraphrase Detection Models	Sep 18, 2024	Benchmarking, Model Selection	Code Available
CVPR 2020 Continual Learning in Computer Vision Competition: Approaches, Results, Current Challenges and Future Directions	Sep 14, 2020	Benchmarking, Continual Learning	Code Available
CVM-Net: Cross-View Matching Network for Image-Based Ground-to-Aerial Geo-Localization	Jun 1, 2018	Benchmarking, geo-localization	Code Available
SpokeN-100: A Cross-Lingual Benchmarking Dataset for The Classification of Spoken Numbers in Different Languages	Mar 14, 2024	Benchmarking, Dimensionality Reduction	Code Available
Partial Rankings of Optimizers	Feb 26, 2024	Benchmarking	Code Available
A predictive analytics approach for stroke prediction using machine learning and neural networks	Mar 1, 2022	Benchmarking, BIG-bench Machine Learning	Code Available
Ab Initio Nonparametric Variable Selection for Scalable Symbolic Regression with Large p	Oct 17, 2024	Benchmarking, regression	Code Available

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified