Benchmarking

Papers

Title	Date	Tasks	Status
2017 Robotic Instrument Segmentation Challenge	Feb 18, 2019	Benchmarking, Person Re-Identification	Code Available
AI Fairness 360: An Extensible Toolkit for Detecting, Understanding, and Mitigating Unwanted Algorithmic Bias	Oct 3, 2018	Benchmarking, Decision Making	Code Available
Benchmarking Intersectional Biases in NLP	Jul 1, 2022	Benchmarking, BIG-bench Machine Learning	Code Available
Benchmarking Commercial Intent Detection Services with Practice-Driven Evaluations	Dec 7, 2020	Benchmarking, Goal-Oriented Dialog	Code Available
Towards Fair and Privacy-Preserving Federated Deep Models	Jun 4, 2019	Benchmarking, Deep Learning	Code Available
SPDEBench: An Extensive Benchmark for Learning Regular and Singular Stochastic PDEs	May 24, 2025	Benchmarking	Code Available
Deep Neural Network Benchmarks for Selective Classification	Jan 23, 2024	Benchmarking, Classification	Code Available
Abstraction Alignment: Comparing Model-Learned and Human-Encoded Conceptual Relationships	Jul 17, 2024	Benchmarking	Code Available
Arabic Speech Recognition by End-to-End, Modular Systems and Human	Jan 21, 2021	Arabic Speech Recognition, Automatic Speech Recognition	Code Available
Benchmarking Image Perturbations for Testing Automated Driving Assistance Systems	Jan 21, 2025	Autonomous Vehicles, Benchmarking	Code Available
Deep Metric Learning Meets Deep Clustering: An Novel Unsupervised Approach for Feature Embedding	Sep 9, 2020	Benchmarking, Clustering	Code Available
Deepened Graph Auto-Encoders Help Stabilize and Enhance Link Prediction	Mar 21, 2021	Benchmarking, Clustering	Code Available
Oral Imaging for Malocclusion Issues Assessments: OMNI Dataset, Deep Learning Baselines and Benchmarking	May 21, 2025	Benchmarking, Diagnostic	Code Available
Orchestrator-Agent Trust: A Modular Agentic AI Visual Classification System with Trust-Aware Orchestration and RAG-Based Reasoning	Jul 9, 2025	Benchmarking, Image Retrieval	Code Available
ORCHID: A Chinese Debate Corpus for Target-Independent Stance Detection and Argumentative Dialogue Summarization	Oct 17, 2024	Benchmarking, Stance Detection	Code Available
Benchmarking Human and Automated Prompting in the Segment Anything Model	Oct 29, 2024	Benchmarking, Image Segmentation	Code Available
Speech Self-Supervised Representation Benchmarking: Are We Doing it Right?	Jun 1, 2023	Benchmarking, Decoder	Code Available
Deep Emotion Recognition in Textual Conversations: A Survey	Nov 16, 2022	Benchmarking, Emotion Recognition	Code Available
Neural Style Transfer Improves 3D Cardiovascular MR Image Segmentation on Inconsistent Data	Sep 20, 2019	Benchmarking, Ensemble Learning	Code Available
OSS-Bench: Benchmark Generator for Coding LLMs	May 18, 2025	Benchmarking	Code Available
DeepDrug3D: Classification of ligand-binding pockets in proteins with a convolutional neural network	Feb 4, 2019	Benchmarking, Specificity	Code Available
deepCR: Cosmic Ray Rejection with Deep Learning	Jul 22, 2019	Benchmarking, CPU	Code Available
A quantum-classical reinforcement learning model to play Atari games	Dec 11, 2024	Atari Games, Benchmarking	Code Available
Towards Ground-truth-free Evaluation of Any Segmentation in Medical Images	Sep 23, 2024	Benchmarking, Segmentation	Code Available
Deep Attention Driven Reinforcement Learning (DAD-RL) for Autonomous Decision-Making in Dynamic Environment	Jul 12, 2024	Benchmarking, Decision Making	Code Available
Out of Distribution Detection on ImageNet-O	Jan 23, 2022	Benchmarking, Out-of-Distribution Detection	Code Available
Benchmarking histopathology foundation models in a multi-center dataset for skin cancer subtyping	Jun 23, 2025	Benchmarking, Diversity	Code Available
Deep Affinity Network for Multiple Object Tracking	Oct 28, 2018	Benchmarking, Multiple Object Tracking	Code Available
Benchmarking HillVallEA for the GECCO 2019 Competition on Multimodal Optimization	Jul 25, 2019	Benchmarking	Code Available
Benchmarking Hierarchical Script Knowledge	Jun 1, 2019	Benchmarking	Code Available
Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark	Feb 14, 2022	Benchmarking, Contrastive Learning	Code Available
Deciphering the Underserved: Benchmarking LLM OCR for Low-Resource Scripts	Dec 20, 2024	Benchmarking, Optical Character Recognition	Code Available
Towards IID representation learning and its application on biomedical data	Mar 1, 2022	Benchmarking, Representation Learning	Code Available
A projected nonlinear state-space model for forecasting time series signals	Nov 22, 2023	Benchmarking, Computational Efficiency	Code Available
Debatable Intelligence: Benchmarking LLM Judges via Debate Speech Evaluation	Jun 5, 2025	Benchmarking	Code Available
Benchmarking Hallucination in Large Language Models based on Unanswerable Math Word Problem	Mar 6, 2024	Benchmarking, Hallucination	Code Available
Dealing with missing data using attention and latent space regularization	Nov 14, 2022	Benchmarking, Imputation	Code Available
DCR: Quantifying Data Contamination in LLMs Evaluation	Jul 15, 2025	Arithmetic Reasoning, Benchmarking	Code Available
DateLogicQA: Benchmarking Temporal Biases in Large Language Models	Dec 17, 2024	Benchmarking	Code Available
Towards Intersectionality in Machine Learning: Including More Identities, Handling Underrepresentation, and Performing Evaluation	May 10, 2022	Attribute, Benchmarking	Code Available
A Biologically Plausible Benchmark for Contextual Bandit Algorithms in Precision Oncology Using in vitro Data	Nov 11, 2019	Benchmarking, Decision Making	Code Available
Data-Efficient Training of CNNs and Transformers with Coresets: A Stability Perspective	Mar 3, 2023	Benchmarking, Image Classification	Code Available
Parameterized Argumentation-based Reasoning Tasks for Benchmarking Generative Language Models	May 2, 2025	Benchmarking	Code Available
PARAPHRASUS : A Comprehensive Benchmark for Evaluating Paraphrase Detection Models	Sep 18, 2024	Benchmarking, Model Selection	Code Available
CVPR 2020 Continual Learning in Computer Vision Competition: Approaches, Results, Current Challenges and Future Directions	Sep 14, 2020	Benchmarking, Continual Learning	Code Available
CVM-Net: Cross-View Matching Network for Image-Based Ground-to-Aerial Geo-Localization	Jun 1, 2018	Benchmarking, geo-localization	Code Available
SpokeN-100: A Cross-Lingual Benchmarking Dataset for The Classification of Spoken Numbers in Different Languages	Mar 14, 2024	Benchmarking, Dimensionality Reduction	Code Available
Partial Rankings of Optimizers	Feb 26, 2024	Benchmarking	Code Available
A predictive analytics approach for stroke prediction using machine learning and neural networks	Mar 1, 2022	Benchmarking, BIG-bench Machine Learning	Code Available
Ab Initio Nonparametric Variable Selection for Scalable Symbolic Regression with Large p	Oct 17, 2024	Benchmarking, regression	Code Available

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified