Benchmarking

Papers

Showing 5451–5500 of 5548 papers

Title	Date	Tasks	Status
A Baseline Statistical Method For Robust User-Assisted Multiple Segmentation	Jan 8, 2022	Benchmarking, Image Segmentation	Code Available
COCO: A Platform for Comparing Continuous Optimizers in a Black-Box Setting	Mar 29, 2016	Benchmarking, Multiobjective Optimization	Code Available
VisionAD, a software package of performant anomaly detection algorithms, and Proportion Localised, an interpretable metric	Jun 7, 2024	Anomaly Detection, Benchmarking	Code Available
CNM: An Interpretable Complex-valued Network for Matching	Apr 10, 2019	Benchmarking, Question Answering	Code Available
Clubmark: a Parallel Isolation Framework for Benchmarking and Profiling Clustering Algorithms on NUMA Architectures	Nov 17, 2018	Benchmarking, Clustering	Code Available
QGym: Scalable Simulation and Benchmarking of Queuing Network Controllers	Oct 8, 2024	Benchmarking	Code Available
TRIAGE: Ethical Benchmarking of AI Models Through Mass Casualty Simulations	Oct 10, 2024	Benchmarking, Decision Making	Code Available
QLBS: Q-Learner in the Black-Scholes(-Merton) Worlds	Dec 13, 2017	Benchmarking, Model-based Reinforcement Learning	Code Available
Benchmarking AutoML algorithms on a collection of synthetic classification problems	Dec 6, 2022	AutoML, Benchmarking	Code Available
Benchmarking a transformer-FREE model for ad-hoc retrieval	Apr 1, 2021	Benchmarking, CPU	Code Available
Benchmarking Approximate Inference Methods for Neural Structured Prediction	Apr 1, 2019	Benchmarking, Prediction	Code Available
LMEMs for post-hoc analysis of HPO Benchmarking	Aug 5, 2024	Benchmarking, Hyperparameter Optimization	Code Available
Benchmarking Contemporary Deep Learning Hardware and Frameworks: A Survey of Qualitative Metrics	Jul 5, 2019	Benchmarking, BIG-bench Machine Learning	Code Available
TAP-DLND 1.0 : A Corpus for Document Level Novelty Detection	Feb 20, 2018	Articles, Benchmarking	Code Available
Benchmarking Apache Spark and Hadoop MapReduce on Big Data Classification	Sep 21, 2022	Benchmarking, Management	Code Available
Who’s on First?: Probing the Learning and Representation Capabilities of Language Models on Deterministic Closed Domains	Nov 1, 2021	Benchmarking, Language Modeling	Code Available
TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models	Oct 16, 2023	Automated Theorem Proving, Benchmarking	Code Available
Quality Indicators for Preference-based Evolutionary Multi-objective Optimization Using a Reference Point: A Review and Analysis	Jan 28, 2023	Benchmarking, Decision Making	Code Available
CLMB: deep contrastive learning for robust metagenomic binning	Nov 18, 2021	Benchmarking, Contrastive Learning	Code Available
Investigation of UAV Detection in Images with Complex Backgrounds and Rainy Artifacts	May 25, 2023	Benchmarking, object-detection	Code Available
Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts	Nov 18, 2024	Benchmarking, Multimodal Large Language Model	Code Available
Quantifying Robustness: A Benchmarking Framework for Deep Learning Forecasting in Cyber-Physical Systems	Apr 4, 2025	Benchmarking, Model Selection	Code Available
Task-Agnostic Graph Neural Network Evaluation via Adversarial Collaboration	Jan 27, 2023	Benchmarking, Graph Classification	Code Available
Benchmarking Jetson Edge Devices with an End-to-end Video-based Anomaly Detection System	Jul 28, 2023	Anomaly Detection, Autonomous Driving	Code Available
Benchmarking and Understanding Compositional Relational Reasoning of LLMs	Dec 17, 2024	Benchmarking, Relational Reasoning	Code Available
Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases	Mar 6, 2025	Benchmarking, Diagnostic	Code Available
A New Cervical Cytology Dataset for Nucleus Detection and Image Classification (Cervix93) and Methods for Cervical Nucleus Detection	Nov 23, 2018	Benchmarking, Cervical Nucleus Detection	Code Available
ClimRetrieve: A Benchmarking Dataset for Information Retrieval from Corporate Climate Disclosures	Jun 14, 2024	Answer Generation, Benchmarking	Code Available
Benchmarking and Rethinking Knowledge Editing for Large Language Models	May 24, 2025	Benchmarking, knowledge editing	Code Available
CLEAVE: Scalable and Edge-native Benchmarking of Networked Control Systems	Apr 5, 2022	Benchmarking, Edge-computing	Code Available
Quantitative Metrics for Benchmarking Human-Aware Robot Navigation	Jul 26, 2023	Benchmarking, Robot Navigation	Code Available
Benchmarking and optimizing organism wide single-cell RNA alignment methods	Mar 26, 2025	Benchmarking, Decoder	Code Available
XTSC-Bench: Quantitative Benchmarking for Explainers on Time Series Classification	Oct 23, 2023	Benchmarking, Time Series	Code Available
CLDyB: Towards Dynamic Benchmarking for Continual Learning with Pre-trained Models	Mar 6, 2025	Benchmarking, Continual Learning	Code Available
Benchmarking and Improving Text-to-SQL Generation under Ambiguity	Oct 20, 2023	Benchmarking, Diversity	Code Available
Quantum Boosting using Domain-Partitioning Hypotheses	Oct 25, 2021	Benchmarking, Ensemble Learning	Code Available
TCC-Bench: Benchmarking the Traditional Chinese Culture Understanding Capabilities of MLLMs	May 16, 2025	Benchmarking, Question Answering	Code Available
Benchmarking and Improving Compositional Generalization of Multi-aspect Controllable Text Generation	Apr 5, 2024	Attribute, Benchmarking	Code Available
Multi-GPU-Enabled Hybrid Quantum-Classical Workflow in Quantum-HPC Middleware: Applications in Quantum Simulations	Mar 9, 2024	Benchmarking, CPU	Code Available
TDBench: Benchmarking Vision-Language Models in Understanding Top-Down Images	Apr 1, 2025	Autonomous Navigation, Benchmarking	Code Available
A new baseline for retinal vessel segmentation: Numerical identification and correction of methodological inconsistencies affecting 100+ papers	Nov 6, 2021	Benchmarking, Retinal Vessel Segmentation	Code Available
Adversarial Environment Generation for Learning to Navigate the Web	Mar 2, 2021	Benchmarking, Decision Making	Code Available
A*3D Dataset: Towards Autonomous Driving in Challenging Environments	Sep 17, 2019	3D Object Detection, Autonomous Driving	Code Available
TrustSQL: Benchmarking Text-to-SQL Reliability with Penalty-Based Scoring	Mar 23, 2024	Benchmarking, Text to SQL	Code Available
Class Imbalance in Object Detection: An Experimental Diagnosis and Study of Mitigation Strategies	Mar 11, 2024	Benchmarking, Data Augmentation	Code Available
Quasi-Newton Methods for Machine Learning: Forget the Past, Just Sample	Jan 28, 2019	Benchmarking, BIG-bench Machine Learning	Code Available
Quaternion Capsule Networks	Jul 8, 2020	Benchmarking, Object Recognition	Code Available
QU-BraTS: MICCAI BraTS 2020 Challenge on Quantifying Uncertainty in Brain Tumor Segmentation - Analysis of Ranking Scores and Benchmarking Results	Dec 19, 2021	Benchmarking, Brain Tumor Segmentation	Code Available
QUENCH: Measuring the gap between Indic and Non-Indic Contextual General Reasoning in LLMs	Dec 16, 2024	Benchmarking, Common Sense Reasoning	Code Available
Question-Answering Dense Video Events	Sep 6, 2024	Benchmarking, Question Answering	Code Available

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified