Benchmarking

Papers

Showing 5451–5475 of 5548 papers

Title	Date	Tasks	Status
A Baseline Statistical Method For Robust User-Assisted Multiple Segmentation	Jan 8, 2022	Benchmarking, Image Segmentation	Code Available
COCO: A Platform for Comparing Continuous Optimizers in a Black-Box Setting	Mar 29, 2016	Benchmarking, Multiobjective Optimization	Code Available
VisionAD, a software package of performant anomaly detection algorithms, and Proportion Localised, an interpretable metric	Jun 7, 2024	Anomaly Detection, Benchmarking	Code Available
CNM: An Interpretable Complex-valued Network for Matching	Apr 10, 2019	Benchmarking, Question Answering	Code Available
Clubmark: a Parallel Isolation Framework for Benchmarking and Profiling Clustering Algorithms on NUMA Architectures	Nov 17, 2018	Benchmarking, Clustering	Code Available
QGym: Scalable Simulation and Benchmarking of Queuing Network Controllers	Oct 8, 2024	Benchmarking	Code Available
TRIAGE: Ethical Benchmarking of AI Models Through Mass Casualty Simulations	Oct 10, 2024	Benchmarking, Decision Making	Code Available
QLBS: Q-Learner in the Black-Scholes(-Merton) Worlds	Dec 13, 2017	Benchmarking, Model-based Reinforcement Learning	Code Available
Benchmarking AutoML algorithms on a collection of synthetic classification problems	Dec 6, 2022	AutoML, Benchmarking	Code Available
Benchmarking a transformer-FREE model for ad-hoc retrieval	Apr 1, 2021	Benchmarking, CPU	Code Available
Benchmarking Approximate Inference Methods for Neural Structured Prediction	Apr 1, 2019	Benchmarking, Prediction	Code Available
LMEMs for post-hoc analysis of HPO Benchmarking	Aug 5, 2024	Benchmarking, Hyperparameter Optimization	Code Available
Benchmarking Contemporary Deep Learning Hardware and Frameworks: A Survey of Qualitative Metrics	Jul 5, 2019	Benchmarking, BIG-bench Machine Learning	Code Available
TAP-DLND 1.0 : A Corpus for Document Level Novelty Detection	Feb 20, 2018	Articles, Benchmarking	Code Available
Benchmarking Apache Spark and Hadoop MapReduce on Big Data Classification	Sep 21, 2022	Benchmarking, Management	Code Available
Who’s on First?: Probing the Learning and Representation Capabilities of Language Models on Deterministic Closed Domains	Nov 1, 2021	Benchmarking, Language Modeling	Code Available
TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models	Oct 16, 2023	Automated Theorem Proving, Benchmarking	Code Available
Quality Indicators for Preference-based Evolutionary Multi-objective Optimization Using a Reference Point: A Review and Analysis	Jan 28, 2023	Benchmarking, Decision Making	Code Available
CLMB: deep contrastive learning for robust metagenomic binning	Nov 18, 2021	Benchmarking, Contrastive Learning	Code Available
Investigation of UAV Detection in Images with Complex Backgrounds and Rainy Artifacts	May 25, 2023	Benchmarking, object-detection	Code Available
Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts	Nov 18, 2024	Benchmarking, Multimodal Large Language Model	Code Available
Quantifying Robustness: A Benchmarking Framework for Deep Learning Forecasting in Cyber-Physical Systems	Apr 4, 2025	Benchmarking, Model Selection	Code Available
Task-Agnostic Graph Neural Network Evaluation via Adversarial Collaboration	Jan 27, 2023	Benchmarking, Graph Classification	Code Available
Benchmarking Jetson Edge Devices with an End-to-end Video-based Anomaly Detection System	Jul 28, 2023	Anomaly Detection, Autonomous Driving	Code Available
Benchmarking and Understanding Compositional Relational Reasoning of LLMs	Dec 17, 2024	Benchmarking, Relational Reasoning	Code Available

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified