Benchmarking

Papers

Title	Date	Tasks	Status
PixelHop: A Successive Subspace Learning (SSL) Method for Object Classification	Sep 17, 2019	Benchmarking, Decision Making	Code Available
pke: an open source python-based keyphrase extraction toolkit	Dec 1, 2016	Benchmarking, Keyphrase Extraction	Code Available
Benchmarking Educational Program Repair	May 8, 2024	Benchmarking, Program Repair	Code Available
A Benchmarking Study of Vision-based Robotic Grasping Algorithms	Mar 14, 2025	Benchmarking, Robotic Grasping	Code Available
CrisisLTLSum: A Benchmark for Local Crisis Event Timeline Extraction and Summarization	Oct 25, 2022	Abstractive Text Summarization, Benchmarking	Code Available
CREPO: An Open Repository to Benchmark Credal Network Algorithms	May 10, 2021	Benchmarking	Code Available
A Framework for Evaluating PM2.5 Forecasts from the Perspective of Individual Decision Making	Sep 9, 2024	Benchmarking, Decision Making	Code Available
Creating and Leveraging a Synthetic Dataset of Cloud Optical Thickness Measures for Cloud Detection in MSI	Nov 23, 2023	Benchmarking, Cloud Detection	Code Available
CoSpace: Benchmarking Continuous Space Perception Ability for Vision-Language Models	Mar 18, 2025	Benchmarking, Spatial Reasoning	Code Available
ConvGeN: Convex space learning improves deep-generative oversampling for tabular imbalanced classification on smaller datasets	Jun 20, 2022	Benchmarking, Fraud Detection	Code Available
PMLB: A Large Benchmark Suite for Machine Learning Evaluation and Comparison	Mar 1, 2017	Benchmarking, BIG-bench Machine Learning	Code Available
Controlling Risk of Retrieval-augmented Generation: A Counterfactual Prompting Framework	Sep 24, 2024	Benchmarking, counterfactual	Code Available
pmuBAGE: The Benchmarking Assortment of Generated PMU Data for Power System Events -- Part I: Overview and Results	Apr 3, 2022	Benchmarking	Code Available
pmuBAGE: The Benchmarking Assortment of Generated PMU Data for Power System Events	Oct 25, 2022	Benchmarking	Code Available
Continuous Optimization Benchmarks by Simulation	Aug 14, 2020	Benchmarking, Gaussian Processes	Code Available
Continual Learning Strategies for 3D Engineering Regression Problems: A Benchmarking Study	Apr 16, 2025	Benchmarking, Continual Learning	Code Available
Benchmarking Dynamic SLO Compliance in Distributed Computing Continuum Systems	Mar 5, 2025	Benchmarking, CPU	Code Available
Structured Prediction Problem Archive	Feb 4, 2022	Benchmarking, Prediction	Code Available
Style Outweighs Substance: Failure Modes of LLM Judges in Alignment Benchmarking	Sep 23, 2024	Benchmarking, Diversity	Code Available
Benchmarking down-scaled (not so large) pre-trained language models	Sep 1, 2021	Benchmarking	Code Available
PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics	Apr 6, 2024	Benchmarking, Hallucination	Code Available
ContextGNN goes to Elliot: Towards Benchmarking Relational Deep Learning for Static Link Prediction (aka Personalized Item Recommendation)	Mar 20, 2025	Benchmarking, Link Prediction	Code Available
Selected Languages are All You Need for Cross-lingual Truthfulness Transfer	Jun 20, 2024	All, Benchmarking	Code Available
Content-Aware Differential Privacy with Conditional Invertible Neural Networks	Jul 29, 2022	Benchmarking	Code Available
Population-wise Labeling of Sulcal Graphs using Multi-graph Matching	Jan 31, 2023	Benchmarking, Graph Matching	Code Available


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified