Benchmarking

Papers

Showing 126–150 of 5548 papers

Title	Date	Tasks	Status	Hype
Matbench Discovery -- A framework to evaluate machine learning crystal stability predictions	Aug 28, 2023	Benchmarking, Formation Energy	Code Available	3
LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning	Jun 5, 2023	Benchmarking	Code Available	3
TorchBench: Benchmarking PyTorch with High API Surface Coverage	Apr 27, 2023	Benchmarking, GPU	Code Available	3
Highly Accurate Quantum Chemical Property Prediction with Uni-Mol+	Mar 16, 2023	Benchmarking, Graph Regression	Code Available	3
Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning	Jan 26, 2023	Benchmarking, Deep Reinforcement Learning	Code Available	3
AER: Auto-Encoder with Regression for Time Series Anomaly Detection	Dec 27, 2022	Anomaly Detection, Benchmarking	Code Available	3
CORL: Research-oriented Deep Offline Reinforcement Learning Library	Oct 13, 2022	Benchmarking, D4RL	Code Available	3
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks	Apr 16, 2022	Benchmarking, Instruction Following	Code Available	3
A Unified Framework for Rank-based Evaluation Metrics for Link Prediction in Knowledge Graphs	Mar 14, 2022	Benchmarking, Graph Embedding	Code Available	3
CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms	Nov 16, 2021	Benchmarking, Deep Reinforcement Learning	Code Available	3
Personalized Benchmarking with the Ludwig Benchmarking Toolkit	Nov 8, 2021	Benchmarking, Hyperparameter Optimization	Code Available	3
Benchmarking Multimodal AutoML for Tabular Data with Text Fields	Nov 4, 2021	AutoML, Benchmarking	Code Available	3
A Survey on Performance Metrics for Object-Detection Algorithms	Jul 21, 2020	Benchmarking, Object	Code Available	3
Benchmarking Automatic Machine Learning Frameworks	Aug 17, 2018	Automated Feature Engineering, AutoML	Code Available	3
mlpack 3: a fast, flexible machine learning library	Jun 18, 2018	Benchmarking, BIG-bench Machine Learning	Code Available	3
DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering	Jul 15, 2025	Benchmarking, Instruction Following	Code Available	2
GDGB: A Benchmark for Generative Dynamic Text-Attributed Graph Learning	Jul 4, 2025	Benchmarking, Graph Generation	Code Available	2
PocketVina Enables Scalable and Highly Accurate Physically Valid Docking through Multi-Pocket Conditioning	Jun 24, 2025	Benchmarking, Drug Discovery	Code Available	2
TAB: Unified Benchmarking of Time Series Anomaly Detection Methods	Jun 22, 2025	Anomaly Detection, Benchmarking	Code Available	2
BMFM-RNA: An Open Framework for Building and Evaluating Transcriptomic Foundation Models	Jun 17, 2025	Benchmarking, Language Modeling	Code Available	2
SEC-bench: Automated Benchmarking of LLM Agents on Real-World Software Security Tasks	Jun 13, 2025	Benchmarking, Large Language Model	Code Available	2
SDialog: A Python Toolkit for Synthetic Dialogue Generation and Analysis	Jun 12, 2025	Benchmarking, Dialogue Generation	Code Available	2
IntPhys 2: Benchmarking Intuitive Physics Understanding In Complex Synthetic Environments	Jun 11, 2025	Benchmarking	Code Available	2
MegaHan97K: A Large-Scale Dataset for Mega-Category Chinese Character Recognition with over 97K Categories	Jun 5, 2025	Benchmarking, Optical Character Recognition	Code Available	2
GSCodec Studio: A Modular Framework for Gaussian Splat Compression	Jun 2, 2025	Benchmarking	Code Available	2

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified