Benchmarking

Papers

Showing 1476–1500 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Fast hyperboloid decision tree algorithms	Oct 20, 2023	Benchmarking, Riemannian optimization	Code Available	1	5
BiCo-Net: Regress Globally, Match Locally for Robust 6D Pose Estimation	May 7, 2022	6D Pose Estimation, Benchmarking	Code Available	1	5
Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models	Jul 16, 2024	Benchmarking, Code Generation	Code Available	1	5
BiBench: Benchmarking and Analyzing Network Binarization	Jan 26, 2023	Benchmarking, Binarization	Code Available	1	5
FinDABench: Benchmarking Financial Data Analysis Ability of Large Language Models	Jan 1, 2024	Benchmarking	Code Available	1	5
ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots	Sep 16, 2022	Benchmarking, Question Answering	Code Available	1	5
ScrewNet: Category-Independent Articulation Model Estimation From Depth Images Using Screw Theory	Aug 24, 2020	Benchmarking	Code Available	1	5
Benchmarking Graph Neural Networks on Dynamic Link Prediction	Sep 29, 2021	Benchmarking, Dynamic Link Prediction	Code Available	1	5
Benchmarking Graph Neural Networks for FMRI analysis	Nov 16, 2022	Benchmarking	Code Available	1	5
Beyond neural scaling laws: beating power law scaling via data pruning	Jun 29, 2022	Benchmarking	Code Available	1	5
Beyond Normal: On the Evaluation of Mutual Information Estimators	Jun 19, 2023	Benchmarking, Domain Generalization	Code Available	1	5
Formalizing Multimedia Recommendation through Multimodal Deep Learning	Sep 11, 2023	Benchmarking, Deep Learning	Code Available	1	5
LagrangeBench: A Lagrangian Fluid Mechanics Benchmarking Suite	Sep 28, 2023	Benchmarking	Code Available	1	5
Large Language Models for Multi-Robot Systems: A Survey	Feb 6, 2025	Action Generation, Benchmarking	Code Available	1	5
LEAF: A Benchmark for Federated Settings	Dec 3, 2018	Autonomous Vehicles, Benchmarking	Code Available	1	5
LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models	Nov 1, 2024	Benchmarking, Mixture-of-Experts	Code Available	1	5
Machine Learning for the Digital Typhoon Dataset: Extensions to Multiple Basins and New Developments in Representations and Tasks	Nov 25, 2024	Benchmarking, object-detection	Code Available	1	5
MIRFLEX: Music Information Retrieval Feature Library for Extraction	Nov 1, 2024	Benchmarking, Information Retrieval	Code Available	1	5
FELM: Benchmarking Factuality Evaluation of Large Language Models	Oct 1, 2023	Benchmarking, Math	Code Available	1	5
Smiles2Dock: an open large-scale multi-task dataset for ML-based molecular docking	Jun 9, 2024	Benchmarking, Drug Discovery	Code Available	1	5
FinanceReasoning: Benchmarking Financial Numerical Reasoning More Credible, Comprehensive and Challenging	Jun 6, 2025	Benchmarking	Code Available	1	5
FiFAR: A Fraud Detection Dataset for Learning to Defer	Dec 20, 2023	Benchmarking, Decision Making	Code Available	1	5
ACCESS DENIED INC: The First Benchmark Environment for Sensitivity Awareness	Jun 1, 2025	Benchmarking, Management	Code Available	0	5
Conformal Prediction: A Theoretical Note and Benchmarking Transductive Node Classification in Graphs	Sep 26, 2024	Benchmarking, Conformal Prediction	Code Available	0	5
Knowledge-Driven Slot Constraints for Goal-Oriented Dialogue Systems	Jun 1, 2021	Benchmarking, Goal-Oriented Dialogue Systems	Code Available	0	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified