Benchmarking

Papers

Title	Date	Tasks	Status	Hype	Score
Explainable Benchmarking for Iterative Optimization Heuristics	Jan 31, 2024	Benchmarking, Evolutionary Algorithms	Code Available	1	5
Explainable Global Wildfire Prediction Models using Graph Neural Networks	Feb 11, 2024	Benchmarking, Community Detection	Code Available	1	5
Learning Representations with Contrastive Self-Supervised Learning for Histopathology Applications	Dec 10, 2021	Benchmarking, Contrastive Learning	Code Available	1	5
BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing	Apr 2, 2025	3D Reconstruction, Benchmarking	Code Available	1	5
scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data	Jun 10, 2025	Benchmarking, Data Augmentation	Code Available	1	5
Bag of Tricks for Adversarial Training	Oct 1, 2020	Adversarial Robustness, Benchmarking	Code Available	1	5
Biomedical Data-to-Text Generation via Fine-Tuning Transformers	Sep 3, 2021	Benchmarking, Data-to-Text Generation	Code Available	1	5
Exploring Large Language Models for Classical Philology	May 23, 2023	Benchmarking, Decoder	Code Available	1	5
BioRED: A Rich Biomedical Relation Extraction Dataset	Apr 8, 2022	Benchmarking, Binary Relation Extraction	Code Available	1	5
BioMaze: Benchmarking and Enhancing Large Language Models for Biological Pathway Reasoning	Feb 23, 2025	Benchmarking	Code Available	1	5
Latent Thermodynamic Flows: Unified Representation Learning and Generative Modeling of Temperature-Dependent Behaviors from Limited Data	Jul 3, 2025	Benchmarking, Representation Learning	Code Available	1	5
S3PRL-VC: Open-source Voice Conversion Framework with Self-supervised Speech Representations	Oct 12, 2021	Benchmarking, Voice Conversion	Code Available	1	5
LayerDAG: A Layerwise Autoregressive Diffusion Model for Directed Acyclic Graph Generation	Nov 4, 2024	Benchmarking, Graph Generation	Code Available	1	5
Bilingual Corpus Mining and Multistage Fine-Tuning for Improving Machine Translation of Lecture Transcripts	Nov 7, 2023	Benchmarking, Machine Translation	Code Available	1	5
AQuA: A Benchmarking Tool for Label Quality Assessment	Jun 15, 2023	Benchmarking, Label Error Detection	Code Available	1	5
Failure Detection in Medical Image Classification: A Reality Check and Benchmarking Testbed	May 27, 2022	Benchmarking, Binary Classification	Code Available	1	5
Binary Code Summarization: Benchmarking ChatGPT/GPT-4 and Other Large Language Models	Dec 15, 2023	Benchmarking, Code Summarization	Code Available	1	5
Scalable Parameter and Memory Efficient Pretraining for LLM: Recent Algorithmic Advances and Benchmarking	May 28, 2025	Benchmarking	Code Available	1	5
ScandEval: A Benchmark for Scandinavian Natural Language Processing	Apr 3, 2023	Benchmarking, Cross-Lingual Transfer	Code Available	1	5
APTv2: Benchmarking Animal Pose Estimation and Tracking with a Large-scale Dataset and Beyond	Dec 25, 2023	Animal Pose Estimation, Benchmarking	Code Available	1	5
Benchmarking large language models for biomedical natural language processing applications and recommendations	May 10, 2023	Benchmarking, Document Classification	Code Available	1	5
Quantum machine learning of large datasets using randomized measurements	Aug 2, 2021	Benchmarking, BIG-bench Machine Learning	Code Available	1	5
MatTools: Benchmarking Large Language Models for Materials Science Tools	May 16, 2025	Benchmarking, Question Answering	Code Available	1	5
FineSurE: Fine-grained Summarization Evaluation using LLMs	Jul 1, 2024	Benchmarking, Hallucination	Code Available	1	5
LCA-on-the-Line: Benchmarking Out-of-Distribution Generalization with Class Taxonomies	Jul 22, 2024	Benchmarking, Out-of-Distribution Generalization	Code Available	1	5
Fast hyperboloid decision tree algorithms	Oct 20, 2023	Benchmarking, Riemannian optimization	Code Available	1	5
BiCo-Net: Regress Globally, Match Locally for Robust 6D Pose Estimation	May 7, 2022	6D Pose Estimation, Benchmarking	Code Available	1	5
Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models	Jul 16, 2024	Benchmarking, Code Generation	Code Available	1	5
BiBench: Benchmarking and Analyzing Network Binarization	Jan 26, 2023	Benchmarking, Binarization	Code Available	1	5
FinDABench: Benchmarking Financial Data Analysis Ability of Large Language Models	Jan 1, 2024	Benchmarking	Code Available	1	5
ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots	Sep 16, 2022	Benchmarking, Question Answering	Code Available	1	5
ScrewNet: Category-Independent Articulation Model Estimation From Depth Images Using Screw Theory	Aug 24, 2020	Benchmarking	Code Available	1	5
Benchmarking Graph Neural Networks on Dynamic Link Prediction	Sep 29, 2021	Benchmarking, Dynamic Link Prediction	Code Available	1	5
Benchmarking Graph Neural Networks for FMRI analysis	Nov 16, 2022	Benchmarking	Code Available	1	5
Beyond neural scaling laws: beating power law scaling via data pruning	Jun 29, 2022	Benchmarking	Code Available	1	5
Beyond Normal: On the Evaluation of Mutual Information Estimators	Jun 19, 2023	Benchmarking, Domain Generalization	Code Available	1	5
Formalizing Multimedia Recommendation through Multimodal Deep Learning	Sep 11, 2023	Benchmarking, Deep Learning	Code Available	1	5
LagrangeBench: A Lagrangian Fluid Mechanics Benchmarking Suite	Sep 28, 2023	Benchmarking	Code Available	1	5
Large Language Models for Multi-Robot Systems: A Survey	Feb 6, 2025	Action Generation, Benchmarking	Code Available	1	5
LEAF: A Benchmark for Federated Settings	Dec 3, 2018	Autonomous Vehicles, Benchmarking	Code Available	1	5
LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models	Nov 1, 2024	Benchmarking, Mixture-of-Experts	Code Available	1	5
Machine Learning for the Digital Typhoon Dataset: Extensions to Multiple Basins and New Developments in Representations and Tasks	Nov 25, 2024	Benchmarking, object-detection	Code Available	1	5
MIRFLEX: Music Information Retrieval Feature Library for Extraction	Nov 1, 2024	Benchmarking, Information Retrieval	Code Available	1	5
FELM: Benchmarking Factuality Evaluation of Large Language Models	Oct 1, 2023	Benchmarking, Math	Code Available	1	5
Smiles2Dock: an open large-scale multi-task dataset for ML-based molecular docking	Jun 9, 2024	Benchmarking, Drug Discovery	Code Available	1	5
FinanceReasoning: Benchmarking Financial Numerical Reasoning More Credible, Comprehensive and Challenging	Jun 6, 2025	Benchmarking	Code Available	1	5
FiFAR: A Fraud Detection Dataset for Learning to Defer	Dec 20, 2023	Benchmarking, Decision Making	Code Available	1	5
ACCESS DENIED INC: The First Benchmark Environment for Sensitivity Awareness	Jun 1, 2025	Benchmarking, Management	Code Available	0	5
Conformal Prediction: A Theoretical Note and Benchmarking Transductive Node Classification in Graphs	Sep 26, 2024	Benchmarking, Conformal Prediction	Code Available	0	5
Knowledge-Driven Slot Constraints for Goal-Oriented Dialogue Systems	Jun 1, 2021	Benchmarking, Goal-Oriented Dialogue Systems	Code Available	0	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified