Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 851–900 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Benchmarking Neural Network Robustness to Common Corruptions and Surface Variations	Jul 4, 2018	Adversarial DefenseBenchmarking	CodeCode Available	1	5
EntQA: Entity Linking as Question Answering	Oct 5, 2021	BenchmarkingEntity Linking	CodeCode Available	1	5
Addressing Shortcomings in Fair Graph Learning Datasets: Towards a New Benchmark	Mar 9, 2024	BenchmarkingFairness	CodeCode Available	1	5
ClinicRealm: Re-evaluating Large Language Models with Conventional Machine Learning for Non-Generative Clinical Prediction Tasks	Jul 26, 2024	BenchmarkingModel Selection	CodeCode Available	1	5
Addressing the generalization of 3D registration methods with a featureless baseline and an unbiased benchmark	Mar 23, 2024	BenchmarkingImage to Point Cloud Registration	CodeCode Available	1	5
CheXphoto: 10,000+ Photos and Transformations of Chest X-rays for Benchmarking Deep Learning Robustness	Jul 13, 2020	Benchmarking	CodeCode Available	1	5
An Empirical Study on Google Research Football Multi-agent Scenarios	May 16, 2023	BenchmarkingMulti-agent Reinforcement Learning	CodeCode Available	1	5
CIBench: Evaluating Your LLMs with a Code Interpreter Plugin	Jul 15, 2024	Benchmarking	CodeCode Available	1	5
Circumventing shortcuts in audio-visual deepfake detection datasets with unsupervised learning	Nov 29, 2024	BenchmarkingDeepFake Detection	CodeCode Available	1	5
A Survey on Graph Counterfactual Explanations: Definitions, Methods, Evaluation, and Research Challenges	Oct 21, 2022	BenchmarkingCommunity Detection	CodeCode Available	1	5
AIPerf: Automated machine learning as an AI-HPC benchmark	Aug 17, 2020	AutoMLBenchmarking	CodeCode Available	1	5
Benchmarking Multimodal Knowledge Conflict for Large Multimodal Models	May 26, 2025	BenchmarkingRAG	CodeCode Available	1	5
4DBInfer: A 4D Benchmarking Toolbox for Graph-Centric Predictive Modeling on Relational DBs	Apr 28, 2024	Benchmarking	CodeCode Available	1	5
CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization	Apr 6, 2025	BenchmarkingCombinatorial Optimization	CodeCode Available	1	5
An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction	Sep 4, 2019	BenchmarkingGeneral Classification	CodeCode Available	1	5
Benchmarking Batch Deep Reinforcement Learning Algorithms	Oct 3, 2019	BenchmarkingDeep Reinforcement Learning	CodeCode Available	1	5
Enhancing Biomedical Relation Extraction with Directionality	Jan 23, 2025	BenchmarkingDocument-level Relation Extraction	CodeCode Available	1	5
Enhancing Ligand Pose Sampling for Molecular Docking	Nov 30, 2023	BenchmarkingMolecular Docking	CodeCode Available	1	5
AI in Lung Health: Benchmarking Detection and Diagnostic Models Across Multiple CT Scan Datasets	May 7, 2024	BenchmarkingCancer Classification	CodeCode Available	1	5
A Survey of Pathology Foundation Model: Progress and Future Directions	Apr 5, 2025	BenchmarkingMultiple Instance Learning	CodeCode Available	1	5
ClimART: A Benchmark Dataset for Emulating Atmospheric Radiative Transfer in Weather and Climate Models	Nov 29, 2021	BenchmarkingPhysical Simulations	CodeCode Available	1	5
Benchmarking Bias Mitigation Algorithms in Representation Learning through Fairness Metrics	Jun 8, 2021	Age And Gender ClassificationBenchmarking	CodeCode Available	1	5
Coarse-to-Fine Q-attention with Learned Path Ranking	Apr 4, 2022	Benchmarking	CodeCode Available	1	5
A Comprehensive Benchmark for RNA 3D Structure-Function Modeling	Mar 27, 2025	BenchmarkingDeep Learning	CodeCode Available	1	5
GEOM-Drugs Revisited: Toward More Chemically Accurate Benchmarks for 3D Molecule Generation	Apr 30, 2025	3D Molecule GenerationBenchmarking	CodeCode Available	1	5
End-to-end Knowledge Retrieval with Multi-modal Queries	Jun 1, 2023	BenchmarkingCross-Modal Retrieval	CodeCode Available	1	5
Enhancing spatial and textual analysis with EUPEG: an extensible and unified platform for evaluating geoparsers	Jul 9, 2020	Benchmarking	CodeCode Available	1	5
Knodle: Modular Weakly Supervised Learning with PyTorch	Apr 23, 2021	BenchmarkingBIG-bench Machine Learning	CodeCode Available	1	5
SHARP: Environment and Person Independent Activity Recognition with Commodity IEEE 802.11 Access Points	Mar 17, 2021	Activity RecognitionBenchmarking	CodeCode Available	1	5
CodeReef: an open platform for portable MLOps, reusable automation actions and reproducible benchmarking	Jan 22, 2020	Benchmarkingobject-detection	CodeCode Available	1	5
COCO: The Large Scale Black-Box Optimization Benchmarking (bbob-largescale) Test Suite	Mar 15, 2019	Benchmarking	CodeCode Available	1	5
Kvasir-Instrument: Diagnostic and therapeutic tool segmentation dataset in gastrointestinal endoscopy	Oct 23, 2020	BenchmarkingDiagnostic	CodeCode Available	1	5
CODEBench: A Neural Architecture and Hardware Accelerator Co-Design Framework	Dec 7, 2022	Benchmarking	CodeCode Available	1	5
A Closer Look at Mortality Risk Prediction from Electrocardiograms	Jun 24, 2024	BenchmarkingPrediction	CodeCode Available	1	5
Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations	Mar 21, 2024	BenchmarkingMemorization	CodeCode Available	1	5
CodeIF: Benchmarking the Instruction-Following Capabilities of Large Language Models for Code Generation	Feb 26, 2025	BenchmarkingCode Generation	CodeCode Available	1	5
Emotion and Intent Joint Understanding in Multimodal Conversation: A Benchmarking Dataset	Jul 3, 2024	BenchmarkingDiversity	CodeCode Available	1	5
Benchmarking Chinese Text Recognition: Datasets, Baselines, and an Empirical Study	Dec 30, 2021	AttributeBenchmarking	CodeCode Available	1	5
A Comprehensive Benchmark for COVID-19 Predictive Modeling Using Electronic Health Records in Intensive Care	Sep 16, 2022	BenchmarkingDeep Learning	CodeCode Available	1	5
CodeS: Natural Language to Code Repository via Multi-Layer Sketch	Mar 25, 2024	Benchmarking	CodeCode Available	1	5
AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM	Nov 26, 2024	BenchmarkingText-to-Video Generation	CodeCode Available	1	5
Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents	Feb 27, 2025	Benchmarking	CodeCode Available	1	5
Benchmarking MRI Reconstruction Neural Networks on Large Public Datasets	Mar 6, 2020	BenchmarkingImage Reconstruction	CodeCode Available	1	5
Benchmarking Cognitive Biases in Large Language Models as Evaluators	Sep 29, 2023	BenchmarkingIn-Context Learning	CodeCode Available	1	5
EMPOT: partial alignment of density maps and rigid body fitting using unbalanced Gromov-Wasserstein divergence	Nov 1, 2023	BenchmarkingCryogenic Electron Microscopy (cryo-EM)	CodeCode Available	1	5
Recent Advances on Neural Network Pruning at Initialization	Mar 11, 2021	BenchmarkingNetwork Pruning	CodeCode Available	1	5
EmbSpatial-Bench: Benchmarking Spatial Understanding for Embodied Tasks with Large Vision-Language Models	Jun 9, 2024	Benchmarking	CodeCode Available	1	5
An Extended Benchmarking of Multi-Agent Reinforcement Learning Algorithms in Complex Fully Cooperative Tasks	Feb 7, 2025	BenchmarkingMulti-agent Reinforcement Learning	CodeCode Available	1	5
LayerDAG: A Layerwise Autoregressive Diffusion Model for Directed Acyclic Graph Generation	Nov 4, 2024	BenchmarkingGraph Generation	CodeCode Available	1	5
EMGBench: Benchmarking Out-of-Distribution Generalization and Adaptation for Electromyography	Oct 31, 2024	BenchmarkingElectromyography (EMG)	CodeCode Available	1	5

Show:10 25 50

← PrevPage 18 of 111Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified