Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 901–950 of 5548 papers

Title	Date	Tasks	Status	Hype
An Image Dataset for Benchmarking Recommender Systems with Raw Pixels	Sep 13, 2023	BenchmarkingRecommendation Systems	CodeCode Available	1
LEAF: A Benchmark for Federated Settings	Dec 3, 2018	Autonomous VehiclesBenchmarking	CodeCode Available	1
CombiBench: Benchmarking LLM Capability for Combinatorial Mathematics	May 6, 2025	Benchmarking	CodeCode Available	1
animal2vec and MeerKAT: A self-supervised transformer for rare-event raw audio input and a large-scale reference dataset for bioacoustics	Jun 3, 2024	Audio ClassificationBenchmarking	CodeCode Available	1
AD-LLM: Benchmarking Large Language Models for Anomaly Detection	Dec 15, 2024	Anomaly DetectionBenchmarking	CodeCode Available	1
AI in Lung Health: Benchmarking Detection and Diagnostic Models Across Multiple CT Scan Datasets	May 7, 2024	BenchmarkingCancer Classification	CodeCode Available	1
An Improved Metric and Benchmark for Assessing the Performance of Virtual Screening Models	Mar 15, 2024	BenchmarkingDrug Discovery	CodeCode Available	1
Benchmarking Counterfactual Image Generation	Mar 29, 2024	BenchmarkingConditional Image Generation	CodeCode Available	1
AdsorbML: A Leap in Efficiency for Adsorption Energy Calculations using Generalizable Machine Learning Potentials	Nov 29, 2022	Benchmarking	CodeCode Available	1
Less Is More: A Comparison of Active Learning Strategies for 3D Medical Image Segmentation	Jul 2, 2022	Active LearningBenchmarking	CodeCode Available	1
Combinatorial Optimization with Policy Adaptation using Latent Space Search	Nov 13, 2023	BenchmarkingCombinatorial Optimization	CodeCode Available	1
Benchmarking Data-driven Surrogate Simulators for Artificial Electromagnetic Materials	Nov 6, 2021	BenchmarkingNeural Network simulation	CodeCode Available	1
A Survey of Pathology Foundation Model: Progress and Future Directions	Apr 5, 2025	BenchmarkingMultiple Instance Learning	CodeCode Available	1
A Comprehensive Benchmark for RNA 3D Structure-Function Modeling	Mar 27, 2025	BenchmarkingDeep Learning	CodeCode Available	1
GEOM-Drugs Revisited: Toward More Chemically Accurate Benchmarks for 3D Molecule Generation	Apr 30, 2025	3D Molecule GenerationBenchmarking	CodeCode Available	1
Comics Datasets Framework: Mix of Comics datasets for detection benchmarking	Jul 3, 2024	BenchmarkingObject	CodeCode Available	1
Benchmarking Data Science Agents	Feb 27, 2024	BenchmarkingCode Generation	CodeCode Available	1
Light Field Salient Object Detection: A Review and Benchmark	Oct 10, 2020	BenchmarkingObject	CodeCode Available	1
CoDEx: A Comprehensive Knowledge Graph Completion Benchmark	Sep 16, 2020	BenchmarkingKnowledge Graph Completion	CodeCode Available	1
LLMCBench: Benchmarking Large Language Model Compression for Efficient Deployment	Oct 28, 2024	BenchmarkingLanguage Modeling	CodeCode Available	1
A Comprehensive Benchmark for COVID-19 Predictive Modeling Using Electronic Health Records in Intensive Care	Sep 16, 2022	BenchmarkingDeep Learning	CodeCode Available	1
MC-Blur: A Comprehensive Benchmark for Image Deblurring	Dec 1, 2021	BenchmarkingDeblurring	CodeCode Available	1
AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM	Nov 26, 2024	BenchmarkingText-to-Video Generation	CodeCode Available	1
CodeUpdateArena: Benchmarking Knowledge Editing on API Updates	Jul 8, 2024	Benchmarkingknowledge editing	CodeCode Available	1
Benchmarking Deep Graph Generative Models for Optimizing New Drug Molecules for COVID-19	Feb 9, 2021	BenchmarkingQ-Learning	CodeCode Available	1
Benchmarking deep inverse models over time, and the neural-adjoint method	Sep 27, 2020	Benchmarking	CodeCode Available	1
A Call to Reflect on Evaluation Practices for Failure Detection in Image Classification	Nov 28, 2022	Benchmarkingimage-classification	CodeCode Available	1
Benchmarking Multimodal Variational Autoencoders: CdSprites+ Dataset and Toolkit	Sep 7, 2022	Benchmarking	CodeCode Available	1
Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents	Feb 27, 2025	Benchmarking	CodeCode Available	1
LoLI-Street: Benchmarking Low-Light Image Enhancement and Beyond	Oct 13, 2024	Autonomous DrivingAutonomous Vehicles	CodeCode Available	1
CODEMENV: Benchmarking Large Language Models on Code Migration	Jun 1, 2025	Benchmarking	CodeCode Available	1
CodeIF: Benchmarking the Instruction-Following Capabilities of Large Language Models for Code Generation	Feb 26, 2025	BenchmarkingCode Generation	CodeCode Available	1
CodeReef: an open platform for portable MLOps, reusable automation actions and reproducible benchmarking	Jan 22, 2020	Benchmarkingobject-detection	CodeCode Available	1
Benchmarking Deep Learning Interpretability in Time Series Predictions	Oct 26, 2020	BenchmarkingDeep Learning	CodeCode Available	1
Benchmarking for Biomedical Natural Language Processing Tasks with a Domain Specific ALBERT	Jul 9, 2021	BenchmarkingDocument Classification	CodeCode Available	1
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency	Apr 24, 2025	BenchmarkingMath	CodeCode Available	1
CodeS: Natural Language to Code Repository via Multi-Layer Sketch	Mar 25, 2024	Benchmarking	CodeCode Available	1
Benchmarking Deep Models for Salient Object Detection	Feb 7, 2022	BenchmarkingObject	CodeCode Available	1
Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustness	Mar 24, 2025	BenchmarkingSemantic Segmentation	CodeCode Available	1
New Protocols and Negative Results for Textual Entailment Data Collection	Apr 24, 2020	BenchmarkingDiversity	CodeCode Available	1
Benchmarking Deep Reinforcement Learning for Navigation in Denied Sensor Environments	Oct 18, 2024	Autonomous NavigationBenchmarking	CodeCode Available	1
Machine Learning for the Digital Typhoon Dataset: Extensions to Multiple Basins and New Developments in Representations and Tasks	Nov 25, 2024	Benchmarkingobject-detection	CodeCode Available	1
CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization	Apr 6, 2025	BenchmarkingCombinatorial Optimization	CodeCode Available	1
MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration	Nov 14, 2023	BenchmarkingLanguage Modeling	CodeCode Available	1
Coarse-to-Fine Q-attention with Learned Path Ranking	Apr 4, 2022	Benchmarking	CodeCode Available	1
High-Dimensional Inference in Bayesian Networks	Dec 16, 2021	BenchmarkingVocal Bursts Intensity Prediction	CodeCode Available	1
COCO: The Large Scale Black-Box Optimization Benchmarking (bbob-largescale) Test Suite	Mar 15, 2019	Benchmarking	CodeCode Available	1
CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation	Nov 10, 2023	BenchmarkingCloud Computing	CodeCode Available	1
Guardians of Image Quality: Benchmarking Defenses Against Adversarial Attacks on Image Quality Metrics	Aug 2, 2024	Adversarial AttackAdversarial Purification	CodeCode Available	1
Codabench: Flexible, Easy-to-Use and Reproducible Benchmarking Platform	Oct 12, 2021	Benchmarking	CodeCode Available	1

Show:10 25 50

← PrevPage 19 of 111Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified