Benchmarking

Papers

Showing 926–950 of 5548 papers

Title	Date	Tasks	Status	Hype
Benchmarking deep inverse models over time, and the neural-adjoint method	Sep 27, 2020	Benchmarking	Code Available	1
A Call to Reflect on Evaluation Practices for Failure Detection in Image Classification	Nov 28, 2022	Benchmarking, image-classification	Code Available	1
Benchmarking Multimodal Variational Autoencoders: CdSprites+ Dataset and Toolkit	Sep 7, 2022	Benchmarking	Code Available	1
Collab-Overcooked: Benchmarking and Evaluating Large Language Models as Collaborative Agents	Feb 27, 2025	Benchmarking	Code Available	1
LoLI-Street: Benchmarking Low-Light Image Enhancement and Beyond	Oct 13, 2024	Autonomous Driving, Autonomous Vehicles	Code Available	1
CODEMENV: Benchmarking Large Language Models on Code Migration	Jun 1, 2025	Benchmarking	Code Available	1
CodeIF: Benchmarking the Instruction-Following Capabilities of Large Language Models for Code Generation	Feb 26, 2025	Benchmarking, Code Generation	Code Available	1
CodeReef: an open platform for portable MLOps, reusable automation actions and reproducible benchmarking	Jan 22, 2020	Benchmarking, object-detection	Code Available	1
Benchmarking Deep Learning Interpretability in Time Series Predictions	Oct 26, 2020	Benchmarking, Deep Learning	Code Available	1
Benchmarking for Biomedical Natural Language Processing Tasks with a Domain Specific ALBERT	Jul 9, 2021	Benchmarking, Document Classification	Code Available	1
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency	Apr 24, 2025	Benchmarking, Math	Code Available	1
CodeS: Natural Language to Code Repository via Multi-Layer Sketch	Mar 25, 2024	Benchmarking	Code Available	1
Benchmarking Deep Models for Salient Object Detection	Feb 7, 2022	Benchmarking, Object	Code Available	1
Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustness	Mar 24, 2025	Benchmarking, Semantic Segmentation	Code Available	1
New Protocols and Negative Results for Textual Entailment Data Collection	Apr 24, 2020	Benchmarking, Diversity	Code Available	1
Benchmarking Deep Reinforcement Learning for Navigation in Denied Sensor Environments	Oct 18, 2024	Autonomous Navigation, Benchmarking	Code Available	1
Machine Learning for the Digital Typhoon Dataset: Extensions to Multiple Basins and New Developments in Representations and Tasks	Nov 25, 2024	Benchmarking, object-detection	Code Available	1
CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization	Apr 6, 2025	Benchmarking, Combinatorial Optimization	Code Available	1
MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration	Nov 14, 2023	Benchmarking, Language Modeling	Code Available	1
Coarse-to-Fine Q-attention with Learned Path Ranking	Apr 4, 2022	Benchmarking	Code Available	1
High-Dimensional Inference in Bayesian Networks	Dec 16, 2021	Benchmarking, Vocal Bursts Intensity Prediction	Code Available	1
COCO: The Large Scale Black-Box Optimization Benchmarking (bbob-largescale) Test Suite	Mar 15, 2019	Benchmarking	Code Available	1
CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation	Nov 10, 2023	Benchmarking, Cloud Computing	Code Available	1
Guardians of Image Quality: Benchmarking Defenses Against Adversarial Attacks on Image Quality Metrics	Aug 2, 2024	Adversarial Attack, Adversarial Purification	Code Available	1
Codabench: Flexible, Easy-to-Use and Reproducible Benchmarking Platform	Oct 12, 2021	Benchmarking	Code Available	1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified