Benchmarking

Papers


Showing 851–875 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Benchmarking Micro-action Recognition: Dataset, Methods, and Applications	Mar 8, 2024	Action Recognition, Benchmarking	Code Available	1	5
AI in Lung Health: Benchmarking Detection and Diagnostic Models Across Multiple CT Scan Datasets	May 7, 2024	Benchmarking, Cancer Classification	Code Available	1	5
A Closer Look at Mortality Risk Prediction from Electrocardiograms	Jun 24, 2024	Benchmarking, Prediction	Code Available	1	5
Towards Motion Forecasting with Real-World Perception Inputs: Are End-to-End Approaches Competitive?	Jun 15, 2023	Autonomous Driving, Autonomous Vehicles	Code Available	1	5
A Survey of Pathology Foundation Model: Progress and Future Directions	Apr 5, 2025	Benchmarking, Multiple Instance Learning	Code Available	1	5
CharacterBench: Benchmarking Character Customization of Large Language Models	Dec 16, 2024	Benchmarking	Code Available	1	5
An Empirical Study on Google Research Football Multi-agent Scenarios	May 16, 2023	Benchmarking, Multi-agent Reinforcement Learning	Code Available	1	5
A Comprehensive Benchmark for RNA 3D Structure-Function Modeling	Mar 27, 2025	Benchmarking, Deep Learning	Code Available	1	5
IOHanalyzer: Detailed Performance Analyses for Iterative Optimization Heuristics	Jul 8, 2020	Bayesian Optimization, Benchmarking	Code Available	1	5
GEOM-Drugs Revisited: Toward More Chemically Accurate Benchmarks for 3D Molecule Generation	Apr 30, 2025	3D Molecule Generation, Benchmarking	Code Available	1	5
EH-DNAS: End-to-End Hardware-aware Differentiable Neural Architecture Search	Nov 24, 2021	Benchmarking, Neural Architecture Search	Code Available	1	5
IOHprofiler: A Benchmarking and Profiling Tool for Iterative Optimization Heuristics	Oct 11, 2018	Benchmarking	Code Available	1	5
EMGBench: Benchmarking Out-of-Distribution Generalization and Adaptation for Electromyography	Oct 31, 2024	Benchmarking, Electromyography (EMG)	Code Available	1	5
End-to-end Knowledge Retrieval with Multi-modal Queries	Jun 1, 2023	Benchmarking, Cross-Modal Retrieval	Code Available	1	5
An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction	Sep 4, 2019	Benchmarking, General Classification	Code Available	1	5
Benchmarking Batch Deep Reinforcement Learning Algorithms	Oct 3, 2019	Benchmarking, Deep Reinforcement Learning	Code Available	1	5
Benchmarking machine learning models on multi-centre eICU critical care dataset	Oct 2, 2019	Benchmarking, BIG-bench Machine Learning	Code Available	1	5
CIBench: Evaluating Your LLMs with a Code Interpreter Plugin	Jul 15, 2024	Benchmarking	Code Available	1	5
Ego-Body Pose Estimation via Ego-Head Pose Estimation	Dec 9, 2022	Benchmarking, Disentanglement	Code Available	1	5
CIPCaD-Bench: Continuous Industrial Process datasets for benchmarking Causal Discovery methods	Aug 2, 2022	Benchmarking, Causal Discovery	Code Available	1	5
A Comprehensive Benchmark for COVID-19 Predictive Modeling Using Electronic Health Records in Intensive Care	Sep 16, 2022	Benchmarking, Deep Learning	Code Available	1	5
AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM	Nov 26, 2024	Benchmarking, Text-to-Video Generation	Code Available	1	5
JaxRobotarium: Training and Deploying Multi-Robot Policies in 10 Minutes	May 10, 2025	Benchmarking, GPU	Code Available	1	5
Job-SDF: A Multi-Granularity Dataset for Job Skill Demand Forecasting and Benchmarking	Jun 17, 2024	Benchmarking, Demand Forecasting	Code Available	1	5
Benchmarking Low-Shot Robustness to Natural Distribution Shifts	Apr 21, 2023	Benchmarking	Code Available	1	5



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified