Benchmarking

Papers

Showing 701–725 of 5548 papers

Title	Date	Tasks	Status	Hype
Attention, Please! Revisiting Attentive Probing for Masked Image Modeling	Jun 11, 2025	Benchmarking, Computational Efficiency	Code Available	1
AllClear: A Comprehensive Dataset and Benchmark for Cloud Removal in Satellite Imagery	Oct 31, 2024	Benchmarking, Cloud Removal	Code Available	1
CheX-GPT: Harnessing Large Language Models for Enhanced Chest X-ray Report Labeling	Jan 21, 2024	Benchmarking	Code Available	1
CheXphoto: 10,000+ Photos and Transformations of Chest X-rays for Benchmarking Deep Learning Robustness	Jul 13, 2020	Benchmarking	Code Available	1
Automatic sleep stage classification with deep residual networks in a mixed-cohort setting	Aug 21, 2020	Automatic Sleep Stage Classification, Benchmarking	Code Available	1
Towards Reliable Detection of LLM-Generated Texts: A Comprehensive Evaluation Framework with CUDRT	Jun 13, 2024	Benchmarking, LLM-generated Text Detection	Code Available	1
CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models	Jan 2, 2025	Benchmarking, Computer Security	Code Available	1
D2S: Document-to-Slide Generation Via Query-Based Text Summarization	May 8, 2021	Benchmarking, Long Form Question Answering	Code Available	1
On the Detectability of ChatGPT Content: Benchmarking, Methodology, and Evaluation through the Lens of Academic Writing	Jun 7, 2023	Benchmarking, Prompt Engineering	Code Available	1
Autonomous Microscopy Experiments through Large Language Model Agents	Dec 18, 2024	Benchmarking, Experimental Design	Code Available	1
A Comprehensive Study on Large-Scale Graph Training: Benchmarking and Rethinking	Oct 14, 2022	Benchmarking, GPU	Code Available	1
CHILI: Chemically-Informed Large-scale Inorganic Nanomaterials Dataset for Advancing Graph Machine Learning	Feb 20, 2024	Atomic number classification, Benchmarking	Code Available	1
A Ladder of Causal Distances	May 5, 2020	Benchmarking, Causal Discovery	Code Available	1
ATOMMIC: An Advanced Toolbox for Multitask Medical Imaging Consistency to facilitate Artificial Intelligence applications from acquisition to analysis in Magnetic Resonance Imaging	Apr 30, 2024	Benchmarking, Image Reconstruction	Code Available	1
A Critical Assessment of State-of-the-Art in Entity Alignment	Oct 30, 2020	Benchmarking, Entity Alignment	Code Available	1
DCL-Net: Deep Correspondence Learning Network for 6D Pose Estimation	Oct 11, 2022	6D Pose Estimation, 6D Pose Estimation using RGB	Code Available	1
Atom-Level Optical Chemical Structure Recognition with Limited Supervision	Apr 2, 2024	Benchmarking	Code Available	1
dEchorate: a Calibrated Room Impulse Response Database for Echo-aware Signal Processing	Apr 27, 2021	Benchmarking, Retrieval	Code Available	1
Benchmarking Adversarial Patch Against Aerial Detection	Oct 30, 2022	Benchmarking	Code Available	1
CIBench: Evaluating Your LLMs with a Code Interpreter Plugin	Jul 15, 2024	Benchmarking	Code Available	1
CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation	Nov 10, 2023	Benchmarking, Cloud Computing	Code Available	1
Deep learning model solves change point detection for multiple change types	Apr 15, 2022	Benchmarking, Change Point Detection	Code Available	1
ALTO: A Large-Scale Dataset for UAV Visual Place Recognition and Localization	Jul 19, 2022	Benchmarking, Image Registration	Code Available	1
Benchmarking and Analyzing Point Cloud Classification under Corruptions	Feb 7, 2022	Benchmarking, Classification	Code Available	1
CCTV-Gun: Benchmarking Handgun Detection in CCTV Images	Mar 19, 2023	Benchmarking, object-detection	Code Available	1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified