Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 751–800 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
A Systematic Benchmarking Analysis of Transfer Learning for Medical Image Analysis	Aug 12, 2021	BenchmarkingMedical Image Analysis	CodeCode Available	1	5
Benchmarking the CoW with the TopCoW Challenge: Topology-Aware Anatomical Segmentation of the Circle of Willis for CTA and MRA	Dec 29, 2023	AnatomyBenchmarking	CodeCode Available	1	5
A Multifaceted Benchmarking of Synthetic Electronic Health Record Generation Models	Aug 2, 2022	BenchmarkingSynthetic Data Generation	CodeCode Available	1	5
BeHonest: Benchmarking Honesty in Large Language Models	Jun 19, 2024	BenchmarkingMisinformation	CodeCode Available	1	5
Benchmarking Local Robustness of High-Accuracy Binary Neural Networks for Enhanced Traffic Sign Recognition	Sep 25, 2023	Autonomous DrivingBenchmarking	CodeCode Available	1	5
A Comprehensive Overview of Large Language Models	Jul 12, 2023	Benchmarking	CodeCode Available	1	5
Benchmarking TinyML Systems: Challenges and Direction	Mar 10, 2020	BenchmarkingPosition	CodeCode Available	1	5
AirSim Drone Racing Lab	Mar 12, 2020	BenchmarkingOptical Flow Estimation	CodeCode Available	1	5
Bench4KE: Benchmarking Automated Competency Question Generation	May 30, 2025	BenchmarkingQuestion Generation	CodeCode Available	1	5
A SWAT-based Reinforcement Learning Framework for Crop Management	Feb 10, 2023	BenchmarkingDecision Making	CodeCode Available	1	5
Benchmarking Low-Shot Robustness to Natural Distribution Shifts	Apr 21, 2023	Benchmarking	CodeCode Available	1	5
Benchmarking the Robustness of Deep Neural Networks to Common Corruptions in Digital Pathology	Jun 30, 2022	BenchmarkingDiagnostic	CodeCode Available	1	5
Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection	May 30, 2022	3D Object DetectionAutonomous Driving	CodeCode Available	1	5
Bencher: Simple and Reproducible Benchmarking for Black-Box Optimization	May 27, 2025	Benchmarking	CodeCode Available	1	5
Benchmarking LLMs for Political Science: A United Nations Perspective	Feb 19, 2025	BenchmarkingDecision Making	CodeCode Available	1	5
BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models	Dec 5, 2023	BenchmarkingVisual Question Answering	CodeCode Available	1	5
Benchmarking LLMs' Swarm intelligence	May 7, 2025	Benchmarking	CodeCode Available	1	5
Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations?	Apr 29, 2024	Answer GenerationBenchmarking	CodeCode Available	1	5
Anabranch Network for Camouflaged Object Segmentation	May 20, 2021	BenchmarkingCamouflaged Object Segmentation	CodeCode Available	1	5
G4SATBench: Benchmarking and Advancing SAT Solving with Graph Neural Networks	Sep 29, 2023	Benchmarking	CodeCode Available	1	5
GastroVision: A Multi-class Endoscopy Image Dataset for Computer Aided Gastrointestinal Disease Detection	Jul 16, 2023	Benchmarking	CodeCode Available	1	5
Benchmarking tree species classification from proximally-sensed laser scanning data: introducing the FOR-species20K dataset	Aug 12, 2024	Benchmarking	CodeCode Available	1	5
DTR-Bench: An in silico Environment and Benchmark Platform for Reinforcement Learning Based Dynamic Treatment Regime	May 28, 2024	BenchmarkingReinforcement Learning (RL)	CodeCode Available	1	5
Ego-Body Pose Estimation via Ego-Head Pose Estimation	Dec 9, 2022	BenchmarkingDisentanglement	CodeCode Available	1	5
Benchmarking Vision, Language, & Action Models on Robotic Learning Tasks	Nov 4, 2024	Action GenerationBenchmarking	CodeCode Available	1	5
Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments	May 8, 2025	BenchmarkingPrompt Engineering	CodeCode Available	1	5
Evaluating Robustness of Deep Reinforcement Learning for Autonomous Surface Vehicle Control in Field Tests	May 15, 2025	BenchmarkingDeep Reinforcement Learning	CodeCode Available	1	5
FullFront: Benchmarking MLLMs Across the Full Front-End Engineering Workflow	May 23, 2025	BenchmarkingCode Generation	CodeCode Available	1	5
Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning	Dec 11, 2024	AttributeBenchmarking	CodeCode Available	1	5
Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset	Nov 5, 2024	BenchmarkingLanguage Modeling	CodeCode Available	1	5
Generalizable deep learning for photoplethysmography-based blood pressure estimation -- A Benchmarking Study	Feb 26, 2025	BenchmarkingBlood pressure estimation	CodeCode Available	1	5
Generating a Doppelganger Graph: Resembling but Distinct	Jan 23, 2021	BenchmarkingGraph Representation Learning	CodeCode Available	1	5
4D Panoptic LiDAR Segmentation	Feb 24, 2021	4D Panoptic SegmentationBenchmarking	CodeCode Available	1	5
Best practices for constructing, preparing, and evaluating protein-ligand binding affinity benchmarks	May 13, 2021	BenchmarkingDrug Discovery	CodeCode Available	1	5
Benchmark on Drug Target Interaction Modeling from a Structure Perspective	Jul 4, 2024	BenchmarkingDrug Discovery	CodeCode Available	1	5
Benchmarking Actor-Critic Deep Reinforcement Learning Algorithms for Robotics Control with Action Constraints	Apr 18, 2023	BenchmarkingDeep Reinforcement Learning	CodeCode Available	1	5
DocuMint: Docstring Generation for Python using Small Language Models	May 16, 2024	BenchmarkingCode Generation	CodeCode Available	1	5
Does BERT Learn as Humans Perceive? Understanding Linguistic Styles through Lexica	Sep 6, 2021	Benchmarking	CodeCode Available	1	5
Benchmarking Large Language Models on CMExam -- A Comprehensive Chinese Medical Exam Dataset	Jun 5, 2023	BenchmarkingMultiple-choice	CodeCode Available	1	5
BEND: Benchmarking DNA Language Models on biologically meaningful tasks	Nov 21, 2023	BenchmarkingLanguage Modeling	CodeCode Available	1	5
Benchmarking Large Multimodal Models against Common Corruptions	Jan 22, 2024	BenchmarkingImage to text	CodeCode Available	1	5
Benchmarking Adversarial Patch Against Aerial Detection	Oct 30, 2022	Benchmarking	CodeCode Available	1	5
dMelodies: A Music Dataset for Disentanglement Learning	Jul 29, 2020	BenchmarkingDisentanglement	CodeCode Available	1	5
GeoBenchX: Benchmarking LLMs for Multistep Geospatial Tasks	Mar 23, 2025	BenchmarkingHallucination	CodeCode Available	1	5
Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models	Jul 16, 2024	BenchmarkingCode Generation	CodeCode Available	1	5
Benchmarking Adversarial Robustness on Image Classification	Jun 1, 2020	Adversarial AttackAdversarial Robustness	CodeCode Available	1	5
Benchmarking of DL Libraries and Models on Mobile Devices	Feb 14, 2022	BenchmarkingGPU	CodeCode Available	1	5
GLGENN: A Novel Parameter-Light Equivariant Neural Networks Architecture Based on Clifford Geometric Algebras	Jun 11, 2025	Benchmarking	CodeCode Available	1	5
DNN+NeuroSim V2.0: An End-to-End Benchmarking Framework for Compute-in-Memory Accelerators for On-chip Training	Mar 13, 2020	BenchmarkingQuantization	CodeCode Available	1	5
Does your model understand genes? A benchmark of gene properties for biological and text models	Dec 5, 2024	BenchmarkingMulti-class Classification	CodeCode Available	1	5

Show:10 25 50

← PrevPage 16 of 111Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified