Benchmarking

Papers

Showing 776–800 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments	May 8, 2025	Benchmarking, Prompt Engineering	Code Available	1	5
Evaluating Robustness of Deep Reinforcement Learning for Autonomous Surface Vehicle Control in Field Tests	May 15, 2025	Benchmarking, Deep Reinforcement Learning	Code Available	1	5
FullFront: Benchmarking MLLMs Across the Full Front-End Engineering Workflow	May 23, 2025	Benchmarking, Code Generation	Code Available	1	5
Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning	Dec 11, 2024	Attribute, Benchmarking	Code Available	1	5
Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset	Nov 5, 2024	Benchmarking, Language Modeling	Code Available	1	5
Generalizable deep learning for photoplethysmography-based blood pressure estimation -- A Benchmarking Study	Feb 26, 2025	Benchmarking, Blood pressure estimation	Code Available	1	5
Generating a Doppelganger Graph: Resembling but Distinct	Jan 23, 2021	Benchmarking, Graph Representation Learning	Code Available	1	5
4D Panoptic LiDAR Segmentation	Feb 24, 2021	4D Panoptic Segmentation, Benchmarking	Code Available	1	5
Best practices for constructing, preparing, and evaluating protein-ligand binding affinity benchmarks	May 13, 2021	Benchmarking, Drug Discovery	Code Available	1	5
Benchmark on Drug Target Interaction Modeling from a Structure Perspective	Jul 4, 2024	Benchmarking, Drug Discovery	Code Available	1	5
Benchmarking Actor-Critic Deep Reinforcement Learning Algorithms for Robotics Control with Action Constraints	Apr 18, 2023	Benchmarking, Deep Reinforcement Learning	Code Available	1	5
DocuMint: Docstring Generation for Python using Small Language Models	May 16, 2024	Benchmarking, Code Generation	Code Available	1	5
Does BERT Learn as Humans Perceive? Understanding Linguistic Styles through Lexica	Sep 6, 2021	Benchmarking	Code Available	1	5
Benchmarking Large Language Models on CMExam -- A Comprehensive Chinese Medical Exam Dataset	Jun 5, 2023	Benchmarking, Multiple-choice	Code Available	1	5
BEND: Benchmarking DNA Language Models on biologically meaningful tasks	Nov 21, 2023	Benchmarking, Language Modeling	Code Available	1	5
Benchmarking Large Multimodal Models against Common Corruptions	Jan 22, 2024	Benchmarking, Image to text	Code Available	1	5
Benchmarking Adversarial Patch Against Aerial Detection	Oct 30, 2022	Benchmarking	Code Available	1	5
dMelodies: A Music Dataset for Disentanglement Learning	Jul 29, 2020	Benchmarking, Disentanglement	Code Available	1	5
GeoBenchX: Benchmarking LLMs for Multistep Geospatial Tasks	Mar 23, 2025	Benchmarking, Hallucination	Code Available	1	5
Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models	Jul 16, 2024	Benchmarking, Code Generation	Code Available	1	5
Benchmarking Adversarial Robustness on Image Classification	Jun 1, 2020	Adversarial Attack, Adversarial Robustness	Code Available	1	5
Benchmarking of DL Libraries and Models on Mobile Devices	Feb 14, 2022	Benchmarking, GPU	Code Available	1	5
GLGENN: A Novel Parameter-Light Equivariant Neural Networks Architecture Based on Clifford Geometric Algebras	Jun 11, 2025	Benchmarking	Code Available	1	5
DNN+NeuroSim V2.0: An End-to-End Benchmarking Framework for Compute-in-Memory Accelerators for On-chip Training	Mar 13, 2020	Benchmarking, Quantization	Code Available	1	5
Does your model understand genes? A benchmark of gene properties for biological and text models	Dec 5, 2024	Benchmarking, Multi-class Classification	Code Available	1	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified