Benchmarking

Papers


Showing 801–825 of 5548 papers

Title	Date	Tasks	Status	Hype	Score
BioMaze: Benchmarking and Enhancing Large Language Models for Biological Pathway Reasoning	Feb 23, 2025	Benchmarking	Code Available	1	5
BLADE: Benchmarking Language Model Agents for Data-Driven Science	Aug 19, 2024	Benchmarking, Decision Making	Code Available	1	5
Ego-Body Pose Estimation via Ego-Head Pose Estimation	Dec 9, 2022	Benchmarking, Disentanglement	Code Available	1	5
EMGBench: Benchmarking Out-of-Distribution Generalization and Adaptation for Electromyography	Oct 31, 2024	Benchmarking, Electromyography (EMG)	Code Available	1	5
Benchmarking AI scientists in omics data-driven biological research	May 13, 2025	Benchmarking, Multiple-choice	Code Available	1	5
HazeSpace2M: A Dataset for Haze Aware Single Image Dehazing	Sep 25, 2024	Benchmarking, Image Dehazing	Code Available	1	5
ENRICH: Multi-purposE dataset for beNchmaRking In Computer vision and pHotogrammetry	Apr 1, 2023	3D Reconstruction, 3D Scene Reconstruction	Code Available	1	5
Benchmarking Algorithms for Federated Domain Generalization	Jul 11, 2023	Benchmarking, Diversity	Code Available	1	5
Benchmarking Algorithms for Submodular Optimization Problems Using IOHProfiler	Feb 2, 2023	Benchmarking, Evolutionary Algorithms	Code Available	1	5
GenFace: A Large-Scale Fine-Grained Face Forgery Benchmark and Cross Appearance-Edge Learning	Feb 3, 2024	Benchmarking, DeepFake Detection	Code Available	1	5
A Benchmarking Study of Embedding-based Entity Alignment for Knowledge Graphs	Mar 10, 2020	Benchmarking, Entity Alignment	Code Available	1	5
4D Panoptic LiDAR Segmentation	Feb 24, 2021	4D Panoptic Segmentation, Benchmarking	Code Available	1	5
Benchmarking and Analysis of Unsupervised Object Segmentation from Real-world Single Images	Dec 8, 2023	Benchmarking, Object	Code Available	1	5
Benchmarking and Analyzing 3D-aware Image Synthesis with a Modularized Codebase	Jun 21, 2023	3D-Aware Image Synthesis, Benchmarking	Code Available	1	5
Benchmarking Local Robustness of High-Accuracy Binary Neural Networks for Enhanced Traffic Sign Recognition	Sep 25, 2023	Autonomous Driving, Benchmarking	Code Available	1	5
Benchmarking LLMs for Political Science: A United Nations Perspective	Feb 19, 2025	Benchmarking, Decision Making	Code Available	1	5
B-Pref: Benchmarking Preference-Based Reinforcement Learning	Nov 4, 2021	Benchmarking, reinforcement-learning	Code Available	1	5
Benchmarking and Analyzing Point Cloud Classification under Corruptions	Feb 7, 2022	Benchmarking, Classification	Code Available	1	5
Benchmarking LLMs' Swarm intelligence	May 7, 2025	Benchmarking	Code Available	1	5
Benchmarking Low-Shot Robustness to Natural Distribution Shifts	Apr 21, 2023	Benchmarking	Code Available	1	5
Benchmarking of DL Libraries and Models on Mobile Devices	Feb 14, 2022	Benchmarking, GPU	Code Available	1	5
BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text	Apr 28, 2025	Benchmarking	Code Available	1	5
OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling	Jul 13, 2024	Benchmarking, Math	Code Available	1	5
Dynatask: A Framework for Creating Dynamic AI Benchmark Tasks	Apr 5, 2022	Benchmarking	Code Available	1	5
A Survey on Graph Counterfactual Explanations: Definitions, Methods, Evaluation, and Research Challenges	Oct 21, 2022	Benchmarking, Community Detection	Code Available	1	5


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified