Benchmarking

Papers

Showing 1476–1500 of 5548 papers

Title	Date	Tasks	Status	Hype
Benchmarking Graph Neural Networks on Dynamic Link Prediction	Sep 29, 2021	Benchmarking, Dynamic Link Prediction	Code Available	1
Benchmarking Graph Neural Networks for FMRI analysis	Nov 16, 2022	Benchmarking	Code Available	1
Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs	Jun 22, 2023	Arithmetic Reasoning, Benchmarking	Code Available	1
BiCo-Net: Regress Globally, Match Locally for Robust 6D Pose Estimation	May 7, 2022	6D Pose Estimation, Benchmarking	Code Available	1
ClearPose: Large-scale Transparent Object Dataset and Benchmark	Mar 8, 2022	Benchmarking, Depth Completion	Code Available	1
BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text	Apr 28, 2025	Benchmarking	Code Available	1
Performance Evaluation of Deep Transfer Learning on Multiclass Identification of Common Weed Species in Cotton Production Systems	Oct 11, 2021	Benchmarking, Management	Code Available	1
PGDQN: Preference-Guided Deep Q-Network	Oct 3, 2023	Atari Games, Benchmarking	Code Available	1
Cross-Modal Bidirectional Interaction Model for Referring Remote Sensing Image Segmentation	Oct 11, 2024	Benchmarking, Image Segmentation	Code Available	1
Beyond neural scaling laws: beating power law scaling via data pruning	Jun 29, 2022	Benchmarking	Code Available	1
Beyond Normal: On the Evaluation of Mutual Information Estimators	Jun 19, 2023	Benchmarking, Domain Generalization	Code Available	1
CySecBench: Generative AI-based CyberSecurity-focused Prompt Dataset for Benchmarking Large Language Models	Jan 2, 2025	Benchmarking, Computer Security	Code Available	1
dEchorate: a Calibrated Room Impulse Response Database for Echo-aware Signal Processing	Apr 27, 2021	Benchmarking, Retrieval	Code Available	1
PLANTAIN: Diffusion-inspired Pose Score Minimization for Fast and Accurate Molecular Docking	Jul 22, 2023	Benchmarking, Molecular Docking	Code Available	1
Developing a Scalable Benchmark for Assessing Large Language Models in Knowledge Graph Engineering	Aug 31, 2023	Benchmarking, Dataset Generation	Code Available	1
ECRECer: Enzyme Commission Number Recommendation and Benchmarking based on Multiagent Dual-core Learning	Feb 8, 2022	Benchmarking, Language Modelling	Code Available	1
Kvasir-Instrument: Diagnostic and therapeutic tool segmentation dataset in gastrointestinal endoscopy	Oct 23, 2020	Benchmarking, Diagnostic	Code Available	1
RADIATE: A Radar Dataset for Automotive Perception in Bad Weather	Oct 18, 2020	Autonomous Driving, Benchmarking	Code Available	1
POGEMA: A Benchmark Platform for Cooperative Multi-Agent Pathfinding	Jul 20, 2024	Benchmarking, Heuristic Search	Code Available	1
CLoG: Benchmarking Continual Learning of Image Generation Models	Jun 7, 2024	Benchmarking, Continual Learning	Code Available	1
Positional Encoding in Transformer-Based Time Series Models: A Survey	Feb 17, 2025	Anomaly Detection, Benchmarking	Code Available	1
PowerMamba: A Deep State Space Model and Comprehensive Benchmark for Time Series Prediction in Electric Power Systems	Dec 9, 2024	Benchmarking, Prediction	Code Available	1
Benchmarking Graph Learning for Drug-Drug Interaction Prediction	Oct 24, 2024	Benchmarking, Graph Learning	Unverified	0
A practical generalization metric for deep networks benchmarking	Sep 2, 2024	Benchmarking, Diversity	Unverified	0
AERF: Adaptive ensemble random fuzzy algorithm for anomaly detection in cloud computing	Jan 9, 2023	Anomaly Detection, Benchmarking	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified