Benchmarking

Papers



Title	Date	Tasks	Status
Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases	Mar 6, 2025	Benchmarking, Diagnostic	Code Available
A New Cervical Cytology Dataset for Nucleus Detection and Image Classification (Cervix93) and Methods for Cervical Nucleus Detection	Nov 23, 2018	Benchmarking, Cervical Nucleus Detection	Code Available
ClimRetrieve: A Benchmarking Dataset for Information Retrieval from Corporate Climate Disclosures	Jun 14, 2024	Answer Generation, Benchmarking	Code Available
Benchmarking and Rethinking Knowledge Editing for Large Language Models	May 24, 2025	Benchmarking, knowledge editing	Code Available
CLEAVE: Scalable and Edge-native Benchmarking of Networked Control Systems	Apr 5, 2022	Benchmarking, Edge-computing	Code Available
Quantitative Metrics for Benchmarking Human-Aware Robot Navigation	Jul 26, 2023	Benchmarking, Robot Navigation	Code Available
Benchmarking and optimizing organism wide single-cell RNA alignment methods	Mar 26, 2025	Benchmarking, Decoder	Code Available
XTSC-Bench: Quantitative Benchmarking for Explainers on Time Series Classification	Oct 23, 2023	Benchmarking, Time Series	Code Available
CLDyB: Towards Dynamic Benchmarking for Continual Learning with Pre-trained Models	Mar 6, 2025	Benchmarking, Continual Learning	Code Available
Benchmarking and Improving Text-to-SQL Generation under Ambiguity	Oct 20, 2023	Benchmarking, Diversity	Code Available
Quantum Boosting using Domain-Partitioning Hypotheses	Oct 25, 2021	Benchmarking, Ensemble Learning	Code Available
TCC-Bench: Benchmarking the Traditional Chinese Culture Understanding Capabilities of MLLMs	May 16, 2025	Benchmarking, Question Answering	Code Available
Benchmarking and Improving Compositional Generalization of Multi-aspect Controllable Text Generation	Apr 5, 2024	Attribute, Benchmarking	Code Available
Multi-GPU-Enabled Hybrid Quantum-Classical Workflow in Quantum-HPC Middleware: Applications in Quantum Simulations	Mar 9, 2024	Benchmarking, CPU	Code Available
TDBench: Benchmarking Vision-Language Models in Understanding Top-Down Images	Apr 1, 2025	Autonomous Navigation, Benchmarking	Code Available
A new baseline for retinal vessel segmentation: Numerical identification and correction of methodological inconsistencies affecting 100+ papers	Nov 6, 2021	Benchmarking, Retinal Vessel Segmentation	Code Available
Adversarial Environment Generation for Learning to Navigate the Web	Mar 2, 2021	Benchmarking, Decision Making	Code Available
A*3D Dataset: Towards Autonomous Driving in Challenging Environments	Sep 17, 2019	3D Object Detection, Autonomous Driving	Code Available
TrustSQL: Benchmarking Text-to-SQL Reliability with Penalty-Based Scoring	Mar 23, 2024	Benchmarking, Text to SQL	Code Available
Class Imbalance in Object Detection: An Experimental Diagnosis and Study of Mitigation Strategies	Mar 11, 2024	Benchmarking, Data Augmentation	Code Available
Quasi-Newton Methods for Machine Learning: Forget the Past, Just Sample	Jan 28, 2019	Benchmarking, BIG-bench Machine Learning	Code Available
Quaternion Capsule Networks	Jul 8, 2020	Benchmarking, Object Recognition	Code Available
QU-BraTS: MICCAI BraTS 2020 Challenge on Quantifying Uncertainty in Brain Tumor Segmentation - Analysis of Ranking Scores and Benchmarking Results	Dec 19, 2021	Benchmarking, Brain Tumor Segmentation	Code Available
QUENCH: Measuring the gap between Indic and Non-Indic Contextual General Reasoning in LLMs	Dec 16, 2024	Benchmarking, Common Sense Reasoning	Code Available
Question-Answering Dense Video Events	Sep 6, 2024	Benchmarking, Question Answering	Code Available


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified