Benchmarking

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 2751–2775 of 5548 papers

Title	Date	Tasks	Status
Bi-DCSpell: A Bi-directional Detector-Corrector Interactive Framework for Chinese Spelling Check	Jun 4, 2024	BenchmarkingRepresentation Learning	—Unverified
BIAS: Transparent reporting of biomedical image analysis challenges	Oct 9, 2019	Benchmarking	—Unverified
Bias Mitigation for Machine Learning Classifiers: A Comprehensive Survey	Jul 14, 2022	BenchmarkingBIG-bench Machine Learning	—Unverified
Genicious: Contextual Few-shot Prompting for Insights Discovery	Mar 15, 2025	BenchmarkingDecision Making	—Unverified
Beyond Visual Understanding: Introducing PARROT-360V for Vision Language Model Benchmarking	Nov 20, 2024	BenchmarkingLanguage Modeling	—Unverified
Beyond Uniform Lipschitz Condition in Differentially Private Optimization	Jun 21, 2022	Benchmarkingregression	—Unverified
Writing as a testbed for open ended agents	Mar 25, 2025	BenchmarkingDiversity	—Unverified
GenSpace: Benchmarking Spatially-Aware Image Generation	May 30, 2025	BenchmarkingImage Generation	—Unverified
GenTel-Safe: A Unified Benchmark and Shielding Framework for Defending Against Prompt Injection Attacks	Sep 29, 2024	Benchmarking	—Unverified
GenzIQA: Generalized Image Quality Assessment using Prompt-Guided Latent Diffusion Models	Jun 7, 2024	BenchmarkingDenoising	—Unverified
Beyond the Singular: The Essential Role of Multiple Generations in Effective Benchmark Evaluation and Analysis	Feb 13, 2025	Benchmarking	—Unverified
Beyond the Hype: Benchmarking LLM-Evolved Heuristics for Bin Packing	Jan 20, 2025	BenchmarkingEvolutionary Algorithms	—Unverified
Beyond Text: A Deep Dive into Large Language Models' Ability on Understanding Graph Data	Oct 7, 2023	Benchmarking	—Unverified
Energy Models for Better Pseudo-Labels: Improving Semi-Supervised Classification with the 1-Laplacian Graph Energy	Jun 20, 2019	BenchmarkingMulti-class Classification	—Unverified
GeoGebra Tools with Proof Capabilities	Mar 3, 2016	Automated Theorem ProvingBenchmarking	—Unverified
Language Models as a Service: Overview of a New Paradigm and its Challenges	Sep 28, 2023	Benchmarking	—Unverified
Geometric feature performance under downsampling for EEG classification tasks	Feb 15, 2021	BenchmarkingClassification	—Unverified
Geometry-Based Next Frame Prediction from Monocular Video	Sep 20, 2016	Autonomous DrivingBenchmarking	—Unverified
Geometry Matters: Benchmarking Scientific ML Approaches for Flow Prediction around Complex Geometries	Dec 31, 2024	BenchmarkingOut-of-Distribution Generalization	—Unverified
GeoNet: Benchmarking Unsupervised Adaptation across Geographies	Mar 27, 2023	BenchmarkingDomain Adaptation	—Unverified
Geospatial Foundation Models to Enable Progress on Sustainable Development Goals	May 30, 2025	BenchmarkingEarth Observation	—Unverified
GermanPartiesQA: Benchmarking Commercial Large Language Models for Political Bias and Sycophancy	Jul 25, 2024	Benchmarking	—Unverified
Beyond Static Models and Test Sets: Benchmarking the Potential of Pre-trained Models Across Tasks and Languages	May 12, 2022	BenchmarkingDiversity	—Unverified
Beyond Specialization: Benchmarking LLMs for Transliteration of Indian Languages	May 26, 2025	BenchmarkingTransliteration	—Unverified
GFPNet: A Deep Network for Learning Shape Completion in Generic Fitted Primitives	Jun 3, 2020	BenchmarkingObject	—Unverified

Show:10 25 50

← PrevPage 111 of 222Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified