Benchmarking

Papers

Showing 2751–2800 of 5548 papers

Title	Date	Tasks	Status
Bi-DCSpell: A Bi-directional Detector-Corrector Interactive Framework for Chinese Spelling Check	Jun 4, 2024	Benchmarking, Representation Learning	Unverified
BIAS: Transparent reporting of biomedical image analysis challenges	Oct 9, 2019	Benchmarking	Unverified
Bias Mitigation for Machine Learning Classifiers: A Comprehensive Survey	Jul 14, 2022	Benchmarking, BIG-bench Machine Learning	Unverified
Genicious: Contextual Few-shot Prompting for Insights Discovery	Mar 15, 2025	Benchmarking, Decision Making	Unverified
Beyond Visual Understanding: Introducing PARROT-360V for Vision Language Model Benchmarking	Nov 20, 2024	Benchmarking, Language Modeling	Unverified
Beyond Uniform Lipschitz Condition in Differentially Private Optimization	Jun 21, 2022	Benchmarking, regression	Unverified
Writing as a testbed for open ended agents	Mar 25, 2025	Benchmarking, Diversity	Unverified
GenSpace: Benchmarking Spatially-Aware Image Generation	May 30, 2025	Benchmarking, Image Generation	Unverified
GenTel-Safe: A Unified Benchmark and Shielding Framework for Defending Against Prompt Injection Attacks	Sep 29, 2024	Benchmarking	Unverified
GenzIQA: Generalized Image Quality Assessment using Prompt-Guided Latent Diffusion Models	Jun 7, 2024	Benchmarking, Denoising	Unverified
Beyond the Singular: The Essential Role of Multiple Generations in Effective Benchmark Evaluation and Analysis	Feb 13, 2025	Benchmarking	Unverified
Beyond the Hype: Benchmarking LLM-Evolved Heuristics for Bin Packing	Jan 20, 2025	Benchmarking, Evolutionary Algorithms	Unverified
Beyond Text: A Deep Dive into Large Language Models' Ability on Understanding Graph Data	Oct 7, 2023	Benchmarking	Unverified
Energy Models for Better Pseudo-Labels: Improving Semi-Supervised Classification with the 1-Laplacian Graph Energy	Jun 20, 2019	Benchmarking, Multi-class Classification	Unverified
GeoGebra Tools with Proof Capabilities	Mar 3, 2016	Automated Theorem Proving, Benchmarking	Unverified
Language Models as a Service: Overview of a New Paradigm and its Challenges	Sep 28, 2023	Benchmarking	Unverified
Geometric feature performance under downsampling for EEG classification tasks	Feb 15, 2021	Benchmarking, Classification	Unverified
Geometry-Based Next Frame Prediction from Monocular Video	Sep 20, 2016	Autonomous Driving, Benchmarking	Unverified
Geometry Matters: Benchmarking Scientific ML Approaches for Flow Prediction around Complex Geometries	Dec 31, 2024	Benchmarking, Out-of-Distribution Generalization	Unverified
GeoNet: Benchmarking Unsupervised Adaptation across Geographies	Mar 27, 2023	Benchmarking, Domain Adaptation	Unverified
Geospatial Foundation Models to Enable Progress on Sustainable Development Goals	May 30, 2025	Benchmarking, Earth Observation	Unverified
GermanPartiesQA: Benchmarking Commercial Large Language Models for Political Bias and Sycophancy	Jul 25, 2024	Benchmarking	Unverified
Beyond Static Models and Test Sets: Benchmarking the Potential of Pre-trained Models Across Tasks and Languages	May 12, 2022	Benchmarking, Diversity	Unverified
Beyond Specialization: Benchmarking LLMs for Transliteration of Indian Languages	May 26, 2025	Benchmarking, Transliteration	Unverified
GFPNet: A Deep Network for Learning Shape Completion in Generic Fitted Primitives	Jun 3, 2020	Benchmarking, Object	Unverified
A Hong Kong Sign Language Corpus Collected from Sign-interpreted TV News	May 2, 2024	Benchmarking, Sign Language Recognition	Unverified
GiCCS: A German in-Context Conversational Similarity Benchmark	Dec 16, 2022	Benchmarking, Semantic Textual Similarity	Unverified
GIMMICK -- Globally Inclusive Multimodal Multitask Cultural Knowledge Benchmarking	Feb 19, 2025	Benchmarking	Unverified
GIQ: Benchmarking 3D Geometric Reasoning of Vision Foundation Models with Simulated and Real Polyhedra	Jun 9, 2025	3D Reconstruction, Benchmarking	Unverified
Beyond Single-Model Views for Deep Learning: Optimization versus Generalizability of Stochastic Optimization Algorithms	Mar 1, 2024	Benchmarking, Stochastic Optimization	Unverified
Beyond Self-Talk: A Communication-Centric Survey of LLM-Based Multi-Agent Systems	Feb 20, 2025	Benchmarking, Decision Making	Unverified
The Benchmark Lottery	Jul 14, 2021	Benchmarking, BIG-bench Machine Learning	Unverified
Global Rice Multi-Class Segmentation Dataset (RiceSEG): A Comprehensive and Diverse High-Resolution RGB-Annotated Images for the Development and Benchmarking of Rice Segmentation Algorithms	Apr 2, 2025	Benchmarking, Semantic Segmentation	Unverified
Global Wheat Head Dataset 2021: more diversity to improve the benchmarking of wheat head localization methods	May 17, 2021	Benchmarking, Diversity	Unverified
Beyond Monocular Deraining: Stereo Image Deraining via Semantic Understanding	Aug 1, 2020	Benchmarking, Rain Removal	Unverified
GLOVER++: Unleashing the Potential of Affordance Learning from Human Behaviors for Robotic Manipulation	May 17, 2025	Benchmarking	Unverified
GNNBENCH: Fair and Productive Benchmarking for Single-GPU GNN System	Apr 5, 2024	Benchmarking, GPU	Unverified
A Benchmark for Multi-speaker Anonymization	Jul 8, 2024	Benchmarking, Disentanglement	Unverified
Beyond Monocular Deraining: Parallel Stereo Deraining Network Via Semantic Prior	May 9, 2021	Benchmarking, Rain Removal	Unverified
Beyond Metrics: A Critical Analysis of the Variability in Large Language Model Evaluation Frameworks	Jul 29, 2024	Benchmarking, Language Model Evaluation	Unverified
GNUMAP: A Parameter-Free Approach to Unsupervised Dimensionality Reduction via Graph Neural Networks	Jul 30, 2024	Benchmarking, Contrastive Learning	Unverified
Goal-Driven Sequential Data Abstraction	Jul 29, 2019	Benchmarking, General Reinforcement Learning	Unverified
A Holistic Framework Towards Vision-based Traffic Signal Control with Microscopic Simulation	Mar 11, 2024	Benchmarking, Traffic Signal Control	Unverified
Domain Adaptation with Joint Learning for Generic, Optical Car Part Recognition and Detection Systems (Go-CaRD)	Jun 15, 2020	Benchmarking, Domain Adaptation	Unverified
Beyond Emotion: A Multi-Modal Dataset for Human Desire Understanding	Jul 1, 2022	Benchmarking	Unverified
The Brain Tumor Segmentation (BraTS-METS) Challenge 2023: Brain Metastasis Segmentation on Pre-treatment MRI	Jun 1, 2023	Benchmarking, Brain Tumor Segmentation	Unverified
GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models	Apr 10, 2024	Benchmarking, Denoising	Unverified
GreenPCO: An Unsupervised Lightweight Point Cloud Odometry Method	Dec 8, 2021	Benchmarking, Object	Unverified
Ahead-of-Time P-Tuning	May 18, 2023	Benchmarking, parameter-efficient fine-tuning	Unverified
Beyond Emotion: A Multi-Modal Dataset for Human Desire Understanding	Jan 16, 2022	Benchmarking	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified