Benchmarking

Papers

Showing 2901–2950 of 5548 papers

Title	Date	Tasks	Status
DIG: A Turnkey Library for Diving into Graph Deep Learning Research	Mar 23, 2021	Benchmarking, Deep Learning	Unverified
DiLiGenT102: A Photometric Stereo Benchmark Dataset With Controlled Shape and Material Variation	Jan 1, 2022	Benchmarking	Unverified
DIMCIM: A Quantitative Evaluation Framework for Default-mode Diversity and Generalization in Text-to-Image Generative Models	Jun 5, 2025	Benchmarking, Diversity	Unverified
DiPCo -- Dinner Party Corpus	Sep 30, 2019	Benchmarking	Unverified
DiPlomat: A Dialogue Dataset for Situated Pragmatic Reasoning	Jun 15, 2023	Benchmarking, Conversational Question Answering	Unverified
Disability prediction in multiple sclerosis using performance outcome measures and demographic data	Apr 8, 2022	Benchmarking, BIG-bench Machine Learning	Unverified
Disambiguation in Conversational Question Answering in the Era of LLM: A Survey	May 18, 2025	Benchmarking, Conversational Question Answering	Unverified
DISC: a Dataset for Integrated Sensing and Communication in mmWave Systems	Jun 15, 2023	Activity Recognition, Benchmarking	Unverified
DISCOMAN: Dataset of Indoor SCenes for Odometry, Mapping And Navigation	Sep 26, 2019	Benchmarking, Panoptic Segmentation	Unverified
Discosuite - A parser test suite for German discontinuous structures	May 1, 2014	Benchmarking, Constituency Parsing	Unverified
Discovering Visual Concept Structure with Sparse and Incomplete Tags	May 30, 2017	Benchmarking, Clustering	Unverified
Discriminating modelling approaches for Point in Time Economic Scenario Generation	Aug 19, 2021	Benchmarking	Unverified
Discriminative Link Prediction using Local Links, Node Features and Community Structure	Oct 17, 2013	Benchmarking, Clustering	Unverified
Disentangling coincident cell events using deep transfer learning and compressive sensing	Jul 17, 2025	Benchmarking, Compressive Sensing	Unverified
DISL: Fueling Research with A Large Dataset of Solidity Smart Contracts	Mar 25, 2024	Benchmarking	Unverified
DiS-ReX: A Multilingual Dataset for Distantly Supervised Relation Extraction	Sep 17, 2021	Benchmarking, Relation	Unverified
Distortion-adaptive Salient Object Detection in 360° Omnidirectional Images	Sep 11, 2019	Benchmarking, object-detection	Unverified
Distributed Evolution Strategies with Multi-Level Learning for Large-Scale Black-Box Optimization	Oct 9, 2023	Benchmarking	Unverified
Distributed Software-Defined Network Architecture for Smart Grid Resilience to Denial-of-Service Attacks	Dec 20, 2022	Benchmarking	Unverified
Distributed Training Large-Scale Deep Architectures	Aug 10, 2017	Benchmarking, Deep Learning	Unverified
Distribution-Based Invariant Deep Networks for Learning Meta-Features	Jun 24, 2020	Benchmarking, General Classification	Unverified
Sensitivity analysis and experimental evaluation of PID-like continuous sliding mode control	Aug 13, 2022	Benchmarking, Sensitivity	Unverified
Diverse Community Data for Benchmarking Data Privacy Algorithms	Jun 20, 2023	Benchmarking	Unverified
DLBricks: Composable Benchmark Generation to Reduce Deep Learning Benchmarking Effort on CPUs (Extended)	Nov 18, 2019	Benchmarking, CPU	Unverified
DLUE: Benchmarking Document Language Understanding	May 16, 2023	Benchmarking, Document Classification	Unverified
DNR Bench: Benchmarking Over-Reasoning in Reasoning LLMs	Mar 20, 2025	Benchmarking, Hallucination	Unverified
A Sober Look at the Robustness of CLIPs to Spurious Features	Mar 18, 2024	Benchmarking	Unverified
Does AI for science need another ImageNet Or totally different benchmarks? A case study of machine learning force fields	Aug 11, 2023	Benchmarking	Unverified
Does imputation matter? Benchmark for predictive models	Jul 6, 2020	Benchmarking, BIG-bench Machine Learning	Unverified
Domain Adaptation for Arabic Machine Translation: The Case of Financial Texts	Sep 22, 2023	Articles, Benchmarking	Unverified
Domain Aligned CLIP for Few-shot Classification	Nov 15, 2023	Benchmarking, Classification	Unverified
Domain Generalization in Computational Pathology: Survey and Guidelines	Oct 30, 2023	Benchmarking, Diagnostic	Unverified
Don't stack layers in graph neural networks, wire them randomly	Jan 1, 2021	Attribute, Benchmarking	Unverified
Downsampling and geometric feature methods for EEG classification tasks with CNNs	Oct 10, 2020	Benchmarking, EEG	Unverified
On the Convergence of Differentially Private Federated Learning on Non-Lipschitz Objectives, and with Normalized Client Updates	Jun 13, 2021	Benchmarking, Federated Learning	Unverified
DPO: A Differential and Pointwise Control Approach to Reinforcement Learning	Apr 24, 2024	Benchmarking, reinforcement-learning	Unverified
DRAC: Diabetic Retinopathy Analysis Challenge with Ultra-Wide Optical Coherence Tomography Angiography Images	Apr 5, 2023	Benchmarking, Data Augmentation	Unverified
Drift in a Popular Metal Oxide Sensor Dataset Reveals Limitations for Gas Classification Benchmarks	Aug 19, 2021	Benchmarking, Classification	Unverified
DRIV100: In-The-Wild Multi-Domain Dataset and Evaluation for Real-World Domain Adaptation of Semantic Segmentation	Jan 30, 2021	Benchmarking, Domain Adaptation	Unverified
DSLOB: A Synthetic Limit Order Book Dataset for Benchmarking Forecasting Algorithms under Distributional Shift	Nov 17, 2022	Benchmarking, Time Series	Unverified
Dual Encoder-Decoder based Generative Adversarial Networks for Disentangled Facial Representation Learning	Sep 19, 2019	Benchmarking, Decoder	Unverified
Dual Task Framework for Improving Persona-grounded Dialogue Dataset	Feb 11, 2022	Benchmarking	Unverified
DyFEn: Agent-Based Fee Setting in Payment Channel Networks	Oct 15, 2022	Benchmarking, Deep Reinforcement Learning	Unverified
Dyna-bAbI: unlocking bAbI's potential with dynamic synthetic benchmarking	Nov 30, 2021	Benchmarking, Natural Language Understanding	Unverified
Dyna-bAbI: unlocking bAbI’s potential with dynamic synthetic benchmarking	Jul 1, 2022	Benchmarking, Natural Language Understanding	Unverified
Dynabench: Rethinking Benchmarking in NLP	Apr 7, 2021	Benchmarking	Unverified
Dynaboard: An Evaluation-As-A-Service Platform for Holistic Next-Generation Benchmarking	May 21, 2021	Benchmarking	Unverified
Dynamic benchmarking framework for LLM-based conversational data capture	Feb 4, 2025	Benchmarking	Unverified
Dynamic Benchmarking of Masked Language Models on Temporal Concept Drift with Multiple Views	Feb 23, 2023	Benchmarking	Unverified
Dynamic Benchmarking of Reasoning Capabilities in Code Large Language Models Under Data Contamination	Mar 6, 2025	Benchmarking	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified