Benchmarking

Papers

Showing 5151–5200 of 5548 papers

Title	Date	Tasks	Status
On Using Distribution-Based Compositionality Assessment to Evaluate Compositional Generalisation in Machine Translation	Nov 14, 2023	Benchmarking, Machine Translation	Code Available
Are Large Language Models Good at Utility Judgments?	Mar 28, 2024	Answer Generation, Benchmarking	Code Available
Benchmarking Language-agnostic Intent Classification for Virtual Assistant Platforms	Jul 1, 2022	Benchmarking, Classification	Code Available
Distributed Non-Convex Optimization with Sublinear Speedup under Intermittent Client Availability	Feb 18, 2020	Benchmarking, Federated Learning	Code Available
VitaGraph: Building a Knowledge Graph for Biologically Relevant Learning Tasks	May 16, 2025	Benchmarking, Link Prediction	Code Available
Dissecting Sample Hardness: A Fine-Grained Analysis of Hardness Characterization Methods for Data-Centric AI	Mar 7, 2024	Benchmarking	Code Available
Dissecting Dissonance: Benchmarking Large Multimodal Models Against Self-Contradictory Instructions	Aug 2, 2024	Benchmarking, multimodal interaction	Code Available
DispBench: Benchmarking Disparity Estimation to Synthetic Corruptions	May 8, 2025	Autonomous Navigation, Benchmarking	Code Available
OpenBioLink: A benchmarking framework for large-scale biomedical link prediction	Dec 10, 2019	Benchmarking, BIG-bench Machine Learning	Code Available
DispaRisk: Auditing Fairness Through Usable Information	May 20, 2024	Benchmarking, Bias Detection	Code Available
A Recipe for CAC: Mosaic-based Generalized Loss for Improved Class-Agnostic Counting	Apr 15, 2024	Benchmarking	Code Available
Did the Models Understand Documents? Benchmarking Models for Language Understanding in Document-Level Relation Extraction	Jun 20, 2023	Benchmarking, Document-level Relation Extraction	Code Available
Large Scale Clustering with Variational EM for Gaussian Mixture Models	Oct 1, 2018	Benchmarking, Clustering	Code Available
AI Sound Recognition on Asthma Medication Adherence: Evaluation With the RDA Benchmark Suite	Feb 8, 2023	Benchmarking, Management	Code Available
Dialogue Quality and Emotion Annotations for Customer Support Conversations	Nov 23, 2023	Benchmarking, Diversity	Code Available
STEP: A Unified Spiking Transformer Evaluation Platform for Fair and Reproducible Benchmarking	May 16, 2025	Benchmarking	Code Available
OpenDenoising: an Extensible Benchmark for Building Comparative Studies of Image Denoisers	Oct 18, 2019	Benchmarking, Denoising	Code Available
OpenDMC: An Open-Source Library and Performance Evaluation for Deep-learning-based Multi-frame Compression	Oct 27, 2023	Benchmarking, GPU	Code Available
Towards Better Open-Ended Text Generation: A Multicriteria Evaluation Framework	Oct 24, 2024	Benchmarking, Diversity	Code Available
Towards Biologically Plausible and Private Gene Expression Data Generation	Feb 7, 2024	Benchmarking	Code Available
DFEE: Interactive DataFlow Execution and Evaluation Kit	Dec 4, 2022	Benchmarking, Scheduling	Code Available
Towards causal benchmarking of bias in face analysis algorithms	Jul 13, 2020	Attribute, Benchmarking	Code Available
SORCE: Small Object Retrieval in Complex Environments	May 30, 2025	Benchmarking, Image Retrieval	Code Available
Detecting Stereotypes and Anti-stereotypes the Correct Way Using Social Psychological Underpinnings	Apr 4, 2025	Benchmarking	Code Available
Recognizing Object Affordances to Support Scene Reasoning for Manipulation Tasks	Sep 12, 2019	Affordance Detection, Affordance Recognition	Code Available
CleanPatrick: A Benchmark for Image Data Cleaning	May 16, 2025	Benchmarking, Label Error Detection	Code Available
Detecting critical treatment effect bias in small subgroups	Apr 29, 2024	Benchmarking, Decision Making	Code Available
AI-generated Image Quality Assessment in Visual Communication	Dec 20, 2024	Benchmarking, Image Quality Assessment	Code Available
SOSD: A Benchmark for Learned Indexes	Nov 29, 2019	Benchmarking, Management	Code Available
OpenML Benchmarking Suites	Aug 11, 2017	Benchmarking, BIG-bench Machine Learning	Code Available
DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design	Oct 23, 2023	Benchmarking, Image Generation	Code Available
Design and implementation of intelligent packet filtering in IoT microcontroller-based devices	May 30, 2023	Benchmarking	Code Available
OpenOOD: Benchmarking Generalized Out-of-Distribution Detection	Oct 13, 2022	Anomaly Detection, Benchmarking	Code Available
Dermatological Diagnosis Explainability Benchmark for Convolutional Neural Networks	Feb 23, 2023	Benchmarking, Medical Diagnosis	Code Available
Depth Functions for Partial Orders with a Descriptive Analysis of Machine Learning Algorithms	Apr 19, 2023	Benchmarking, Descriptive	Code Available
Delving into Instance-Dependent Label Noise in Graph Data: A Comprehensive Study and Benchmark	Jun 14, 2025	Benchmarking, Graph Learning	Code Available
Towards Efficient and Scalable Training of Differentially Private Deep Learning	Jun 25, 2024	Benchmarking, Deep Learning	Code Available
Benchmarking Label Noise in Instance Segmentation: Spatial Noise Matters	Jun 16, 2024	Benchmarking, Instance Segmentation	Code Available
Towards Efficient Benchmarking of Foundation Models in Remote Sensing: A Capabilities Encoding Approach	May 6, 2025	Benchmarking, Earth Observation	Code Available
Delta-Influence: Unlearning Poisons via Influence Functions	Nov 20, 2024	Attribute, Benchmarking	Code Available
Benchmarking Keyword Spotting Efficiency on Neuromorphic Hardware	Dec 4, 2018	Benchmarking, CPU	Code Available
Defense-friendly Images in Adversarial Attacks: Dataset and Metrics for Perturbation Difficulty	Nov 5, 2020	Adversarial Attack, Benchmarking	Code Available
DefAn: Definitive Answer Dataset for LLMs Hallucination Evaluation	Jun 13, 2024	Benchmarking, Hallucination	Code Available
Deep Reinforcement Learning for General Video Game AI	Jun 6, 2018	Atari Games, Benchmarking	Code Available
DeepPatent2: A Large-Scale Benchmarking Corpus for Technical Drawing Understanding	Nov 7, 2023	3D Reconstruction, Benchmarking	Code Available
Operation-Level Performance Benchmarking of Graph Neural Networks for Scientific Applications	Jul 20, 2022	Benchmarking	Code Available
DeepOBS: A Deep Learning Optimizer Benchmark Suite	Mar 13, 2019	Benchmarking, Deep Learning	Code Available
VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation	Jun 25, 2024	ARC, Benchmarking	Code Available
OptIForest: Optimal Isolation Forest for Anomaly Detection	Jun 22, 2023	Anomaly Detection, Benchmarking	Code Available
Towards Emotionally Consistent Text-Based Speech Editing: Introducing EmoCorrector and The ECD-TSE Dataset	May 24, 2025	Benchmarking, RAG	Code Available

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified