Benchmarking

Papers

Showing 5101–5150 of 5548 papers

Title	Date	Tasks	Status
SydneyScapes: Image Segmentation for Australian Environments	Apr 10, 2025	Autonomous Vehicles, Benchmarking	Unverified
Domain Adaptation for Arabic Machine Translation: The Case of Financial Texts	Sep 22, 2023	Articles, Benchmarking	Unverified
Domain Aligned CLIP for Few-shot Classification	Nov 15, 2023	Benchmarking, Classification	Unverified
ALOJA: A Framework for Benchmarking and Predictive Analytics in Big Data Deployments	Nov 6, 2015	Anomaly Detection, Benchmarking	Unverified
Domain Generalization in Computational Pathology: Survey and Guidelines	Oct 30, 2023	Benchmarking, Diagnostic	Unverified
Comparative Benchmarking of Causal Discovery Techniques	Aug 18, 2017	Benchmarking, Causal Discovery	Unverified
Comparative Analysis of Packages and Algorithms for the Analysis of Spatially Resolved Transcriptomics Data	Aug 3, 2021	Benchmarking	Unverified
Comparative analysis of neural network architectures for short-term FOREX forecasting	May 13, 2024	Benchmarking	Unverified
Don't stack layers in graph neural networks, wire them randomly	Jan 1, 2021	Attribute, Benchmarking	Unverified
Commute Graph Neural Networks	Jun 30, 2024	Benchmarking	Unverified
Commonsense Video Question Answering through Video-Grounded Entailment Tree Reasoning	Jan 9, 2025	Benchmarking, Question Answering	Unverified
Common Pets in 3D: Dynamic New-View Synthesis of Real-Life Deformable Categories	Nov 7, 2022	3D Reconstruction, 4D reconstruction	Unverified
Downsampling and geometric feature methods for EEG classification tasks with CNNs	Oct 10, 2020	Benchmarking, EEG	Unverified
Colonoscopy 3D Video Dataset with Paired Depth from 2D-3D Registration	Jun 17, 2022	Benchmarking, Depth Estimation	Unverified
Rethinking Coherence Modeling: Synthetic vs. Downstream Tasks	Apr 30, 2020	Benchmarking, Coherence Evaluation	Unverified
On the Convergence of Differentially Private Federated Learning on Non-Lipschitz Objectives, and with Normalized Client Updates	Jun 13, 2021	Benchmarking, Federated Learning	Unverified
DPO: A Differential and Pointwise Control Approach to Reinforcement Learning	Apr 24, 2024	Benchmarking, reinforcement-learning	Unverified
Syn3DWound: A Synthetic Dataset for 3D Wound Bed Analysis	Nov 27, 2023	Benchmarking, Diagnostic	Unverified
DRAC: Diabetic Retinopathy Analysis Challenge with Ultra-Wide Optical Coherence Tomography Angiography Images	Apr 5, 2023	Benchmarking, Data Augmentation	Unverified
Coherent Feed Forward Quantum Neural Network	Feb 1, 2024	Benchmarking, Diagnostic	Unverified
Cognitive Model Priors for Predicting Human Decisions	May 22, 2019	Benchmarking, BIG-bench Machine Learning	Unverified
CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models	Feb 21, 2024	Benchmarking	Unverified
Drift in a Popular Metal Oxide Sensor Dataset Reveals Limitations for Gas Classification Benchmarks	Aug 19, 2021	Benchmarking, Classification	Unverified
DRIV100: In-The-Wild Multi-Domain Dataset and Evaluation for Real-World Domain Adaptation of Semantic Segmentation	Jan 30, 2021	Benchmarking, Domain Adaptation	Unverified
CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks	Jul 14, 2025	Benchmarking, Code Generation	Unverified
SynBench: Task-Agnostic Benchmarking of Pretrained Representations using Synthetic Data	Oct 6, 2022	Benchmarking, Representation Learning	Unverified
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings	Jan 2, 2025	Benchmarking, Code Generation	Unverified
DSLOB: A Synthetic Limit Order Book Dataset for Benchmarking Forecasting Algorithms under Distributional Shift	Nov 17, 2022	Benchmarking, Time Series	Unverified
CodeCrash: Stress Testing LLM Reasoning under Structural and Semantic Perturbations	Apr 19, 2025	Benchmarking	Unverified
Dual Encoder-Decoder based Generative Adversarial Networks for Disentangled Facial Representation Learning	Sep 19, 2019	Benchmarking, Decoder	Unverified
Dual Task Framework for Improving Persona-grounded Dialogue Dataset	Feb 11, 2022	Benchmarking	Unverified
CodeAssistBench (CAB): Dataset & Benchmarking for Multi-turn Chat-Based Code Assistance	Jul 14, 2025	Benchmarking, Code Generation	Unverified
Synplex: A synthetic simulator of highly multiplexed histological images	Mar 8, 2021	Benchmarking	Unverified
Syntactically Aware Neural Architectures for Definition Extraction	Jun 1, 2018	Benchmarking, Binary Classification	Unverified
DyFEn: Agent-Based Fee Setting in Payment Channel Networks	Oct 15, 2022	Benchmarking, Deep Reinforcement Learning	Unverified
Syntax Encoding with Application in Authorship Attribution	Oct 1, 2018	Authorship Attribution, Benchmarking	Unverified
Dyna-bAbI: unlocking bAbI's potential with dynamic synthetic benchmarking	Nov 30, 2021	Benchmarking, Natural Language Understanding	Unverified
Dyna-bAbI: unlocking bAbI's potential with dynamic synthetic benchmarking	Jul 1, 2022	Benchmarking, Natural Language Understanding	Unverified
Dynabench: Rethinking Benchmarking in NLP	Apr 7, 2021	Benchmarking	Unverified
Dynaboard: An Evaluation-As-A-Service Platform for Holistic Next-Generation Benchmarking	May 21, 2021	Benchmarking	Unverified
CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis	Mar 29, 2025	Benchmarking, Large Language Model	Unverified
CoDBench: A Critical Evaluation of Data-driven Models for Continuous Dynamical Systems	Oct 2, 2023	Benchmarking, Computational Efficiency	Unverified
Dynamic benchmarking framework for LLM-based conversational data capture	Feb 4, 2025	Benchmarking	Unverified
Dynamic Benchmarking of Masked Language Models on Temporal Concept Drift with Multiple Views	Feb 23, 2023	Benchmarking	Unverified
Dynamic Benchmarking of Reasoning Capabilities in Code Large Language Models Under Data Contamination	Mar 6, 2025	Benchmarking	Unverified
Dynamic Intelligence Assessment: Benchmarking LLMs on the Road to AGI with a Focus on Model Confidence	Oct 20, 2024	Benchmarking	Unverified
Dynamic-KGQA: A Scalable Framework for Generating Adaptive Question Answering Datasets	Mar 6, 2025	Benchmarking, Dataset Generation	Unverified
Dynamic Obstacle Avoidance with Bounded Rationality Adversarial Reinforcement Learning	Mar 14, 2025	Benchmarking, Navigate	Unverified
Vision-Based Power Line Cables and Pylons Detection for Low Flying Aircraft	Jul 19, 2024	Benchmarking, Transfer Learning	Unverified
Dynamic PDB: A New Dataset and a SE(3) Model Extension by Integrating Dynamic Behaviors and Physical Properties in Protein Structures	Aug 22, 2024	Benchmarking, Trajectory Prediction	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified