Benchmarking

Papers

Showing 1951–2000 of 5548 papers

Title	Date	Tasks	Status
Challenges in Benchmarking Stream Learning Algorithms with Real-world Data	Apr 30, 2020	Benchmarking	Unverified
Challenges and Pitfalls of Machine Learning Evaluation and Benchmarking	Apr 29, 2019	Benchmarking, BIG-bench Machine Learning	Unverified
Benchmarking Evaluation Metrics for Code-Switching Automatic Speech Recognition	Nov 22, 2022	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Benchmarking and Learning Multi-Dimensional Quality Evaluator for Text-to-3D Generation	Dec 15, 2024	3D Generation, Benchmarking	Unverified
CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset	Oct 1, 2024	Benchmarking, Contrastive Learning	Unverified
Challenges and perspectives in computational deconvolution of genomics data	Nov 21, 2022	Benchmarking	Unverified
CzechLynx: A Dataset for Individual Identification and Pose Estimation of the Eurasian Lynx	Jun 5, 2025	2D Pose Estimation, Benchmarking	Unverified
Benchmarking and In-depth Performance Study of Large Language Models on Habana Gaudi Processors	Sep 29, 2023	Benchmarking, Computational Efficiency	Unverified
AN ELIXIR FOR BLOCKCHAIN SCALABILITY WITH CHANNEL BASED CLUSTERED SHARDING	Dec 20, 2023	Benchmarking	Unverified
DACOS-A Manually Annotated Dataset of Code Smells	Mar 15, 2023	Benchmarking	Unverified
DACSA: A large-scale Dataset for Automatic summarization of Catalan and Spanish newspaper Articles	Jul 1, 2022	Abstractive Text Summarization, Articles	Unverified
DailyQA: A Benchmark to Evaluate Web Retrieval Augmented LLMs Based on Capturing Real-World Changes	May 22, 2025	Benchmarking, RAG	Unverified
Challenges and Advancements in Modeling Shock Fronts with Physics-Informed Neural Networks: A Review and Benchmarking Study	Mar 14, 2025	Benchmarking	Unverified
Danish Airs and Grounds: A Dataset for Aerial-to-Street-Level Place Recognition and Localization	Feb 3, 2022	3D Reconstruction, Benchmarking	Unverified
DarkBench: Benchmarking Dark Patterns in Large Language Models	Mar 13, 2025	Benchmarking	Unverified
DASB -- Discrete Audio and Speech Benchmark	Jun 20, 2024	Benchmarking, Emotion Recognition	Unverified
Data Analysis in the Era of Generative AI	Sep 27, 2024	Benchmarking	Unverified
Data and its (dis)contents: A survey of dataset development and use in machine learning research	Dec 9, 2020	Benchmarking, BIG-bench Machine Learning	Unverified
Data Augmentation for Continual RL via Adversarial Gradient Episodic Memory	Aug 24, 2024	Benchmarking, Data Augmentation	Unverified
Data Augmentation for Traffic Classification	Jan 19, 2024	Benchmarking, Classification	Unverified
Data Collection of Real-Life Knowledge Work in Context: The RLKWiC Dataset	Apr 16, 2024	Benchmarking, Management	Unverified
Data-driven Approach for Static Hedging of Exchange Traded Options	Feb 1, 2023	Benchmarking, Interpretable Machine Learning	Unverified
Challenge Results Are Not Reproducible	Jul 14, 2023	Benchmarking, Image Segmentation	Unverified
Data-driven inventory management for new products: An adjusted Dyna-Q approach with transfer learning	Jan 14, 2025	Benchmarking, Management	Unverified
A Dataset Similarity Evaluation Framework for Wireless Communications and Sensing	Dec 7, 2024	Benchmarking, Dimensionality Reduction	Unverified
Data-driven surrogate modelling and benchmarking for process equipment	Mar 13, 2020	Active Learning, Benchmarking	Unverified
Data-Driven Target Localization: Benchmarking Gradient Descent Using the Cramer-Rao Bound	Jan 20, 2024	Benchmarking	Unverified
Benchmarking Federated Machine Unlearning methods for Tabular Data	Apr 1, 2025	Benchmarking, Computational Efficiency	Unverified
ChakmaNMT: A Low-resource Machine Translation On Chakma Language	Oct 14, 2024	Benchmarking, Machine Translation	Unverified
Chain of LoRA: Efficient Fine-tuning of Language Models via Residual Learning	Jan 8, 2024	Benchmarking, CoLA	Unverified
Audio Turing Test: Benchmarking the Human-likeness of Large Language Model-based Text-to-Speech Systems in Chinese	May 16, 2025	Benchmarking, Language Modeling	Unverified
End-to-End Neural Ranking for eCommerce Product Search: an application of task models and textual embeddings	Jun 19, 2018	Benchmarking	Unverified
C-FedRAG: A Confidential Federated Retrieval-Augmented Generation System	Dec 17, 2024	Benchmarking, RAG	Unverified
CETBench: A Novel Dataset constructed via Transformations over Programs for Benchmarking LLMs for Code-Equivalence Checking	Jun 4, 2025	Benchmarking, Code Generation	Unverified
Benchmarking and Improving Generator-Validator Consistency of Language Models	Oct 3, 2023	Benchmarking, Instruction Following	Unverified
Certifying almost all quantum states with few single-qubit measurements	Apr 10, 2024	All, Benchmarking	Unverified
A Platform for Event Extraction in Hindi	May 1, 2020	Articles, Benchmarking	Unverified
DB3V: A Dialect Dominated Dataset of Bird Vocalisation for Cross-corpus Bird Species Recognition	Jun 11, 2024	Benchmarking, Cross-corpus	Unverified
DBsurf: A Discrepancy Based Method for Discrete Stochastic Gradient Estimation	Sep 7, 2023	Benchmarking, Neural Architecture Search	Unverified
Certified Adversarial Defenses Meet Out-of-Distribution Corruptions: Benchmarking Robustness and Simple Baselines	Dec 1, 2021	Adversarial Robustness, Benchmarking	Unverified
An efficient and perceptually motivated auditory neural encoding and decoding algorithm for spiking neural networks	Sep 3, 2019	Benchmarking, speech-recognition	Unverified
DDR-ID: Dual Deep Reconstruction Networks Based Image Decomposition for Anomaly Detection	Jul 18, 2020	Adversarial Attack, Adversarial Attack Detection	Unverified
CellCycleGAN: Spatiotemporal Microscopy Image Synthesis of Cell Populations using Statistical Shape Models and Conditional GANs	Oct 22, 2020	Benchmarking, Cell Segmentation	Unverified
DeAR: Debiasing Vision-Language Models with Additive Residuals	Mar 18, 2023	Attribute, Benchmarking	Unverified
CDTB: A Color and Depth Visual Object Tracking Dataset and Benchmark	Jul 1, 2019	Benchmarking, Object Tracking	Unverified
DECASTE: Unveiling Caste Stereotypes in Large Language Models through Multi-Dimensional Bias Analysis	May 20, 2025	Benchmarking, Fairness	Unverified
An efficiency analysis of Spanish airports	Nov 8, 2023	Benchmarking	Unverified
Decentralized Federated Learning on the Edge over Wireless Mesh Networks	Nov 2, 2023	Benchmarking, Federated Learning	Unverified
1-D Convlutional Neural Networks for the Analysis of Pupil Size Variations in Scotopic Conditions	Feb 6, 2020	Benchmarking, Binary Classification	Unverified
Energy-Conscious LLM Decoding: Impact of Text Generation Strategies on GPU Energy Consumption	Feb 17, 2025	Benchmarking, Code Summarization	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified