Benchmarking

Papers

Showing 3451–3475 of 5548 papers

Title	Date	Tasks	Status
A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection	Jun 5, 2024	Anomaly Detection, Benchmarking	Unverified
Towards a Multidimensional Evaluation Framework for Empathetic Conversational Systems	Jul 26, 2024	Benchmarking	Unverified
MA-BBOB: A Problem Generator for Black-Box Optimization Using Affine Combinations and Shifts	Dec 18, 2023	Benchmarking	Unverified
MA-BBOB: Many-Affine Combinations of BBOB Functions for Evaluating AutoML Approaches in Noiseless Numerical Black-Box Optimization Contexts	Jun 18, 2023	AutoML, Benchmarking	Unverified
Towards an AI Accountability Policy	Jul 25, 2023	Benchmarking, Fairness	Unverified
Machine Generated Product Advertisements: Benchmarking LLMs Against Human Performance	Dec 27, 2024	Benchmarking, Persuasiveness	Unverified
Towards an Automated SOAP Note: Classifying Utterances from Medical Conversations	Jul 17, 2020	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
A Density-Guided Temporal Attention Transformer for Indiscernible Object Counting in Underwater Video	Mar 6, 2024	Benchmarking, Crowd Counting	Unverified
Machine Learning-Based Analysis of ECG and PCG Signals for Rheumatic Heart Disease Detection: A Scoping Review (2015-2025)	May 17, 2025	Benchmarking, Diagnostic	Unverified
Towards a Taxonomy of Graph Learning Datasets	Oct 27, 2021	Benchmarking, Graph Learning	Unverified
Machine Learning for Identifying Grain Boundaries in Scanning Electron Microscopy (SEM) Images of Nanoparticle Superlattices	Jan 7, 2025	Benchmarking, Clustering	Unverified
Machine learning for modelling unstructured grid data in computational physics: a review	Feb 13, 2025	Benchmarking	Unverified
Towards a Theory-Guided Benchmarking Suite for Discrete Black-Box Optimization Heuristics: Profiling (1+λ) EA Variants on OneMax and LeadingOnes	Aug 17, 2018	Benchmarking, Evolutionary Algorithms	Unverified
Machine Learning for Ranking f-wave Extraction Methods in Single-Lead ECGs	Jul 17, 2023	Benchmarking	Unverified
Large Language Models for Classical Chinese Poetry Translation: Benchmarking, Evaluating, and Improving	Aug 19, 2024	Benchmarking, Machine Translation	Unverified
Uncertainty estimation of machine learning spatial precipitation predictions from satellite data	Nov 13, 2023	Benchmarking, Feature Importance	Unverified
Benchmarking LLMs for Mimicking Child-Caregiver Language in Interaction	Dec 12, 2024	Benchmarking, Diversity	Unverified
Benchmarking LLMs and SLMs for patient reported outcomes	Dec 20, 2024	Benchmarking, Privacy Preserving	Unverified
Benchmarking LLM powered Chatbots: Methods and Metrics	Aug 8, 2023	Benchmarking, Chatbot	Unverified
Machine Vision based Sample-Tube Localization for Mars Sample Return	Mar 17, 2021	Benchmarking, Template Matching	Unverified
Benchmarking LLM Guardrails in Handling Multilingual Toxicity	Oct 29, 2024	Benchmarking	Unverified
Benchmarking LLM for Code Smells Detection: OpenAI GPT-4.0 vs DeepSeek-V3	Apr 22, 2025	Benchmarking, Language Modeling	Unverified
Towards a Unified Framework for Determining Conformational Ensembles of Disordered Proteins	Apr 4, 2025	Benchmarking	Unverified
Towards Benchmarking and Assessing the Safety and Robustness of Autonomous Driving on Safety-critical Scenarios	Mar 31, 2025	Adversarial Attack, Autonomous Driving	Unverified
Making Sense of Data in the Wild: Data Analysis Automation at Scale	Jan 27, 2025	Benchmarking, Diversity	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified