Benchmarking

Papers


Showing 3451–3500 of 5548 papers

Title	Date	Tasks	Status
A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection	Jun 5, 2024	Anomaly Detection, Benchmarking	Unverified
Towards a Multidimensional Evaluation Framework for Empathetic Conversational Systems	Jul 26, 2024	Benchmarking	Unverified
MA-BBOB: A Problem Generator for Black-Box Optimization Using Affine Combinations and Shifts	Dec 18, 2023	Benchmarking	Unverified
MA-BBOB: Many-Affine Combinations of BBOB Functions for Evaluating AutoML Approaches in Noiseless Numerical Black-Box Optimization Contexts	Jun 18, 2023	AutoML, Benchmarking	Unverified
Towards an AI Accountability Policy	Jul 25, 2023	Benchmarking, Fairness	Unverified
Machine Generated Product Advertisements: Benchmarking LLMs Against Human Performance	Dec 27, 2024	Benchmarking, Persuasiveness	Unverified
Towards an Automated SOAP Note: Classifying Utterances from Medical Conversations	Jul 17, 2020	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
A Density-Guided Temporal Attention Transformer for Indiscernible Object Counting in Underwater Video	Mar 6, 2024	Benchmarking, Crowd Counting	Unverified
Machine Learning-Based Analysis of ECG and PCG Signals for Rheumatic Heart Disease Detection: A Scoping Review (2015-2025)	May 17, 2025	Benchmarking, Diagnostic	Unverified
Towards a Taxonomy of Graph Learning Datasets	Oct 27, 2021	Benchmarking, Graph Learning	Unverified
Machine Learning for Identifying Grain Boundaries in Scanning Electron Microscopy (SEM) Images of Nanoparticle Superlattices	Jan 7, 2025	Benchmarking, Clustering	Unverified
Machine learning for modelling unstructured grid data in computational physics: a review	Feb 13, 2025	Benchmarking	Unverified
Towards a Theory-Guided Benchmarking Suite for Discrete Black-Box Optimization Heuristics: Profiling (1+λ) EA Variants on OneMax and LeadingOnes	Aug 17, 2018	Benchmarking, Evolutionary Algorithms	Unverified
Machine Learning for Ranking f-wave Extraction Methods in Single-Lead ECGs	Jul 17, 2023	Benchmarking	Unverified
Large Language Models for Classical Chinese Poetry Translation: Benchmarking, Evaluating, and Improving	Aug 19, 2024	Benchmarking, Machine Translation	Unverified
Uncertainty estimation of machine learning spatial precipitation predictions from satellite data	Nov 13, 2023	Benchmarking, Feature Importance	Unverified
Benchmarking LLMs for Mimicking Child-Caregiver Language in Interaction	Dec 12, 2024	Benchmarking, Diversity	Unverified
Benchmarking LLMs and SLMs for patient reported outcomes	Dec 20, 2024	Benchmarking, Privacy Preserving	Unverified
Benchmarking LLM powered Chatbots: Methods and Metrics	Aug 8, 2023	Benchmarking, Chatbot	Unverified
Machine Vision based Sample-Tube Localization for Mars Sample Return	Mar 17, 2021	Benchmarking, Template Matching	Unverified
Benchmarking LLM Guardrails in Handling Multilingual Toxicity	Oct 29, 2024	Benchmarking	Unverified
Benchmarking LLM for Code Smells Detection: OpenAI GPT-4.0 vs DeepSeek-V3	Apr 22, 2025	Benchmarking, Language Modeling	Unverified
Towards a Unified Framework for Determining Conformational Ensembles of Disordered Proteins	Apr 4, 2025	Benchmarking	Unverified
Towards Benchmarking and Assessing the Safety and Robustness of Autonomous Driving on Safety-critical Scenarios	Mar 31, 2025	Adversarial Attack, Autonomous Driving	Unverified
Making Sense of Data in the Wild: Data Analysis Automation at Scale	Jan 27, 2025	Benchmarking, Diversity	Unverified
OrionBench: Benchmarking Time Series Generative Models in the Service of the End-User	Oct 26, 2023	Anomaly Detection, Benchmarking	Unverified
A Deep Q-Learning Method for Downlink Power Allocation in Multi-Cell Networks	Apr 30, 2019	Benchmarking, Deep Reinforcement Learning	Unverified
Benchmarking LLM Code Generation for Audio Programming with Visual Dataflow Languages	Sep 1, 2024	Benchmarking, Code Generation	Unverified
Benchmarking LiDAR Sensors for Development and Evaluation of Automotive Perception	Apr 28, 2020	Benchmarking, Systematic Literature Review	Unverified
Towards Benchmarking and Evaluating Deepfake Detection	Mar 4, 2022	Benchmarking, DeepFake Detection	Unverified
ManipBench: Benchmarking Vision-Language Models for Low-Level Robot Manipulation	May 14, 2025	Benchmarking, Deformable Object Manipulation	Unverified
MANTA: A Large-Scale Multi-View and Visual-Text Anomaly Detection Dataset for Tiny Objects	Dec 6, 2024	2k, Anomaly Detection	Unverified
Deep Patent Landscaping Model Using Transformer and Graph Embedding	Mar 14, 2019	Benchmarking, Graph Embedding	Unverified
Manual Verbalizer Enrichment for Few-Shot Text Classification	Oct 8, 2024	Benchmarking, Classification	Unverified
Towards Benchmarking Explainable Artificial Intelligence Methods	Aug 25, 2022	Benchmarking, Explainable artificial intelligence	Unverified
Mapping global dynamics of benchmark creation and saturation in artificial intelligence	Mar 9, 2022	Benchmarking	Unverified
Mapping Violence: Developing an Extensive Framework to Build a Bangla Sectarian Expression Dataset from Social Media Interactions	Apr 17, 2024	Benchmarking	Unverified
Benchmarking LF-MMI, CTC and RNN-T Criteria for Streaming ASR	Nov 9, 2020	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Towards Benchmarking Scene Background Initialization	Jun 12, 2015	Benchmarking	Unverified
MarineGym: A High-Performance Reinforcement Learning Platform for Underwater Robotics	Mar 12, 2025	Benchmarking, GPU	Unverified
Benchmarking Lexical Simplification Systems	May 1, 2016	Benchmarking, Lexical Simplification	Unverified
Towards Benchmarking the Utility of Explanations for Model Debugging	May 10, 2021	Benchmarking	Unverified
WER We Stand: Benchmarking Urdu ASR Models	Sep 17, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Benchmarking Learnt Radio Localisation under Distribution Shift	Oct 4, 2022	Benchmarking	Unverified
Benchmarking learned non-Cartesian k-space trajectories and reconstruction networks	Jan 27, 2022	Benchmarking	Unverified
Match Stereo Videos via Bidirectional Alignment	Sep 30, 2024	Benchmarking, Stereo Matching	Unverified
MaterioMiner -- An ontology-based text mining dataset for extraction of process-structure-property entities	Aug 5, 2024	Benchmarking, Graph Generation	Unverified
PINNs for Medical Image Analysis: A Survey	Aug 2, 2024	Anatomy, Benchmarking	Unverified
(N,K)-Puzzle: A Cost-Efficient Testbed for Benchmarking Reinforcement Learning Algorithms in Generative Language Model	Mar 11, 2024	Benchmarking, Language Modeling	Unverified
Benchmarking learned algorithms for computed tomography image reconstruction tasks	Dec 11, 2024	Benchmarking, Computed Tomography (CT)	Unverified



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified