Benchmarking

Papers

Showing 3301–3350 of 5548 papers

Title	Date	Tasks	Status
LOOPE: Learnable Optimal Patch Order in Positional Embeddings for Vision Transformers	Apr 19, 2025	Benchmarking, Diagnostic	Unverified
Optimizing with Low Budgets: a Comparison on the Black-box Optimization Benchmarking Suite and OpenAI Gym	Sep 29, 2023	Bayesian Optimization, Benchmarking	Unverified
Low-Density 3D Point Cloud Classification	Oct 30, 2024	3D Point Cloud Classification, Autonomous Driving	Unverified
Low Dynamic Range for RIS-aided Bistatic Integrated Sensing and Communication	Nov 9, 2024	Benchmarking, Integrated sensing and communication	Unverified
Low-resource Neural Machine Translation: Benchmarking State-of-the-art Transformer for Wolof<->French	Jun 1, 2022	Benchmarking, Low Resource Neural Machine Translation	Unverified
LSTM-based Whisper Detection	Sep 20, 2018	Benchmarking	Unverified
LucidDreaming: Controllable Object-Centric 3D Generation	Nov 30, 2023	3D Generation, Benchmarking	Unverified
LUND-PROBE -- LUND Prostate Radiotherapy Open Benchmarking and Evaluation dataset	Feb 6, 2025	Benchmarking, Computed Tomography (CT)	Unverified
M3Bench: Benchmarking Whole-body Motion Generation for Mobile Manipulation in 3D Scenes	Oct 9, 2024	Benchmarking, Motion Generation	Unverified
MA-BBOB: A Problem Generator for Black-Box Optimization Using Affine Combinations and Shifts	Dec 18, 2023	Benchmarking	Unverified
MA-BBOB: Many-Affine Combinations of BBOB Functions for Evaluating AutoML Approaches in Noiseless Numerical Black-Box Optimization Contexts	Jun 18, 2023	AutoML, Benchmarking	Unverified
Machine Generated Product Advertisements: Benchmarking LLMs Against Human Performance	Dec 27, 2024	Benchmarking, Persuasiveness	Unverified
Machine Learning-Based Analysis of ECG and PCG Signals for Rheumatic Heart Disease Detection: A Scoping Review (2015-2025)	May 17, 2025	Benchmarking, Diagnostic	Unverified
Machine Learning for Identifying Grain Boundaries in Scanning Electron Microscopy (SEM) Images of Nanoparticle Superlattices	Jan 7, 2025	Benchmarking, Clustering	Unverified
Machine learning for modelling unstructured grid data in computational physics: a review	Feb 13, 2025	Benchmarking	Unverified
Machine Learning for Ranking f-wave Extraction Methods in Single-Lead ECGs	Jul 17, 2023	Benchmarking	Unverified
Uncertainty estimation of machine learning spatial precipitation predictions from satellite data	Nov 13, 2023	Benchmarking, Feature Importance	Unverified
Machine Vision based Sample-Tube Localization for Mars Sample Return	Mar 17, 2021	Benchmarking, Template Matching	Unverified
Making Sense of Data in the Wild: Data Analysis Automation at Scale	Jan 27, 2025	Benchmarking, Diversity	Unverified
OrionBench: Benchmarking Time Series Generative Models in the Service of the End-User	Oct 26, 2023	Anomaly Detection, Benchmarking	Unverified
ManipBench: Benchmarking Vision-Language Models for Low-Level Robot Manipulation	May 14, 2025	Benchmarking, Deformable Object Manipulation	Unverified
MANTA: A Large-Scale Multi-View and Visual-Text Anomaly Detection Dataset for Tiny Objects	Dec 6, 2024	2k, Anomaly Detection	Unverified
Manual Verbalizer Enrichment for Few-Shot Text Classification	Oct 8, 2024	Benchmarking, Classification	Unverified
Mapping global dynamics of benchmark creation and saturation in artificial intelligence	Mar 9, 2022	Benchmarking	Unverified
Mapping Violence: Developing an Extensive Framework to Build a Bangla Sectarian Expression Dataset from Social Media Interactions	Apr 17, 2024	Benchmarking	Unverified
MarineGym: A High-Performance Reinforcement Learning Platform for Underwater Robotics	Mar 12, 2025	Benchmarking, GPU	Unverified
Match Stereo Videos via Bidirectional Alignment	Sep 30, 2024	Benchmarking, Stereo Matching	Unverified
MaterioMiner -- An ontology-based text mining dataset for extraction of process-structure-property entities	Aug 5, 2024	Benchmarking, Graph Generation	Unverified
(N,K)-Puzzle: A Cost-Efficient Testbed for Benchmarking Reinforcement Learning Algorithms in Generative Language Model	Mar 11, 2024	Benchmarking, Language Modeling	Unverified
MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations	Feb 10, 2025	Benchmarking, In-Context Learning	Unverified
MathTutorBench: A Benchmark for Measuring Open-ended Pedagogical Capabilities of LLM Tutors	Feb 26, 2025	Benchmarking	Unverified
Matrix-Free Preconditioning in Online Learning	May 29, 2019	Benchmarking	Unverified
Maximum Categorical Cross Entropy (MCCE): A noise-robust alternative loss function to mitigate racial bias in Convolutional Neural Networks (CNNs) by reducing overfitting	Jan 1, 2021	Benchmarking, General Classification	Unverified
MaxpoolNMS: Getting Rid of NMS Bottlenecks in Two-Stage Object Detectors	Jun 1, 2019	Benchmarking, General Classification	Unverified
MBA-VO: Motion Blur Aware Visual Odometry	Mar 25, 2021	Benchmarking, Visual Odometry	Unverified
MCDFN: Supply Chain Demand Forecasting via an Explainable Multi-Channel Data Fusion Network Model	May 24, 2024	Benchmarking, Demand Forecasting	Unverified
MCL-3D: a database for stereoscopic image quality assessment using 2D-image-plus-depth source	Mar 23, 2014	Benchmarking, Image Quality Assessment	Unverified
MCUBench: A Benchmark of Tiny Object Detectors on MCUs	Sep 27, 2024	Benchmarking, Model Selection	Unverified
MDIW-13: a New Multi-Lingual and Multi-Script Database and Benchmark for Script Identification	May 29, 2024	Benchmarking	Unverified
MDR-DeePC: Model-Inspired Distributionally Robust Data-Enabled Predictive Control	Jun 24, 2025	Benchmarking	Unverified
Measuring and Benchmarking Large Language Models' Capabilities to Generate Persuasive Language	Jun 25, 2024	Benchmarking	Unverified
Measuring CLEVRness: Black-box Testing of Visual Reasoning Models	Sep 29, 2021	Benchmarking, Diagnostic	Unverified
Measuring CLEVRness: Blackbox testing of Visual Reasoning Models	Feb 24, 2022	Benchmarking, Diagnostic	Unverified
Measuring Large Language Models Capacity to Annotate Journalistic Sourcing	Dec 30, 2024	Benchmarking, Ethics	Unverified
Measuring the Complexity of Domains Used to Evaluate AI Systems	Sep 18, 2020	Benchmarking	Unverified
Measuring the Effect of Causal Disentanglement on the Adversarial Robustness of Neural Network Models	Aug 21, 2023	Adversarial Robustness, Benchmarking	Unverified
MEBench: Benchmarking Large Language Models for Cross-Document Multi-Entity Question Answering	Feb 26, 2025	Benchmarking, Question Answering	Unverified
MechProNet: Machine Learning Prediction of Mechanical Properties in Metal Additive Manufacturing	Aug 21, 2022	Articles, Benchmarking	Unverified
Mechanistic Understanding and Mitigation of Language Confusion in English-Centric Large Language Models	May 22, 2025	Benchmarking, Language Modeling	Unverified
MedAgentGym: Training LLM Agents for Code-Based Medical Reasoning at Scale	Jun 4, 2025	Benchmarking, Language Modeling	Unverified

Page 67 of 111

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 Turbo	ACC	0.56	—	Unverified