SOTAVerified

Benchmarking

Papers

Showing 3301–3350 of 5548 papers

Title | Status | Hype
LOOPE: Learnable Optimal Patch Order in Positional Embeddings for Vision Transformers | | 0
Optimizing with Low Budgets: a Comparison on the Black-box Optimization Benchmarking Suite and OpenAI Gym | | 0
Low-Density 3D Point Cloud Classification | | 0
Low Dynamic Range for RIS-aided Bistatic Integrated Sensing and Communication | | 0
Low-resource Neural Machine Translation: Benchmarking State-of-the-art Transformer for Wolof<->French | | 0
LSTM-based Whisper Detection | | 0
LucidDreaming: Controllable Object-Centric 3D Generation | | 0
LUND-PROBE -- LUND Prostate Radiotherapy Open Benchmarking and Evaluation dataset | | 0
M3Bench: Benchmarking Whole-body Motion Generation for Mobile Manipulation in 3D Scenes | | 0
MA-BBOB: A Problem Generator for Black-Box Optimization Using Affine Combinations and Shifts | | 0
MA-BBOB: Many-Affine Combinations of BBOB Functions for Evaluating AutoML Approaches in Noiseless Numerical Black-Box Optimization Contexts | | 0
Machine Generated Product Advertisements: Benchmarking LLMs Against Human Performance | | 0
Machine Learning-Based Analysis of ECG and PCG Signals for Rheumatic Heart Disease Detection: A Scoping Review (2015-2025) | | 0
Machine Learning for Identifying Grain Boundaries in Scanning Electron Microscopy (SEM) Images of Nanoparticle Superlattices | | 0
Machine learning for modelling unstructured grid data in computational physics: a review | | 0
Machine Learning for Ranking f-wave Extraction Methods in Single-Lead ECGs | | 0
Uncertainty estimation of machine learning spatial precipitation predictions from satellite data | | 0
Machine Vision based Sample-Tube Localization for Mars Sample Return | | 0
Making Sense of Data in the Wild: Data Analysis Automation at Scale | | 0
OrionBench: Benchmarking Time Series Generative Models in the Service of the End-User | | 0
ManipBench: Benchmarking Vision-Language Models for Low-Level Robot Manipulation | | 0
MANTA: A Large-Scale Multi-View and Visual-Text Anomaly Detection Dataset for Tiny Objects | | 0
Manual Verbalizer Enrichment for Few-Shot Text Classification | | 0
Mapping global dynamics of benchmark creation and saturation in artificial intelligence | | 0
Mapping Violence: Developing an Extensive Framework to Build a Bangla Sectarian Expression Dataset from Social Media Interactions | | 0
MarineGym: A High-Performance Reinforcement Learning Platform for Underwater Robotics | | 0
Match Stereo Videos via Bidirectional Alignment | | 0
MaterioMiner -- An ontology-based text mining dataset for extraction of process-structure-property entities | | 0
(N,K)-Puzzle: A Cost-Efficient Testbed for Benchmarking Reinforcement Learning Algorithms in Generative Language Model | | 0
MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations | | 0
MathTutorBench: A Benchmark for Measuring Open-ended Pedagogical Capabilities of LLM Tutors | | 0
Matrix-Free Preconditioning in Online Learning | | 0
Maximum Categorical Cross Entropy (MCCE): A noise-robust alternative loss function to mitigate racial bias in Convolutional Neural Networks (CNNs) by reducing overfitting | | 0
MaxpoolNMS: Getting Rid of NMS Bottlenecks in Two-Stage Object Detectors | | 0
MBA-VO: Motion Blur Aware Visual Odometry | | 0
MCDFN: Supply Chain Demand Forecasting via an Explainable Multi-Channel Data Fusion Network Model | | 0
MCL-3D: a database for stereoscopic image quality assessment using 2D-image-plus-depth source | | 0
MCUBench: A Benchmark of Tiny Object Detectors on MCUs | | 0
MDIW-13: a New Multi-Lingual and Multi-Script Database and Benchmark for Script Identification | | 0
MDR-DeePC: Model-Inspired Distributionally Robust Data-Enabled Predictive Control | | 0
Measuring and Benchmarking Large Language Models' Capabilities to Generate Persuasive Language | | 0
Measuring CLEVRness: Black-box Testing of Visual Reasoning Models | | 0
Measuring CLEVRness: Blackbox testing of Visual Reasoning Models | | 0
Measuring Large Language Models Capacity to Annotate Journalistic Sourcing | | 0
Measuring the Complexity of Domains Used to Evaluate AI Systems | | 0
Measuring the Effect of Causal Disentanglement on the Adversarial Robustness of Neural Network Models | | 0
MEBench: Benchmarking Large Language Models for Cross-Document Multi-Entity Question Answering | | 0
MechProNet: Machine Learning Prediction of Mechanical Properties in Metal Additive Manufacturing | | 0
Mechanistic Understanding and Mitigation of Language Confusion in English-Centric Large Language Models | | 0
MedAgentGym: Training LLM Agents for Code-Based Medical Reasoning at Scale | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified