
Benchmarking

Papers

Showing 3326–3350 of 5548 papers

Title | Status | Hype
MarineGym: A High-Performance Reinforcement Learning Platform for Underwater Robotics |  | 0
Match Stereo Videos via Bidirectional Alignment |  | 0
MaterioMiner -- An ontology-based text mining dataset for extraction of process-structure-property entities |  | 0
(N,K)-Puzzle: A Cost-Efficient Testbed for Benchmarking Reinforcement Learning Algorithms in Generative Language Model |  | 0
MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations |  | 0
MathTutorBench: A Benchmark for Measuring Open-ended Pedagogical Capabilities of LLM Tutors |  | 0
Matrix-Free Preconditioning in Online Learning |  | 0
Maximum Categorical Cross Entropy (MCCE): A noise-robust alternative loss function to mitigate racial bias in Convolutional Neural Networks (CNNs) by reducing overfitting |  | 0
MaxpoolNMS: Getting Rid of NMS Bottlenecks in Two-Stage Object Detectors |  | 0
MBA-VO: Motion Blur Aware Visual Odometry |  | 0
MCDFN: Supply Chain Demand Forecasting via an Explainable Multi-Channel Data Fusion Network Model |  | 0
MCL-3D: a database for stereoscopic image quality assessment using 2D-image-plus-depth source |  | 0
MCUBench: A Benchmark of Tiny Object Detectors on MCUs |  | 0
MDIW-13: a New Multi-Lingual and Multi-Script Database and Benchmark for Script Identification |  | 0
MDR-DeePC: Model-Inspired Distributionally Robust Data-Enabled Predictive Control |  | 0
Measuring and Benchmarking Large Language Models' Capabilities to Generate Persuasive Language |  | 0
Measuring CLEVRness: Black-box Testing of Visual Reasoning Models |  | 0
Measuring CLEVRness: Blackbox testing of Visual Reasoning Models |  | 0
Measuring Large Language Models Capacity to Annotate Journalistic Sourcing |  | 0
Measuring the Complexity of Domains Used to Evaluate AI Systems |  | 0
Measuring the Effect of Causal Disentanglement on the Adversarial Robustness of Neural Network Models |  | 0
MEBench: Benchmarking Large Language Models for Cross-Document Multi-Entity Question Answering |  | 0
MechProNet: Machine Learning Prediction of Mechanical Properties in Metal Additive Manufacturing |  | 0
Mechanistic Understanding and Mitigation of Language Confusion in English-Centric Large Language Models |  | 0
MedAgentGym: Training LLM Agents for Code-Based Medical Reasoning at Scale |  | 0
Page 134 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 |  | Unverified