SOTAVerified

Benchmarking

Papers

Showing 3501–3525 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Benchmarking Large Multimodal Models for Ophthalmic Visual Question Answering with OphthalWeChat | | 0 |
| MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations | | 0 |
| MathTutorBench: A Benchmark for Measuring Open-ended Pedagogical Capabilities of LLM Tutors | | 0 |
| Matrix-Free Preconditioning in Online Learning | | 0 |
| Benchmarking Large Language Model Volatility | | 0 |
| Benchmarking Large Language Models with Integer Sequence Generation Tasks | | 0 |
| Maximum Categorical Cross Entropy (MCCE): A noise-robust alternative loss function to mitigate racial bias in Convolutional Neural Networks (CNNs) by reducing overfitting | | 0 |
| MaxpoolNMS: Getting Rid of NMS Bottlenecks in Two-Stage Object Detectors | | 0 |
| Benchmarking Pre-Trained Time Series Models for Electricity Price Forecasting | | 0 |
| MBA-VO: Motion Blur Aware Visual Odometry | | 0 |
| Towards Class-agnostic Tracking Using Feature Decorrelation in Point Clouds | | 0 |
| MCDFN: Supply Chain Demand Forecasting via an Explainable Multi-Channel Data Fusion Network Model | | 0 |
| MCL-3D: a database for stereoscopic image quality assessment using 2D-image-plus-depth source | | 0 |
| Benchmarking Large Language Models with Augmented Instructions for Fine-grained Information Extraction | | 0 |
| MCUBench: A Benchmark of Tiny Object Detectors on MCUs | | 0 |
| MDIW-13: a New Multi-Lingual and Multi-Script Database and Benchmark for Script Identification | | 0 |
| MDR-DeePC: Model-Inspired Distributionally Robust Data-Enabled Predictive Control | | 0 |
| Benchmarking Large Language Models via Random Variables | | 0 |
| Measuring and Benchmarking Large Language Models' Capabilities to Generate Persuasive Language | | 0 |
| Measuring CLEVRness: Black-box Testing of Visual Reasoning Models | | 0 |
| Measuring CLEVRness: Blackbox testing of Visual Reasoning Models | | 0 |
| Measuring Large Language Models Capacity to Annotate Journalistic Sourcing | | 0 |
| Measuring the Complexity of Domains Used to Evaluate AI Systems | | 0 |
| Measuring the Effect of Causal Disentanglement on the Adversarial Robustness of Neural Network Models | | 0 |
| Towards Effective Disambiguation for Machine Translation with Large Language Models | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |