SOTAVerified

Benchmarking

Papers

Showing 4526–4550 of 5548 papers

Title | Status | Hype
Bugs in the Data: How ImageNet Misrepresents Biodiversity | Code | 0
inMOTIFin: a lightweight end-to-end simulation software for regulatory sequences | Code | 0
LexSumm and LexT5: Benchmarking and Modeling Legal Summarization Tasks in English | Code | 0
InDL: A New Dataset and Benchmark for In-Diagram Logic Interpretation based on Visual Illusion | Code | 0
Individual Fairness Guarantees for Neural Networks | Code | 0
IndiBias: A Benchmark Dataset to Measure Social Biases in Language Models for Indian Context | Code | 0
LibOPT: An Open-Source Platform for Fast Prototyping Soft Optimization Techniques | Code | 0
BubGAN: Bubble Generative Adversarial Networks for Synthesizing Realistic Bubbly Flow Images | Code | 0
bsnsing: A decision tree induction method based on recursive optimal boolean rule composition | Code | 0
Rethinking Empirical Evaluation of Adversarial Robustness Using First-Order Attack Methods | Code | 0
Improving the Perturbation-Based Explanation of Deepfake Detectors Through the Use of Adversarially-Generated Samples | Code | 0
BSBench: will your LLM find the largest prime number? | Code | 0
Light Field Saliency Detection with Deep Convolutional Networks | Code | 0
Improving Pretrained Models for Zero-shot Multi-label Text Classification through Reinforced Label Hierarchy Reasoning | Code | 0
Bridging the Generalisation Gap: Synthetic Data Generation for Multi-Site Clinical Model Validation | Code | 0
An Analyst-Inspector Framework for Evaluating Reproducibility of LLMs in Data Science | Code | 0
Rethinking the Effectiveness of Graph Classification Datasets in Benchmarks for Assessing GNNs | Code | 0
On-orbit model training for satellite imagery with label proportions | Code | 0
LimeSoDa: A Dataset Collection for Benchmarking of Machine Learning Regressors in Digital Soil Mapping | Code | 0
Improving Generalization of Neural Vehicle Routing Problem Solvers Through the Lens of Model Architecture | Code | 0
Rethinking the Reference-based Distinctive Image Captioning | Code | 0
Linear energy storage and flexibility model with ramp rate, ramping, deadline and capacity constraints | Code | 0
BRI3L: A Brightness Illusion Image Dataset for Identification and Localization of Regions of Illusory Perception | Code | 0
BoxingGym: Benchmarking Progress in Automated Experimental Design and Model Discovery | Code | 0
BONES: a Benchmark fOr Neural Estimation of Shapley values | Code | 0
Page 182 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified