SOTAVerified

Benchmarking

Papers

Showing 1626–1650 of 5548 papers

Title | Status | Hype
Benchmarking ChatGPT-4 on ACR Radiation Oncology In-Training (TXIT) Exam and Red Journal Gray Zone Cases: Potentials and Challenges for AI-Assisted Medical Education and Decision Making in Radiation Oncology | Code | 0
SCoRE: Benchmarking Long-Chain Reasoning in Commonsense Scenarios | Code | 0
Knowledge-Driven Slot Constraints for Goal-Oriented Dialogue Systems | Code | 0
Benchmarking Deep Learning Architectures for Predicting Readmission to the ICU and Describing Patients-at-Risk | Code | 0
Knowledge Enhanced Conditional Imputation for Healthcare Time-series | Code | 0
LABCAT: Locally adaptive Bayesian optimization using principal-component-aligned trust regions | Code | 0
KhabarChin: Automatic Detection of Important News in the Persian Language | Code | 0
A New Cervical Cytology Dataset for Nucleus Detection and Image Classification (Cervix93) and Methods for Cervical Nucleus Detection | Code | 0
KamNet: An Integrated Spatiotemporal Deep Neural Network for Rare Event Search in KamLAND-Zen | Code | 0
A new baseline for retinal vessel segmentation: Numerical identification and correction of methodological inconsistencies affecting 100+ papers | Code | 0
KArSL: Arabic Sign Language Database | Code | 0
A Biologically Plausible Benchmark for Contextual Bandit Algorithms in Precision Oncology Using in vitro Data | Code | 0
Keep Security! Benchmarking Security Policy Preservation in Large Language Model Contexts Against Indirect Attacks in Question Answering | Code | 0
Joint Multi-Scale Tone Mapping and Denoising for HDR Image Enhancement | Code | 0
JExplore: Design Space Exploration Tool for Nvidia Jetson Boards | Code | 0
A Neuro-Symbolic Framework for Sequence Classification with Relational and Temporal Knowledge | Code | 0
Knowing-how & Knowing-that: A New Task for Machine Comprehension of User Manuals | Code | 0
LANTERN: A Machine Learning Framework for Lipid Nanoparticle Transfection Efficiency Prediction | Code | 0
Benchmarking AutoML algorithms on a collection of synthetic classification problems | Code | 0
A Neuromorphic Dataset for Object Segmentation in Indoor Cluttered Environment | Code | 0
Is Your Model Fairly Certain? Uncertainty-Aware Fairness Evaluation for LLMs | Code | 0
A Neural-embedded Choice Model: TasteNet-MNL Modeling Taste Heterogeneity with Flexibility and Interpretability | Code | 0
Ab Initio Nonparametric Variable Selection for Scalable Symbolic Regression with Large p | Code | 0
DyKnow: Dynamically Verifying Time-Sensitive Factual Knowledge in LLMs | Code | 0
Benchmarking a transformer-FREE model for ad-hoc retrieval | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified