SOTAVerified

Benchmarking

Papers

Showing 4451–4475 of 5548 papers

Title | Status | Hype
Large-scale Ridesharing DARP Instances Based on Real Travel Demand | Code | 0
Calibrating Pre-trained Language Classifiers on LLM-generated Noisy Labels via Iterative Refinement | Code | 0
JExplore: Design Space Exploration Tool for Nvidia Jetson Boards | Code | 0
Anchor Points: Benchmarking Models with Much Fewer Examples | Code | 0
Laughing Heads: Can Transformers Detect What Makes a Sentence Funny? | Code | 0
THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models | Code | 0
JATE 2.0: Java Automatic Term Extraction with Apache Solr | Code | 0
JALMBench: Benchmarking Jailbreak Vulnerabilities in Audio Language Models | Code | 0
Calibrated Adaptive Probabilistic ODE Solvers | Code | 0
Is Your Model Fairly Certain? Uncertainty-Aware Fairness Evaluation for LLMs | Code | 0
Reinforcement Learning to Disentangle Multiqubit Quantum States from Partial Observations | Code | 0
DyKnow: Dynamically Verifying Time-Sensitive Factual Knowledge in LLMs | Code | 0
AdvancedHMC.jl: A robust, modular and efficient implementation of advanced HMC algorithms | Code | 0
An Auditing Test To Detect Behavioral Shift in Language Models | Code | 0
Leak Proof CMap; a framework for training and evaluation of cell line agnostic L1000 similarity methods | Code | 0
Learnability and Complexity of Quantum Samples | Code | 0
Learned Bayesian Cramér-Rao Bound for Unknown Measurement Models Using Score Neural Networks | Code | 0
Learned Sorted Table Search and Static Indexes in Small Model Space | Code | 0
Learn How to Query from Unlabeled Data Streams in Federated Learning | Code | 0
Reinvestigating the R2 Indicator: Achieving Pareto Compliance by Integration | Code | 0
Learning Adaptive Discriminative Correlation Filters via Temporal Consistency Preserving Spatial Feature Selection for Robust Visual Tracking | Code | 0
Learning an Event Sequence Embedding for Dense Event-Based Deep Stereo | Code | 0
Adjusting Pretrained Backbones for Performativity | Code | 0
Cable Tree Wiring -- Benchmarking Solvers on a Real-World Scheduling Problem with a Variety of Precedence Constraints | Code | 0
Benchmarking Deep Learning Architectures for Predicting Readmission to the ICU and Describing Patients-at-Risk | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified