SOTAVerified

Benchmarking

Papers

Showing 4451-4500 of 5548 papers

Title | Status | Hype
Large-scale Ridesharing DARP Instances Based on Real Travel Demand | Code | 0
Calibrating Pre-trained Language Classifiers on LLM-generated Noisy Labels via Iterative Refinement | Code | 0
JExplore: Design Space Exploration Tool for Nvidia Jetson Boards | Code | 0
Anchor Points: Benchmarking Models with Much Fewer Examples | Code | 0
Laughing Heads: Can Transformers Detect What Makes a Sentence Funny? | Code | 0
THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models | Code | 0
JATE 2.0: Java Automatic Term Extraction with Apache Solr | Code | 0
JALMBench: Benchmarking Jailbreak Vulnerabilities in Audio Language Models | Code | 0
Calibrated Adaptive Probabilistic ODE Solvers | Code | 0
Is Your Model Fairly Certain? Uncertainty-Aware Fairness Evaluation for LLMs | Code | 0
Reinforcement Learning to Disentangle Multiqubit Quantum States from Partial Observations | Code | 0
DyKnow: Dynamically Verifying Time-Sensitive Factual Knowledge in LLMs | Code | 0
AdvancedHMC.jl: A robust, modular and efficient implementation of advanced HMC algorithms | Code | 0
An Auditing Test To Detect Behavioral Shift in Language Models | Code | 0
Leak Proof CMap; a framework for training and evaluation of cell line agnostic L1000 similarity methods | Code | 0
Learnability and Complexity of Quantum Samples | Code | 0
Learned Bayesian Cramér-Rao Bound for Unknown Measurement Models Using Score Neural Networks | Code | 0
Learned Sorted Table Search and Static Indexes in Small Model Space | Code | 0
Learn How to Query from Unlabeled Data Streams in Federated Learning | Code | 0
Reinvestigating the R2 Indicator: Achieving Pareto Compliance by Integration | Code | 0
Learning Adaptive Discriminative Correlation Filters via Temporal Consistency Preserving Spatial Feature Selection for Robust Visual Tracking | Code | 0
Learning an Event Sequence Embedding for Dense Event-Based Deep Stereo | Code | 0
Adjusting Pretrained Backbones for Performativity | Code | 0
Cable Tree Wiring -- Benchmarking Solvers on a Real-World Scheduling Problem with a Variety of Precedence Constraints | Code | 0
Benchmarking Deep Learning Architectures for Predicting Readmission to the ICU and Describing Patients-at-Risk | Code | 0
REMM:Rotation-Equivariant Framework for End-to-End Multimodal Image Matching | Code | 0
Learning collective multi-cellular dynamics from temporal scRNA-seq via a transformer-enhanced Neural SDE | Code | 0
Using representation balancing to learn conditional-average dose responses from clustered data | Code | 0
Beemo: Benchmark of Expert-edited Machine-generated Outputs | Code | 0
B-XAIC Dataset: Benchmarking Explainable AI for Graph Neural Networks Using Chemical Data | Code | 0
Building Conformal Prediction Intervals with Approximate Message Passing | Code | 0
Learning Dynamic Selection and Pricing of Out-of-Home Deliveries | Code | 0
UAV Trajectory Planning for Data Collection from Time-Constrained IoT Devices | Code | 0
Learning from Integral Losses in Physics Informed Neural Networks | Code | 0
Removing Geometric Bias in One-Class Anomaly Detection with Adaptive Feature Perturbation | Code | 0
The Arcade Learning Environment: An Evaluation Platform for General Agents | Code | 0
Building and benchmarking an Arabic Speech Commands dataset for small-footprint keyword spotting | Code | 0
Learning protein constitutive motifs from sequence data | Code | 0
Learning Quantum Processes with Quantum Statistical Queries | Code | 0
ISImed: A Framework for Self-Supervised Learning using Intrinsic Spatial Information in Medical Images | Code | 0
UBENCH: Benchmarking Uncertainty in Large Language Models with Multiple Choice Questions | Code | 0
BED: Bi-Encoder-Based Detectors for Out-of-Distribution Detection | Code | 0
Replicable Benchmarking of Neural Machine Translation (NMT) on Low-Resource Local Languages in Indonesia | Code | 0
RUHSNet: 3D Object Detection Using Lidar Data in Real Time | Code | 0
Replication Study and Benchmarking of Real-Time Object Detection Models | Code | 0
IPC: A Benchmark Data Set for Learning with Graph-Structured Data | Code | 0
RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content | Code | 0
Building a Large Scale Dataset for Image Emotion Recognition: The Fine Print and The Benchmark | Code | 0
IoT Data Trust Evaluation via Machine Learning | Code | 0
Representation Learning of Limit Order Book: A Comprehensive Study and Benchmarking | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified