SOTAVerified

Benchmarking

Papers

Showing 4501–4525 of 5548 papers

Title | Status | Hype
Beyond MD17: the reactive xxMD dataset | Code | 0
The biglasso Package: A Memory- and Computation-Efficient Solver for Lasso Model Fitting with Big Data in R | Code | 0
Learning to Transfer for Traffic Forecasting via Multi-task Learning | Code | 0
IOLBENCH: Benchmarking LLMs on Linguistic Reasoning | Code | 0
InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions | Code | 0
Investigating the Impact of Hard Samples on Accuracy Reveals In-class Data Imbalance | Code | 0
BEARD: Benchmarking the Adversarial Robustness for Dataset Distillation | Code | 0
RerrFact: Reduced Evidence Retrieval Representations for Scientific Claim Verification | Code | 0
Inverse Contextual Bandits: Learning How Behavior Evolves over Time | Code | 0
UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models | Code | 0
Introducing SLAMBench, a performance and accuracy benchmarking methodology for SLAM | Code | 0
INTERSPEECH 2009 Emotion Challenge Revisited: Benchmarking 15 Years of Progress in Speech Emotion Recognition | Code | 0
Integration of nested cross-validation, automated hyperparameter optimization, high-performance computing to reduce and quantify the variance of test performance estimation of deep learning models | Code | 0
BdSLW60: A Word-Level Bangla Sign Language Dataset | Code | 0
The Butterfly Effect of Model Editing: Few Edits Can Trigger Large Language Models Collapse | Code | 0
Integrating Expert Knowledge into Logical Programs via LLMs | Code | 0
The CaLiGraph Ontology as a Challenge for OWL Reasoners | Code | 0
The Catechol Benchmark: Time-series Solvent Selection Data for Few-shot Machine Learning | Code | 0
Strong and Simple Baselines for Multimodal Utterance Embeddings | Code | 0
InstaIndoor and Multi-modal Deep Learning for Indoor Scene Recognition | Code | 0
The Collective Knowledge project: making ML models more portable and reproducible with open APIs, reusable best practices and MLOps | Code | 0
a-DCF: an architecture agnostic metric with application to spoofing-robust speaker verification | Code | 0
Resource Interoperability for Sustainable Benchmarking: The Case of Events | Code | 0
Bayesian Neural Networks with Soft Evidence | Code | 0
BASED: Benchmarking, Analysis, and Structural Estimation of Deblurring | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified