SOTAVerified

Benchmarking

Papers

Showing 2826-2850 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Quantum Long Short-Term Memory (QLSTM) vs Classical LSTM in Time Series Forecasting: A Comparative Study in Solar Power Forecasting | | 0 |
| RDBench: ML Benchmark for Relational Databases | | 0 |
| ConDefects: A New Dataset to Address the Data Leakage Concern for LLM-based Fault Localization and Program Repair | | 0 |
| XFEVER: Exploring Fact Verification across Languages | Code | 0 |
| MLFMF: Data Sets for Machine Learning for Mathematical Formalization | Code | 1 |
| BLESS: Benchmarking Large Language Models on Sentence Simplification | Code | 0 |
| CRoW: Benchmarking Commonsense Reasoning in Real-World Tasks | Code | 1 |
| Analyzing Multilingual Competency of LLMs in Multi-Turn Instruction Following: A Case Study of Arabic | | 0 |
| DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design | Code | 0 |
| XTSC-Bench: Quantitative Benchmarking for Explainers on Time Series Classification | Code | 0 |
| A Quantitative Evaluation of Dense 3D Reconstruction of Sinus Anatomy from Monocular Endoscopic Video | | 0 |
| MedEval: A Multi-Level, Multi-Task, and Multi-Domain Medical Benchmark for Language Model Evaluation | | 0 |
| Fast hyperboloid decision tree algorithms | Code | 1 |
| Benchmarking and Improving Text-to-SQL Generation under Ambiguity | Code | 0 |
| Benchmarking Sequential Visual Input Reasoning and Prediction in Multimodal Large Language Models | Code | 0 |
| MULTITuDE: Large-Scale Multilingual Machine-Generated Text Detection Benchmark | Code | 1 |
| Standardised workflow for mass spectrometry-based single-cell proteomics data processing and analysis using the scp package | | 0 |
| Benchmarking GPUs on SVBRDF Extractor Model | | 0 |
| Almost Equivariance via Lie Algebra Convolutions | | 0 |
| OODRobustBench: a Benchmark and Large-Scale Analysis of Adversarial Robustness under Distribution Shift | Code | 1 |
| Formalizing and Benchmarking Prompt Injection Attacks and Defenses | Code | 2 |
| FactCHD: Benchmarking Fact-Conflicting Hallucination Detection | Code | 1 |
| InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions | Code | 0 |
| To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Unsafe Images ... For Now | Code | 1 |
| Object-aware Inversion and Reassembly for Image Editing | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |