SOTAVerified

Benchmarking

Papers

Showing 4226-4250 of 5548 papers

Title | Status | Hype
CodeS: Towards Code Model Generalization Under Distribution Shift | Code | 0
SAIBench: Benchmarking AI for Science | | 0
Functional Code Building Genetic Programming | | 0
FedHPO-B: A Benchmark Suite for Federated Hyperparameter Optimization | | 0
Benchmarking Bayesian neural networks and evaluation metrics for regression tasks | | 0
Scaling laws in global corporations as a benchmarking approach to assess environmental performance | | 0
MorisienMT: A Dataset for Mauritian Creole Machine Translation | | 0
Which models are innately best at uncertainty estimation? | | 0
Fast Benchmarking of Accuracy vs. Training Time with Cyclic Learning Rates | Code | 0
Evaluation of Three Welsh Language POS Taggers | | 0
Benchmarking Language Models for Cyberbullying Identification and Classification from Social-media Texts | | 0
Deep One-Class Hate Speech Detection Model | | 0
Low-resource Neural Machine Translation: Benchmarking State-of-the-art Transformer for Wolof<->French | | 0
A Semi-Automated Live Interlingual Communication Workflow Featuring Intralingual Respeaking: Evaluation and Benchmarking | | 0
Introducing RezoJDM16k: a French KnowledgeGraph DataSet for Link Prediction | | 0
MTLens: Machine Translation Output Debugging | | 0
Hide and Seek: on the Stealthiness of Attacks against Deep Learning Systems | | 0
NEWTS: A Corpus for News Topic-Focused Summarization | | 0
bsnsing: A decision tree induction method based on recursive optimal boolean rule composition | Code | 0
AI-enabled Sound Pattern Recognition on Asthma Medication Adherence: Evaluation with the RDA Benchmark Suite | Code | 0
Benchmarking Unsupervised Anomaly Detection and Localization | | 0
A Framework for Generating Informative Benchmark Instances | Code | 0
Bias Reduction via Cooperative Bargaining in Synthetic Graph Dataset Generation | Code | 0
Benchmarking of Deep Learning models on 2D Laminar Flow behind Cylinder | | 0
Large Language Models are Few-Shot Clinical Information Extractors | | 0
Page 170 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified