SOTAVerified

Benchmarking

Papers

Showing 4201–4250 of 5548 papers

Title | Status | Hype
Dyna-bAbI: unlocking bAbI’s potential with dynamic synthetic benchmarking | | 0
HATE-ITA: New Baselines for Hate Speech Detection in Italian | Code | 0
Benchmarking Intersectional Biases in NLP | Code | 0
SentSpace: Large-Scale Benchmarking and Evaluation of Text using Cognitively Motivated Lexical, Syntactic, and Semantic Features | | 0
Local manifold learning and its link to domain-based physics knowledge | Code | 0
Analyzing the behaviour of D'WAVE quantum annealer: fine-tuning parameterization and tests with restrictive Hamiltonian formulations | | 0
Benchmarking Language-agnostic Intent Classification for Virtual Assistant Platforms | Code | 0
Beyond Emotion: A Multi-Modal Dataset for Human Desire Understanding | | 0
Computer-aided diagnosis and prediction in brain disorders | | 0
An extensible Benchmarking Graph-Mesh dataset for studying Steady-State Incompressible Navier-Stokes Equations | Code | 0
Toward an ImageNet Library of Functions for Global Optimization Benchmarking | | 0
VRKitchen2.0-IndoorKit: A Tutorial for Augmented Indoor Scene Building in Omniverse | Code | 0
Beyond Uniform Lipschitz Condition in Differentially Private Optimization | | 0
BOND: Benchmarking Unsupervised Outlier Node Detection on Static Attributed Graphs | Code | 0
ConvGeN: Convex space learning improves deep-generative oversampling for tabular imbalanced classification on smaller datasets | Code | 0
Design of Supervision-Scalable Learning Systems: Methodology and Performance Benchmarking | | 0
Motley: Benchmarking Heterogeneity and Personalization in Federated Learning | Code | 0
Colonoscopy 3D Video Dataset with Paired Depth from 2D-3D Registration | | 0
Beyond Supervised vs. Unsupervised: Representative Benchmarking and Analysis of Image Representation Learning | Code | 0
Pythae: Unifying Generative Autoencoders in Python -- A Benchmarking Use Case | | 0
SATBench: Benchmarking the speed-accuracy tradeoff in object recognition by humans and dynamic neural networks | Code | 0
Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability | | 0
Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models | | 0
BEHAVIOR in Habitat 2.0: Simulator-Independent Logical Task Description for Benchmarking Embodied AI Agents | | 0
EmProx: Neural Network Performance Estimation For Neural Architecture Search | Code | 0
CodeS: Towards Code Model Generalization Under Distribution Shift | Code | 0
SAIBench: Benchmarking AI for Science | | 0
Functional Code Building Genetic Programming | | 0
FedHPO-B: A Benchmark Suite for Federated Hyperparameter Optimization | | 0
Benchmarking Bayesian neural networks and evaluation metrics for regression tasks | | 0
Scaling laws in global corporations as a benchmarking approach to assess environmental performance | | 0
MorisienMT: A Dataset for Mauritian Creole Machine Translation | | 0
Which models are innately best at uncertainty estimation? | | 0
Fast Benchmarking of Accuracy vs. Training Time with Cyclic Learning Rates | Code | 0
Evaluation of Three Welsh Language POS Taggers | | 0
Benchmarking Language Models for Cyberbullying Identification and Classification from Social-media Texts | | 0
Deep One-Class Hate Speech Detection Model | | 0
Low-resource Neural Machine Translation: Benchmarking State-of-the-art Transformer for Wolof<->French | | 0
A Semi-Automated Live Interlingual Communication Workflow Featuring Intralingual Respeaking: Evaluation and Benchmarking | | 0
Introducing RezoJDM16k: a French KnowledgeGraph DataSet for Link Prediction | | 0
MTLens: Machine Translation Output Debugging | | 0
Hide and Seek: on the Stealthiness of Attacks against Deep Learning Systems | | 0
NEWTS: A Corpus for News Topic-Focused Summarization | | 0
bsnsing: A decision tree induction method based on recursive optimal boolean rule composition | Code | 0
AI-enabled Sound Pattern Recognition on Asthma Medication Adherence: Evaluation with the RDA Benchmark Suite | Code | 0
Benchmarking Unsupervised Anomaly Detection and Localization | | 0
A Framework for Generating Informative Benchmark Instances | Code | 0
Bias Reduction via Cooperative Bargaining in Synthetic Graph Dataset Generation | Code | 0
Benchmarking of Deep Learning models on 2D Laminar Flow behind Cylinder | | 0
Large Language Models are Few-Shot Clinical Information Extractors | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified