SOTAVerified

Benchmarking

Papers

Showing 4201-4225 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| Dyna-bAbI: unlocking bAbI’s potential with dynamic synthetic benchmarking | | 0 |
| HATE-ITA: New Baselines for Hate Speech Detection in Italian | Code | 0 |
| Benchmarking Intersectional Biases in NLP | Code | 0 |
| SentSpace: Large-Scale Benchmarking and Evaluation of Text using Cognitively Motivated Lexical, Syntactic, and Semantic Features | | 0 |
| Local manifold learning and its link to domain-based physics knowledge | Code | 0 |
| Analyzing the behaviour of D'WAVE quantum annealer: fine-tuning parameterization and tests with restrictive Hamiltonian formulations | | 0 |
| Benchmarking Language-agnostic Intent Classification for Virtual Assistant Platforms | Code | 0 |
| Beyond Emotion: A Multi-Modal Dataset for Human Desire Understanding | | 0 |
| Computer-aided diagnosis and prediction in brain disorders | | 0 |
| An extensible Benchmarking Graph-Mesh dataset for studying Steady-State Incompressible Navier-Stokes Equations | Code | 0 |
| Toward an ImageNet Library of Functions for Global Optimization Benchmarking | | 0 |
| VRKitchen2.0-IndoorKit: A Tutorial for Augmented Indoor Scene Building in Omniverse | Code | 0 |
| Beyond Uniform Lipschitz Condition in Differentially Private Optimization | | 0 |
| BOND: Benchmarking Unsupervised Outlier Node Detection on Static Attributed Graphs | Code | 0 |
| ConvGeN: Convex space learning improves deep-generative oversampling for tabular imbalanced classification on smaller datasets | Code | 0 |
| Design of Supervision-Scalable Learning Systems: Methodology and Performance Benchmarking | | 0 |
| Motley: Benchmarking Heterogeneity and Personalization in Federated Learning | Code | 0 |
| Colonoscopy 3D Video Dataset with Paired Depth from 2D-3D Registration | | 0 |
| Beyond Supervised vs. Unsupervised: Representative Benchmarking and Analysis of Image Representation Learning | Code | 0 |
| Pythae: Unifying Generative Autoencoders in Python -- A Benchmarking Use Case | | 0 |
| SATBench: Benchmarking the speed-accuracy tradeoff in object recognition by humans and dynamic neural networks | Code | 0 |
| Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability | | 0 |
| Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models | | 0 |
| BEHAVIOR in Habitat 2.0: Simulator-Independent Logical Task Description for Benchmarking Embodied AI Agents | | 0 |
| EmProx: Neural Network Performance Estimation For Neural Architecture Search | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |