SOTAVerified

Benchmarking

Papers

Showing 3801-3850 of 5548 papers

| Title | Status | Hype |
|---|---|---|
| VRKitchen2.0-IndoorKit: A Tutorial for Augmented Indoor Scene Building in Omniverse | Code | 0 |
| The ArtBench Dataset: Benchmarking Generative Models with Artworks | Code | 2 |
| DaisyRec 2.0: Benchmarking Recommendation for Rigorous Evaluation | Code | 2 |
| GEMv2: Multilingual NLG Benchmarking in a Single Line of Code | Code | 1 |
| OpenXAI: Towards a Transparent Evaluation of Model Explanations | Code | 1 |
| Beyond Uniform Lipschitz Condition in Differentially Private Optimization | | 0 |
| BOND: Benchmarking Unsupervised Outlier Node Detection on Static Attributed Graphs | Code | 0 |
| Benchmarking Constraint Inference in Inverse Reinforcement Learning | Code | 1 |
| ConvGeN: Convex space learning improves deep-generative oversampling for tabular imbalanced classification on smaller datasets | Code | 0 |
| What is Where by Looking: Weakly-Supervised Open-World Phrase-Grounding without Text Inputs | Code | 1 |
| Design of Supervision-Scalable Learning Systems: Methodology and Performance Benchmarking | | 0 |
| NAS-Bench-Graph: Benchmarking Graph Neural Architecture Search | Code | 1 |
| Motley: Benchmarking Heterogeneity and Personalization in Federated Learning | Code | 0 |
| SMPL: Simulated Industrial Manufacturing and Process Control Learning Environments | Code | 1 |
| Colonoscopy 3D Video Dataset with Paired Depth from 2D-3D Registration | | 0 |
| Long Range Graph Benchmark | Code | 1 |
| SATBench: Benchmarking the speed-accuracy tradeoff in object recognition by humans and dynamic neural networks | Code | 0 |
| Pythae: Unifying Generative Autoencoders in Python -- A Benchmarking Use Case | | 0 |
| Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability | | 0 |
| Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models | | 0 |
| Beyond Supervised vs. Unsupervised: Representative Benchmarking and Analysis of Image Representation Learning | Code | 0 |
| Taxonomy of Benchmarks in Graph Representation Learning | Code | 1 |
| RecBole 2.0: Towards a More Up-to-Date Recommendation Library | Code | 4 |
| ISLES 2022: A multi-center magnetic resonance imaging stroke lesion segmentation dataset | Code | 1 |
| Evaluating histopathology transfer learning with ChampKit | Code | 1 |
| EmProx: Neural Network Performance Estimation For Neural Architecture Search | Code | 0 |
| BEHAVIOR in Habitat 2.0: Simulator-Independent Logical Task Description for Benchmarking Embodied AI Agents | | 0 |
| Data-Driven Denoising of Stationary Accelerometer Signals | Code | 1 |
| CodeS: Towards Code Model Generalization Under Distribution Shift | Code | 0 |
| SAIBench: Benchmarking AI for Science | | 0 |
| Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations | Code | 2 |
| SwinCheX: Multi-label classification on chest X-ray images with transformers | Code | 1 |
| Functional Code Building Genetic Programming | | 0 |
| Do We Need Another Explainable AI Method? Toward Unifying Post-hoc XAI Evaluation Methods into an Interactive and Multi-dimensional Benchmark | Code | 1 |
| Benchmarking Bayesian neural networks and evaluation metrics for regression tasks | | 0 |
| FedHPO-B: A Benchmark Suite for Federated Hyperparameter Optimization | | 0 |
| Scaling laws in global corporations as a benchmarking approach to assess environmental performance | | 0 |
| Revisiting Realistic Test-Time Training: Sequential Inference and Adaptation by Anchored Clustering | Code | 1 |
| MorisienMT: A Dataset for Mauritian Creole Machine Translation | | 0 |
| Which models are innately best at uncertainty estimation? | | 0 |
| Revisiting the "Video" in Video-Language Understanding | Code | 1 |
| Fast Benchmarking of Accuracy vs. Training Time with Cyclic Learning Rates | Code | 0 |
| Evaluation of Three Welsh Language POS Taggers | | 0 |
| Deep One-Class Hate Speech Detection Model | | 0 |
| Introducing RezoJDM16k: a French KnowledgeGraph DataSet for Link Prediction | | 0 |
| Benchmarking Language Models for Cyberbullying Identification and Classification from Social-media Texts | | 0 |
| Low-resource Neural Machine Translation: Benchmarking State-of-the-art Transformer for Wolof<->French | | 0 |
| A Japanese Dataset for Subjective and Objective Sentiment Polarity Classification in Micro Blog Domain | Code | 1 |
| Jojajovai: A Parallel Guarani-Spanish Corpus for MT Benchmarking | Code | 1 |
| MTLens: Machine Translation Output Debugging | | 0 |
Page 77 of 111

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |