SOTAVerified

Benchmarking

Papers

Showing 3826-3850 of 5548 papers

Title | Status | Hype
EmProx: Neural Network Performance Estimation For Neural Architecture Search | Code | 0
BEHAVIOR in Habitat 2.0: Simulator-Independent Logical Task Description for Benchmarking Embodied AI Agents | | 0
Data-Driven Denoising of Stationary Accelerometer Signals | Code | 1
CodeS: Towards Code Model Generalization Under Distribution Shift | Code | 0
SAIBench: Benchmarking AI for Science | | 0
Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations | Code | 2
SwinCheX: Multi-label classification on chest X-ray images with transformers | Code | 1
Functional Code Building Genetic Programming | | 0
Do We Need Another Explainable AI Method? Toward Unifying Post-hoc XAI Evaluation Methods into an Interactive and Multi-dimensional Benchmark | Code | 1
Benchmarking Bayesian neural networks and evaluation metrics for regression tasks | | 0
FedHPO-B: A Benchmark Suite for Federated Hyperparameter Optimization | | 0
Scaling laws in global corporations as a benchmarking approach to assess environmental performance | | 0
Revisiting Realistic Test-Time Training: Sequential Inference and Adaptation by Anchored Clustering | Code | 1
MorisienMT: A Dataset for Mauritian Creole Machine Translation | | 0
Which models are innately best at uncertainty estimation? | | 0
Revisiting the "Video" in Video-Language Understanding | Code | 1
Fast Benchmarking of Accuracy vs. Training Time with Cyclic Learning Rates | Code | 0
Evaluation of Three Welsh Language POS Taggers | | 0
Deep One-Class Hate Speech Detection Model | | 0
Introducing RezoJDM16k: a French KnowledgeGraph DataSet for Link Prediction | | 0
Benchmarking Language Models for Cyberbullying Identification and Classification from Social-media Texts | | 0
Low-resource Neural Machine Translation: Benchmarking State-of-the-art Transformer for Wolof<->French | | 0
A Japanese Dataset for Subjective and Objective Sentiment Polarity Classification in Micro Blog Domain | Code | 1
Jojajovai: A Parallel Guarani-Spanish Corpus for MT Benchmarking | Code | 1
MTLens: Machine Translation Output Debugging | | 0
Page 154 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | | Unverified