SOTAVerified

Benchmarking

Papers

Showing 3201–3225 of 5548 papers

Title | Status | Hype
Datasets and Benchmarks for Offline Safe Reinforcement Learning | Code | 2
MUBen: Benchmarking the Uncertainty of Molecular Representation Models | Code | 0
RRSIS: Referring Remote Sensing Image Segmentation | - | 0
A Cloud-based Machine Learning Pipeline for the Efficient Extraction of Insights from Customer Reviews | - | 0
detrex: Benchmarking Detection Transformers | - | 0
Benchmarking Neural Network Training Algorithms | Code | 4
Contribution à l'Optimisation d'un Comportement Collectif pour un Groupe de Robots Autonomes | - | 0
Aria Digital Twin: A New Benchmark Dataset for Egocentric 3D Machine Perception | Code | 2
NeuroGraph: Benchmarks for Graph Machine Learning in Brain Connectomics | Code | 1
Share, Collaborate, Benchmark: Advancing Travel Demand Research through rigorous open-source collaboration | - | 0
A Large-Scale Analysis on Self-Supervised Video Representation Learning | - | 0
DynamoRep: Trajectory-Based Population Dynamics for Classification of Black-box Optimization Problems | Code | 0
FedSecurity: Benchmarking Attacks and Defenses in Federated Learning and Federated LLMs | Code | 0
Yet Another ICU Benchmark: A Flexible Multi-Center Framework for Clinical ML | Code | 1
DLAMA: A Framework for Curating Culturally Diverse Facts for Probing the Knowledge of Pretrained Language Models | Code | 0
FLEdge: Benchmarking Federated Machine Learning Applications in Edge Computing Systems | - | 0
Reference Matters: Benchmarking Factual Error Correction for Dialogue Summarization with Fine-grained Evaluation Framework | Code | 0
On the Detectability of ChatGPT Content: Benchmarking, Methodology, and Evaluation through the Lens of Academic Writing | Code | 1
Improved statistical benchmarking of digital pathology models using pairwise frames evaluation | - | 0
RD-Suite: A Benchmark for Ranking Distillation | - | 0
Knowing-how & Knowing-that: A New Task for Machine Comprehension of User Manuals | Code | 0
Benchmarking Foundation Models with Language-Model-as-an-Examiner | - | 0
Self-Adjusting Weighted Expected Improvement for Bayesian Optimization | Code | 0
ICON^2: Reliably Benchmarking Predictive Inequity in Object Detection | - | 0
Benchmarking Robustness of AI-Enabled Multi-sensor Fusion Systems: Challenges and Opportunities | - | 0
Page 129 of 222

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 Turbo | ACC | 0.56 | - | Unverified