SOTAVerified

Benchmarking

Papers

Showing 1476-1500 of 5548 papers

| Title | Status | Hype |
| --- | --- | --- |
| Fast hyperboloid decision tree algorithms | Code | 1 |
| BiCo-Net: Regress Globally, Match Locally for Robust 6D Pose Estimation | Code | 1 |
| Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models | Code | 1 |
| BiBench: Benchmarking and Analyzing Network Binarization | Code | 1 |
| FinDABench: Benchmarking Financial Data Analysis Ability of Large Language Models | Code | 1 |
| ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots | Code | 1 |
| ScrewNet: Category-Independent Articulation Model Estimation From Depth Images Using Screw Theory | Code | 1 |
| Benchmarking Graph Neural Networks on Dynamic Link Prediction | Code | 1 |
| Benchmarking Graph Neural Networks for FMRI analysis | Code | 1 |
| Beyond neural scaling laws: beating power law scaling via data pruning | Code | 1 |
| Beyond Normal: On the Evaluation of Mutual Information Estimators | Code | 1 |
| Formalizing Multimedia Recommendation through Multimodal Deep Learning | Code | 1 |
| LagrangeBench: A Lagrangian Fluid Mechanics Benchmarking Suite | Code | 1 |
| Large Language Models for Multi-Robot Systems: A Survey | Code | 1 |
| LEAF: A Benchmark for Federated Settings | Code | 1 |
| LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models | Code | 1 |
| Machine Learning for the Digital Typhoon Dataset: Extensions to Multiple Basins and New Developments in Representations and Tasks | Code | 1 |
| MIRFLEX: Music Information Retrieval Feature Library for Extraction | Code | 1 |
| FELM: Benchmarking Factuality Evaluation of Large Language Models | Code | 1 |
| Smiles2Dock: an open large-scale multi-task dataset for ML-based molecular docking | Code | 1 |
| FinanceReasoning: Benchmarking Financial Numerical Reasoning More Credible, Comprehensive and Challenging | Code | 1 |
| FiFAR: A Fraud Detection Dataset for Learning to Defer | Code | 1 |
| ACCESS DENIED INC: The First Benchmark Environment for Sensitivity Awareness | Code | 0 |
| Conformal Prediction: A Theoretical Note and Benchmarking Transductive Node Classification in Graphs | Code | 0 |
| Knowledge-Driven Slot Constraints for Goal-Oriented Dialogue Systems | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4 Turbo | ACC | 0.56 | | Unverified |