XGBoost: A Scalable Tree Boosting System
2016-03-09
Tianqi Chen, Carlos Guestrin
Code Available
- github.com/dmlc/xgboost (official, in paper) ★ 28,153
- github.com/Automunge/AutoMunge (tf) ★ 164
- github.com/Hem7513/Decision-Trees-and-XGBoost-Algorithm-Documentation (pytorch) ★ 1
- github.com/jlanday/Towards-the-Minimal-Spectrum-of-Excited-Baryons ★ 0
- github.com/poyushen/classifaction ★ 0
- github.com/jiangzhongkai/ifly-algorithm_challenge ★ 0
- github.com/jlanday/Model-Selection-for-Pion-Photoproduction ★ 0
- github.com/1082-datascience/finalproject-finalproject-1082ds_group3 ★ 0
- github.com/pierobeat/Hoax-News-Classification ★ 0
- github.com/jlanday/Language-Detection ★ 0
Abstract
Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.
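The weighted quantile sketch mentioned in the abstract underlies XGBoost's approximate split finding: instead of enumerating every feature value, candidate split points are chosen so that each bucket between candidates carries roughly an ε fraction of the total instance weight (in XGBoost, the second-order gradient statistics). The sketch below is an illustrative simplification, not the paper's actual mergeable sketch data structure; the function name and the eager cumulative-sum scan are this example's own assumptions.

```python
import numpy as np

def weighted_quantile_candidates(values, weights, eps=0.25):
    """Illustrative sketch (NOT XGBoost's actual algorithm): pick split
    candidates so each bucket holds roughly an eps fraction of total weight.

    values  : 1-D array of one feature's values across instances
    weights : 1-D array of per-instance weights (XGBoost uses the
              second-order gradients h_i here)
    """
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cum = np.cumsum(w)          # running weight as we sweep sorted values
    total = cum[-1]
    candidates = []
    target = eps * total        # next cumulative-weight threshold to cross
    for i in range(len(v)):
        if cum[i] >= target:
            candidates.append(float(v[i]))
            target += eps * total
    return candidates

# With uniform weights this reduces to plain quantiles:
splits = weighted_quantile_candidates(np.arange(10.0), np.ones(10), eps=0.25)
print(splits)  # -> [2.0, 4.0, 7.0, 9.0]
```

With non-uniform weights, candidates crowd into regions of high total weight, which is the point: split evaluation effort is spent where the loss (as measured by the gradient statistics) is most sensitive.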
Tasks
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 200k Short Texts for Humor Detection | XGBoost | F1-score | 0.81 | — | Unverified |