XGBoost: A Scalable Tree Boosting System
2016-03-09
Tianqi Chen, Carlos Guestrin
Code Available
- github.com/dmlc/xgboost (official, in paper) ★ 28,153
- github.com/Automunge/AutoMunge (tf) ★ 164
- github.com/Hem7513/Decision-Trees-and-XGBoost-Algorithm-Documentation (pytorch) ★ 1
- github.com/jlanday/Towards-the-Minimal-Spectrum-of-Excited-Baryons ★ 0
- github.com/poyushen/classifaction ★ 0
- github.com/jiangzhongkai/ifly-algorithm_challenge ★ 0
- github.com/jlanday/Model-Selection-for-Pion-Photoproduction ★ 0
- github.com/1082-datascience/finalproject-finalproject-1082ds_group3 ★ 0
- github.com/pierobeat/Hoax-News-Classification ★ 0
- github.com/jlanday/Language-Detection ★ 0
Abstract
Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.
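The weighted quantile sketch mentioned in the abstract underlies XGBoost's approximate split finding: instead of enumerating every feature value, candidate split points are chosen so that each bucket between candidates carries roughly an ε fraction of the total instance weight (in XGBoost, the second-order gradient statistics). The sketch below is an illustrative simplification, not the paper's actual mergeable sketch data structure; the function name and the eager cumulative-sum scan are this example's own assumptions.

```python
import numpy as np

def weighted_quantile_candidates(values, weights, eps=0.25):
    """Illustrative sketch (NOT XGBoost's actual algorithm): pick split
    candidates so each bucket holds roughly an eps fraction of total weight.

    values  : 1-D array of one feature's values across instances
    weights : 1-D array of per-instance weights (XGBoost uses the
              second-order gradients h_i here)
    """
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cum = np.cumsum(w)          # running weight as we sweep sorted values
    total = cum[-1]
    candidates = []
    target = eps * total        # next cumulative-weight threshold to cross
    for i in range(len(v)):
        if cum[i] >= target:
            candidates.append(float(v[i]))
            target += eps * total
    return candidates

# With uniform weights this reduces to plain quantiles:
splits = weighted_quantile_candidates(np.arange(10.0), np.ones(10), eps=0.25)
print(splits)  # -> [2.0, 4.0, 7.0, 9.0]
```

With non-uniform weights, candidates crowd into regions of high total weight, which is the point: split evaluation effort is spent where the loss (as measured by the gradient statistics) is most sensitive.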
Tasks
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 200k Short Texts for Humor Detection | XGBoost | F1-score | 0.81 | — | Unverified |