Use XGBoost with Metaflow


How can I build and fit an XGBoost model in a Metaflow flow?


There are two common ways to fit XGBoost models, and both work with Metaflow. The XGBoost documentation refers to them as the Learning API and the Scikit-Learn API. This example uses the Learning API, but you can build flows with either.
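For comparison, here is a minimal sketch of the scikit-learn style on the same iris data. It is an illustration rather than part of the original example: the function name `cross_validate_sklearn_api` and its default arguments are hypothetical, and it assumes `xgboost` and `scikit-learn` are installed.

```python
def cross_validate_sklearn_api(n_splits=3, n_estimators=10):
    """Cross-validate an XGBoost classifier on iris via the scikit-learn API.

    Hypothetical helper for illustration; returns one accuracy score per fold.
    """
    # Imports live inside the function, mirroring the in-step imports
    # used in the flow in this example.
    from sklearn import datasets
    from sklearn.model_selection import cross_val_score
    from xgboost import XGBClassifier

    X, y = datasets.load_iris(return_X_y=True)

    # XGBClassifier follows the scikit-learn estimator interface, so it can
    # be passed to scikit-learn utilities such as cross_val_score.
    model = XGBClassifier(n_estimators=n_estimators)

    # For classifiers, an integer cv uses stratified folds and the default
    # score is accuracy.
    return cross_val_score(model, X, y, cv=n_splits)
```

Inside a Metaflow step, the returned array could be assigned to an attribute such as `self.scores` so that it is stored as an artifact.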

1. Run Flow

The flow shows how to:

  1. Load training data.
  2. Instantiate the XGBoost model.
  3. Train the model with cross-validation.
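
from metaflow import FlowSpec, step, Parameter

class XGBFlow(FlowSpec):

    @step
    def start(self):
        # Load the iris data and store features and labels as artifacts.
        from sklearn import datasets
        self.iris = datasets.load_iris()
        self.X = self.iris['data']
        self.y = self.iris['target']
        self.next(self.train_model)

    @step
    def train_model(self):
        import xgboost as xgb
        dtrain = xgb.DMatrix(self.X, self.y)
        # The original call is truncated after 'num_class'. The xgb.cv call,
        # objective, and eval_metric below are a reconstruction inferred
        # from the mlogloss columns in the stored results.
        self.results = xgb.cv(
            params={'num_class': 3,
                    'objective': 'multi:softmax',
                    'eval_metric': 'mlogloss'},
            dtrain=dtrain,
        )
        self.next(self.end)

    @step
    def end(self):
        print("Flow is done.")

if __name__ == "__main__":
    XGBFlow()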

python <flow_file>.py run

Workflow starting (run-id 1654221281882630):
[1654221281882630/start/1 (pid 71160)] Task is starting.
[1654221281882630/start/1 (pid 71160)] Task finished successfully.
[1654221281882630/train_model/2 (pid 71199)] Task is starting.
[1654221281882630/train_model/2 (pid 71199)] Task finished successfully.
[1654221281882630/end/3 (pid 71262)] Task is starting.
[1654221281882630/end/3 (pid 71262)] Flow is done.
[1654221281882630/end/3 (pid 71262)] Task finished successfully.

2. Access Artifacts Outside of Flow

The following can be run in a Python script or notebook to access the contents of the dataframe that was stored as a flow artifact with self.results:

from metaflow import Flow
run = Flow('XGBFlow').latest_run
# Show the first five rows of the cross-validation results.
run.data.results.head()
   train-mlogloss-mean  train-mlogloss-std  test-mlogloss-mean  test-mlogloss-std
0             0.741877            0.001425            0.750814           0.002562
1             0.533298            0.003306            0.550585           0.001667
2             0.394987            0.002554            0.421669           0.002304
3             0.300281            0.002392            0.337402           0.003478
4             0.231565            0.001567            0.280347           0.004483
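
Once the results are loaded, a natural next step is to find the boosting round with the lowest held-out loss. The sketch below is one way to do that, under stated assumptions: `load_cv_results` and `best_round` are hypothetical helper names (not part of Metaflow or XGBoost), and the artifact is assumed to be a pandas DataFrame with the column names shown above. The loader reads from the latest successful run rather than the latest run, so a failed or in-progress run is skipped.

```python
def load_cv_results(flow):
    """Return the `results` artifact from the latest successful run.

    `flow` is expected to be a metaflow.Flow object, for example
    Flow('XGBFlow'). Hypothetical helper for illustration.
    """
    run = flow.latest_successful_run
    if run is None:
        # No run of this flow has completed successfully yet.
        raise RuntimeError("No successful run found for this flow.")
    return run.data.results


def best_round(results, metric="test-mlogloss-mean"):
    """Return (round_index, value) for the lowest value in `metric`.

    `results` is assumed to be a pandas DataFrame indexed by boosting
    round, as in the table above.
    """
    idx = results[metric].idxmin()
    return int(idx), float(results.loc[idx, metric])
```

With Metaflow installed and at least one successful run, usage would look like `best_round(load_cv_results(Flow('XGBFlow')))`. For the five rows shown above, the lowest test loss is at round 4 (0.280347).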

Further Reading