
Use Scikit-learn Estimators with Metaflow


I have a scikit-learn workflow that I want to incorporate into a Metaflow flow. How can I include model fitting, prediction, feature transformations, and other capabilities enabled by scikit-learn in flow steps?


Note that this example uses a random forest classifier, but the same approach applies to any scikit-learn estimator.
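
Scikit-learn classifiers share the same fit, predict, and score methods, and a Pipeline that chains a feature transformation with a model is itself an estimator. As a result, code written around one estimator usually works unchanged when the model is swapped. The sketch below is plain scikit-learn (no Metaflow); the helper fit_and_score and the 75/25 stratified train/test split are illustrative assumptions, not part of the original example.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def fit_and_score(estimator, X_train, X_test, y_train, y_test):
    """Fit a scikit-learn estimator, predict on held-out data, and return (predictions, accuracy)."""
    estimator.fit(X_train, y_train)
    predictions = estimator.predict(X_test)
    accuracy = estimator.score(X_test, y_test)
    return predictions, accuracy


X, y = load_iris(return_X_y=True)
# Hypothetical split chosen only for illustration.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

estimators = {
    # The model family used in this example.
    "random_forest": RandomForestClassifier(random_state=0),
    # A feature transformation chained with a different model; still a single estimator.
    "scaled_logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}

results = {}
for name, estimator in estimators.items():
    predictions, accuracy = fit_and_score(estimator, X_train, X_test, y_train, y_test)
    results[name] = accuracy
    print(f"{name}: test accuracy = {accuracy:.3f}")
```

In a flow, such an estimator can be created in one step and fitted or evaluated in a later one, the same way this example's flow stores its model in self.clf.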

To turn this into a Metaflow flow, you first need to decide what your steps are going to be. In this case, there are distinct steps to:

  1. Load data.
  2. Instantiate a model.
  3. Train a model with cross-validation.
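
Before writing the flow, it can help to see the same workflow as plain scikit-learn code with one function per planned step. This is a sketch: the function names are hypothetical and only mirror the three steps listed above. In the flow, each function body moves into a step, and each return value becomes an attribute assigned to self.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def load_data():
    """Step 1: load the iris features and labels."""
    iris = datasets.load_iris()
    return iris["data"], iris["target"]


def make_model():
    """Step 2: instantiate an unfitted estimator."""
    return RandomForestClassifier(random_state=0)


def train(clf, X, y):
    """Step 3: estimate accuracy with 5-fold cross-validation.

    cross_val_score fits clones of clf, so clf itself stays unfitted.
    """
    return cross_val_score(clf, X, y, cv=5)


X, y = load_data()
clf = make_model()
scores = train(clf, X, y)
print(f"Fold accuracies: {scores}")
```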

1. Estimators to Flows

In general, choosing steps involves some design choices, and the Metaflow documentation offers some rules of thumb. One benefit of splitting a workflow into Metaflow steps is that, with the resume command, you can restart a failed run from the step that failed without recomputing every step before it, which makes development much faster.

2. Run Flow

This flow shows how to:

  • Import FlowSpec and step.
  • Include step-specific imports within each step.
  • Assign any data structures you wish to pass between steps to self.
  • Train a model and evaluate it with cross-validation.
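
from metaflow import FlowSpec, step


class SklearnFlow(FlowSpec):

    @step
    def start(self):
        # Load the iris dataset; features and labels become artifacts.
        from sklearn import datasets
        self.iris = datasets.load_iris()
        self.X = self.iris['data']
        self.y = self.iris['target']
        self.next(self.rf_model)

    @step
    def rf_model(self):
        # Instantiate an unfitted estimator with default hyperparameters;
        # pass arguments such as n_estimators here to tune it.
        from sklearn.ensemble import RandomForestClassifier
        self.clf = RandomForestClassifier()
        self.next(self.train)

    @step
    def train(self):
        # Evaluate the estimator with 5-fold cross-validation.
        from sklearn.model_selection import cross_val_score
        self.scores = cross_val_score(self.clf, self.X,
                                      self.y, cv=5)
        self.next(self.end)

    @step
    def end(self):
        print("SklearnFlow is all done.")


if __name__ == "__main__":
    SklearnFlow()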

Run the flow with the --with card CLI option, which attaches the default Metaflow card, an HTML visualization of each task's results, to every step.

python run --with card
[1663366789156643/end/4 (pid 5065)] Task is starting.
[1663366789156643/end/4 (pid 5065)] SklearnFlow is all done.
[1663366789156643/end/4 (pid 5065)] Task finished successfully.
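
Once the run has finished, its artifacts can also be read back in Python with Metaflow's Client API (Flow, latest_successful_run, and a step's task.data). The sketch below assumes SklearnFlow has completed at least one successful run in the same Metaflow environment; latest_scores and summarize_scores are hypothetical helper names. Metaflow is imported inside latest_scores so that the summary helper can be used on its own.

```python
import statistics


def summarize_scores(scores):
    """Return the mean and population standard deviation of per-fold scores."""
    values = [float(s) for s in scores]
    return statistics.mean(values), statistics.pstdev(values)


def latest_scores(flow_name="SklearnFlow", step_name="train"):
    """Fetch the `scores` artifact from the latest successful run of a flow."""
    # Imported here so summarize_scores works without Metaflow installed.
    from metaflow import Flow

    run = Flow(flow_name).latest_successful_run
    # The train step is linear (no foreach), so it has exactly one task.
    return run[step_name].task.data.scores
```

For example, summarize_scores(latest_scores()) returns the mean and standard deviation of the five fold accuracies computed in the train step.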

3. View Card

Now you can view the card for the train step using this command:

python card view train

Further Reading