I have a scikit-learn workflow that I want to incorporate into a Metaflow flow. How can I include model fitting, prediction, feature transformations, and other capabilities enabled by scikit-learn in flow steps?
Note that this example uses a random forest classifier, but the same approach applies to all scikit-learn models.
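For concreteness, the starting point might be a plain script like the sketch below. It covers loading data, instantiating a model, and scoring it with cross-validation. The function name `fit_and_score` and the hyperparameter values are illustrative assumptions, not taken from the original workflow.

```python
# Plain scikit-learn version of the workflow, before any Metaflow structure.
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def fit_and_score(n_estimators=10, cv=5, random_state=0):
    """Hypothetical helper: load iris, build a random forest, cross-validate it."""
    # Load data: the iris feature matrix and class labels.
    iris = datasets.load_iris()
    X, y = iris["data"], iris["target"]
    # Instantiate a model (these hyperparameter values are assumptions).
    clf = RandomForestClassifier(n_estimators=n_estimators, random_state=random_state)
    # Train with cross-validation: returns one score per fold.
    return cross_val_score(clf, X, y, cv=cv)


if __name__ == "__main__":
    scores = fit_and_score()
    print("Mean cross-validation score:", scores.mean())
```

Each commented stage in `fit_and_score` is a natural candidate for its own flow step.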
To turn this into a Metaflow flow, you first need to decide what your steps are going to be. In this case, there are distinct steps to:
- Load data.
- Instantiate a model.
- Train a model with cross-validation.
1 Estimators to Flows
In general, this involves some design choices, and we have some rules of thumb here. A benefit of separating a flow into Metaflow steps is that you can resume a failed computation from any step without recomputing everything before it, which makes development much faster.
This flow shows how to:
- Include step-specific imports within each step.
- Assign any data structures you wish to pass between steps to self.
- Train a model and apply cross-validation to evaluate it.
from metaflow import FlowSpec, step
from sklearn import datasets
self.iris = datasets.load_iris()
self.X = self.iris['data']
self.y = self.iris['target']
from sklearn.ensemble import RandomForestClassifier
self.clf = RandomForestClassifier(
from sklearn.model_selection import cross_val_score
self.scores = cross_val_score(self.clf, self.X,
print("SklearnFlow is all done.")
if __name__ == "__main__":
The example shows how to use the --with card CLI option to attach a Metaflow card, which produces HTML visualizations.
python fit_sklearn_estimator.py run --with card
[1654221287111677/end/4 (pid 71455)] Task is starting.
[1654221287111677/end/4 (pid 71455)] SklearnFlow is all done.
[1654221287111677/end/4 (pid 71455)] Task finished successfully.
Now you can view the card for the train step using this command:
python fit_sklearn_estimator.py card view train