Reuse Model Object
Question
How can I reuse model code in training and prediction flows?
Solution
A common pattern when using Metaflow is to move complex business logic outside of the flow. This makes the logic callable from multiple flows and more easily tested independent of the flow.
1. Make Class Used in Multiple Flows
Imagine you have the following model class:
model.py
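class Model():

    def init_model(self, model_type=None, params=None):
        # Build an estimator from its class and keyword arguments.
        # params defaults to None to avoid a shared mutable default dict.
        return model_type(**(params or {}))

    def train(self, model, features, labels):
        # scikit-learn's fit() returns the fitted estimator.
        return model.fit(features, labels)

    def score(self, model, features, true_labels):
        preds = model.predict(features)
        # Element-wise comparison; assumes NumPy arrays, as scikit-learn returns.
        return {
            "accuracy": sum(true_labels == preds) / len(true_labels)
        }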
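The accuracy expression in score relies on element-wise comparison, which NumPy arrays provide. With plain Python lists, true_labels == preds yields a single boolean and sum() fails. As a minimal sketch of the same calculation for plain sequences, the hypothetical helper below (accuracy is not part of the original class) counts matching positions explicitly:

```python
def accuracy(true_labels, preds):
    """Fraction of positions where the prediction equals the true label."""
    if len(true_labels) != len(preds):
        raise ValueError("true_labels and preds must have the same length")
    # Compare position by position instead of relying on NumPy broadcasting.
    matches = sum(1 for t, p in zip(true_labels, preds) if t == p)
    return matches / len(true_labels)


# Three of the four predictions match the true labels.
print(accuracy([0, 1, 2, 2], [0, 1, 1, 2]))  # 0.75
```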
Now you can use multiple inheritance to mix this class into your FlowSpec subclass, so its methods are available on self inside each step.
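To see why inheriting from both classes works, and why the class can be exercised without running a flow, here is a minimal sketch in plain Python. Everything apart from the two Model methods is an assumption made for illustration: FakeFlowSpec is a hypothetical stand-in for metaflow.FlowSpec, MajorityClassifier is a hypothetical stub estimator with a scikit-learn-like fit/predict interface, and DemoFlow is a hypothetical subclass.

```python
class Model:
    """Trimmed copy of model.py: only the methods used in this sketch."""

    def init_model(self, model_type=None, params=None):
        return model_type(**(params or {}))

    def train(self, model, features, labels):
        return model.fit(features, labels)


class FakeFlowSpec:
    """Hypothetical stand-in for metaflow.FlowSpec (not the real class)."""


class MajorityClassifier:
    """Hypothetical stub estimator that always predicts the majority label."""

    def __init__(self, default=0):
        self.default = default
        self.majority_ = default

    def fit(self, features, labels):
        if labels:
            self.majority_ = max(set(labels), key=labels.count)
        return self  # mirror scikit-learn: fit returns the estimator

    def predict(self, features):
        return [self.majority_ for _ in features]


class DemoFlow(FakeFlowSpec, Model):
    """Inherits flow behavior from one base and model logic from the other."""


flow = DemoFlow()
model = flow.init_model(model_type=MajorityClassifier, params={"default": 0})
model = flow.train(model, [[1], [2], [3]], [1, 1, 0])

# Python looks up methods along the method resolution order (MRO).
print([cls.__name__ for cls in DemoFlow.__mro__])
print(model.predict([[4], [5]]))
```

Because the model logic lives in its own class, a unit test can use a stub like this one instead of a real estimator and never needs to start a run.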
2. Use Model Class in Training Flow
This flow demonstrates how the Model class functions can be inherited by the flow. The flow shows how to:
- Load and split a dataset from scikit-learn.
- Initialize a model using the previously defined class.
- Train the model.
- Score the model on a validation set and print the result.
train_model_flow.py
from metaflow import step, FlowSpec
from model import Model

class TrainingFlow(FlowSpec, Model):

    @step
    def start(self):
        from sklearn import datasets
        from sklearn.model_selection import train_test_split
        self.iris = datasets.load_iris()
        X, y = self.iris['data'], self.iris['target']
        self.labels = self.iris['target_names']
        split = train_test_split(X, y, test_size=0.2)
        self.X_train, self.X_test = split[0], split[1]
        self.y_train, self.y_test = split[2], split[3]
        self.next(self.make_model)

    @step
    def make_model(self):
        from sklearn.ensemble import RandomForestClassifier
        self.params = {"max_depth": 8}
        self.model = self.init_model(
            model_type=RandomForestClassifier,
            params=self.params
        )
        self.next(self.train_model)

    @step
    def train_model(self):
        self.model = self.train(self.model, self.X_train, self.y_train)
        self.next(self.end)

    @step
    def end(self):
        scores = self.score(self.model, self.X_test, self.y_test)
        print('Accuracy: ', scores['accuracy'])

if __name__ == "__main__":
    TrainingFlow()
python train_model_flow.py run
3. Use Model Class in Scoring Flow
Now you can use multiple inheritance again to instantiate a different flow.
This flow shows how to:
- Create a test dataset to score.
- Instantiate a model using the trained model object from TrainingFlow.
- Use the common Model class function to score the model on the test dataset.
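The lookup in the second bullet can be sketched as a small helper. This is a minimal sketch under stated assumptions: load_trained_model is a hypothetical name, not part of Metaflow, and the guard assumes that latest_successful_run is None when the flow has no successful run yet. It also needs Metaflow installed and at least one completed TrainingFlow run to return anything.

```python
def load_trained_model(flow_name="TrainingFlow"):
    """Return the model artifact stored by the latest successful run.

    Hypothetical helper for illustration; flow_name defaults to the
    training flow defined earlier.
    """
    # Imported lazily, in the same style the flows use inside their steps.
    from metaflow import Flow

    run = Flow(flow_name).latest_successful_run
    if run is None:
        # Assumption: no successful run is reported as None.
        raise RuntimeError(f"No successful run found for {flow_name}")
    # The model was assigned to self.model, so it is readable from the end step.
    return run["end"].task.data.model
```

Failing early with a clear message is a design choice here; without the guard, a missing run would surface as a less obvious error when indexing the run.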
scoring_model_flow.py
from metaflow import step, FlowSpec
from model import Model

class ScoringFlow(FlowSpec, Model):

    sibling_flow = 'TrainingFlow'

    @step
    def start(self):
        from sklearn import datasets
        iris = datasets.load_iris()
        self.X, self.y = iris['data'], iris['target']
        self.next(self.score_trained_model)

    @step
    def score_trained_model(self):
        from metaflow import Flow
        run = Flow(self.sibling_flow).latest_successful_run
        self.model = run['end'].task.data.model
        self.scores = self.score(self.model, self.X, self.y)
        self.next(self.end)

    @step
    def end(self):
        print('Accuracy: ', self.scores['accuracy'])

if __name__ == "__main__":
    ScoringFlow()
python scoring_model_flow.py run