Computer Vision with Metaflow: Beginner Tutorial

In this tutorial, you will build a set of workflows to train and evaluate a machine learning model that performs image classification. You will use Keras and Metaflow to write computer vision code you can use as a foundation for real-world data science projects.

from metaflow import FlowSpec, step, Flow, current, card
from metaflow.cards import Image, Table
from tensorflow import keras
from models import ModelOperations

class TuningFlow(FlowSpec, ModelOperations):

    best_model_location = "best_tuned_model"
    num_pixels = 28 * 28
    kernel_initializer = 'normal'
    optimizer = 'adam'
    loss = 'categorical_crossentropy'
    metrics = [
        'accuracy',
        'precision at recall'
    ]
    input_shape = (28, 28, 1)
    kernel_size = (3, 3)
    pool_size = (2, 2)
    p_dropout = 0.5
    epochs = 5
    batch_size = 64
    verbose = 2

    @step
    def start(self):
        import numpy as np
        self.num_classes = 10
        # Load MNIST, scale pixels to [0, 1], and add a channel dimension.
        ((x_train, y_train),
         (x_test, y_test)) = keras.datasets.mnist.load_data()
        x_train = x_train.astype("float32") / 255
        x_test = x_test.astype("float32") / 255
        self.x_train = np.expand_dims(x_train, -1)
        self.x_test = np.expand_dims(x_test, -1)
        self.y_train = keras.utils.to_categorical(
            y_train, self.num_classes)
        self.y_test = keras.utils.to_categorical(
            y_test, self.num_classes)
        # Each dictionary configures one branch of the foreach,
        # so five models are trained in parallel.
        self.param_config = [
            {"hidden_conv_layer_sizes": [16, 32]},
            {"hidden_conv_layer_sizes": [16, 64]},
            {"hidden_conv_layer_sizes": [32, 64]},
            {"hidden_conv_layer_sizes": [32, 128]},
            {"hidden_conv_layer_sizes": [64, 128]}
        ]
        self.next(self.train, foreach='param_config')

    @step
    def train(self):
        from neural_net_utils import plot_learning_curves
        # self.input holds this branch's entry from param_config.
        self.model = self.make_cnn(
            self.input['hidden_conv_layer_sizes'])
        self.history, self.scores = self.fit_and_score(
            self.x_train, self.x_test)
        self._name = 'CNN'
        self.plots = [
            Image.from_matplotlib(p) for p in
            plot_learning_curves(
                self.history,
                'Hidden Layers - ' + ', '.join([
                    str(i) for i in
                    self.input['hidden_conv_layer_sizes']
                ])
            )
        ]
        self.next(self.gather_scores)

    @card
    @step
    def gather_scores(self, models):
        # Join step: `models` holds the artifacts of each foreach branch.
        import pandas as pd
        self.max_class = models[0].y_train
        results = {
            'hidden conv layer sizes': [],
            'model': [],
            'test loss': [],
            **{metric: [] for metric in self.metrics}
        }
        max_seen_acc = 0
        rows = []
        for model in models:
            results['model'].append(model._name)
            results['test loss'].append(model.scores[0])
            for i, metric in enumerate(self.metrics):
                results[metric].append(model.scores[i+1])
            results['hidden conv layer sizes'].append(
                ','.join([
                    str(i) for i in model.input[
                        'hidden_conv_layer_sizes'
                    ]
                ])
            )
            # A simple rule for determining the best model.
            # In production flows you need to think carefully
            # about how this kind of rule maps to your objectives.
            if model.scores[1] > max_seen_acc:
                self.best_model = model.model
                max_seen_acc = model.scores[1]
            rows.append(model.plots)

        current.card.append(Table(rows))
        self.results = pd.DataFrame(results)
        self.next(self.end)

    @step
    def end(self):
        self.best_model.save(self.best_model_location)

if __name__ == '__main__':
    TuningFlow()
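To run the flow, save the script alongside the models.py and neural_net_utils.py modules it imports, then launch it from the command line. The file name tuning_flow.py below is an assumption; use whatever name you saved the script under:

    python tuning_flow.py run

After a run completes, you can read its artifacts from anywhere with Metaflow's Client API, and reload the saved Keras model for inference. A minimal sketch:

    from metaflow import Flow
    from tensorflow import keras

    run = Flow('TuningFlow').latest_run
    print(run.data.results)  # the pandas DataFrame assembled in gather_scores

    # end() saved the best model to the 'best_tuned_model' directory
    model = keras.models.load_model('best_tuned_model')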

Prerequisites

We assume that you have taken the introductory tutorials or know the basics of Metaflow.

Tutorial Structure

The tutorial is organized as a series of episodes. Each episode contains a Metaflow script to run, a Jupyter notebook, or both. You do not need access to cloud computing or a Metaflow deployment to complete the episodes. The estimated time to complete all episodes is 1-2 hours.

Why Metaflow?

The main benefit of using a data science workflow solution like Metaflow when prototyping is that your code is built on a strong foundation for deployment to a production environment. Metaflow is most useful when projects have scaling requirements, are mission-critical, and/or have many interacting parts. You can read more in the Metaflow documentation.

After completing the lessons, you will be able to transfer insights and code from this tutorial to your real-world data science projects. Keep in mind that this is a beginner tutorial, so it does not address many of the challenges that matter in production ML environments. For example, in production you may want Metaflow features such as the @conda decorator for dependency management, @batch or @kubernetes for remote execution, and @schedule to trigger runs automatically, as sketched below.
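As a minimal sketch of how those decorators compose (a hypothetical flow; the resource sizes and library version are illustrative assumptions, and @schedule only takes effect once the flow is deployed to a production orchestrator):

    from metaflow import FlowSpec, step, batch, conda, schedule

    @schedule(daily=True)  # trigger a run once a day when deployed
    class ScheduledTrainingFlow(FlowSpec):

        @conda(libraries={'tensorflow': '2.12.0'})  # per-step dependency pinning
        @batch(cpu=4, memory=16000)  # run this step remotely on AWS Batch
        @step
        def start(self):
            self.next(self.end)

        @step
        def end(self):
            pass

    if __name__ == '__main__':
        ScheduledTrainingFlow()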