
Use Optuna with Metaflow

Question

I have an Optuna process for hyperparameter tuning and want to structure it in a Metaflow flow.

Solution

There are several ways to leverage Optuna's optimization features with Metaflow. When designing a Metaflow flow to structure, execute, and store the results of an Optuna study, it is important to understand the characteristics of the objective function you are optimizing. For example, depending on how long it takes to evaluate the objective function, you may wish to execute the flow in a single process, with multiple processes, or even across multiple nodes.

1. Run Flow

This flow shows how you can run an optimization loop with 10 evaluations of the objective function in a single process.

Resources to help extend the flow for multi-process and multi-node implementations are linked in the further reading section below.

optuna_flow.py
from metaflow import FlowSpec, step


def objective(trial):
    # Imports are kept inside the function so it is self-contained.
    from sklearn.datasets import load_iris
    from sklearn.tree import ExtraTreeClassifier
    from sklearn.model_selection import cross_val_score
    import numpy as np

    data = load_iris()
    X, y = data['data'], data['target']

    # Search space: tree depth and split criterion.
    max_depth = trial.suggest_int('max_depth', 2, 16)
    criterion = trial.suggest_categorical(
        'criterion',
        ["gini", "entropy"]
    )
    model = ExtraTreeClassifier(max_depth=max_depth,
                                criterion=criterion)
    # Mean 5-fold cross-validation accuracy (higher is better).
    return np.mean(cross_val_score(model, X, y, cv=5))


class OptunaFlow(FlowSpec):

    @step
    def start(self):
        self.next(self.optimization_loop)

    @step
    def optimization_loop(self):
        import optuna
        # The objective returns an accuracy, so maximize it;
        # create_study() minimizes by default.
        self.study = optuna.create_study(direction="maximize")
        self.study.optimize(objective, n_trials=10)
        self.next(self.end)

    @step
    def end(self):
        # Store the trial history as a DataFrame artifact.
        self.results = self.study.trials_dataframe()


if __name__ == "__main__":
    OptunaFlow()

python optuna_flow.py run
    ...
[1654221285645277/end/3 (pid 71342)] Task is starting.
[1654221285645277/end/3 (pid 71342)] Task finished successfully.
...

2. Access Artifacts Outside of Flow

The following can be run in a Python script or notebook to access the contents of the DataFrame that was stored as a flow artifact with self.results.

from metaflow import Flow 
run = Flow('OptunaFlow').latest_run
run.data.results.head()
   number     value              datetime_start           datetime_complete                duration  params_criterion  params_max_depth     state
0       0  0.926667  2022-06-02 20:54:48.817826  2022-06-02 20:54:48.861575  0 days 00:00:00.043749              gini                 6  COMPLETE
1       1  0.733333  2022-06-02 20:54:48.861829  2022-06-02 20:54:48.864031  0 days 00:00:00.002202           entropy                 2  COMPLETE
2       2  0.913333  2022-06-02 20:54:48.864197  2022-06-02 20:54:48.866374  0 days 00:00:00.002177              gini                10  COMPLETE
3       3  0.946667  2022-06-02 20:54:48.866538  2022-06-02 20:54:48.868643  0 days 00:00:00.002105           entropy                 9  COMPLETE
4       4  0.960000  2022-06-02 20:54:48.868813  2022-06-02 20:54:48.870919  0 days 00:00:00.002106              gini                15  COMPLETE
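
Once the DataFrame is loaded, ordinary pandas operations can pick out the best trials. The sketch below is one way to do this; `best_trials` is a hypothetical helper, and the small DataFrame is a stand-in built from the five rows shown above (timestamp columns omitted) so the example runs without a Metaflow run.

```python
import pandas as pd


def best_trials(results: pd.DataFrame, n: int = 1,
                maximize: bool = True) -> pd.DataFrame:
    """Return the n best COMPLETE trials, sorted by objective value."""
    done = results[results["state"] == "COMPLETE"]
    return done.sort_values("value", ascending=not maximize).head(n)


# Stand-in for run.data.results, copied from the output above.
results = pd.DataFrame({
    "number": [0, 1, 2, 3, 4],
    "value": [0.926667, 0.733333, 0.913333, 0.946667, 0.960000],
    "params_criterion": ["gini", "entropy", "gini", "entropy", "gini"],
    "params_max_depth": [6, 2, 10, 9, 15],
    "state": ["COMPLETE"] * 5,
})

top = best_trials(results)
print(top[["number", "value", "params_criterion", "params_max_depth"]])
```

With a real run, `results` would instead come from `Flow('OptunaFlow').latest_run.data.results`. Since the objective here is an accuracy, `maximize=True` selects the highest-scoring trial.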

Further Reading