Track Artifacts with CometML

Question

How can I track artifacts of my flows with Comet ML?

Solution

Because a Metaflow step can run any Python code, you can track flow artifacts with whatever Comet ML calls you already use. In addition, the Comet ML team has developed an integration with Metaflow that makes tracking the artifacts produced in flow runs even more convenient.

The remainder of this page will walk through the following topics:

  • What is Comet ML?
  • How to write a flow that uses the Comet integration
  • How to run a flow that tracks experiments with Comet

1. What is Comet ML?

Comet ML is a platform to track, compare, explain, and optimize machine learning experiments and models. Its comet_ml Python library lets you read and write data about the configuration and results of your data science experiments. After you sign up, you can use your Comet API key to create an Experiment in Python code or through Comet's APIs.

import comet_ml
import os

experiment = comet_ml.Experiment(
    # read env var set like `export COMET_API_KEY=<>`
    api_key=os.getenv('COMET_API_KEY'),
    # read env var set like `export COMET_PROJECT_NAME=<>`
    project_name=os.getenv('COMET_PROJECT_NAME')
)
COMET WARNING: As you are running in a Jupyter environment, you will need to call `experiment.end()` when finished to ensure all metrics and code are logged before exiting.
COMET INFO: Experiment is live on comet.com https://www.comet.com/eddie-outerbounds/comet-integration/107ad5abd1614ce3aabedefc49859d1c
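The example above passes `os.getenv(...)` straight to `comet_ml.Experiment`, and `os.getenv` returns `None` when a variable is unset. The missing key therefore only surfaces later, when Comet itself complains. The sketch below checks the settings up front. `comet_settings_from_env` is a hypothetical helper and is not part of the comet_ml library. It returns keyword arguments for `comet_ml.Experiment` and includes `project_name` only when it is set, leaving the fallback to Comet's own default.

```python
import os
from typing import Mapping, Optional


def comet_settings_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: build comet_ml.Experiment kwargs from env vars, failing fast."""
    env = os.environ if env is None else env
    api_key = env.get("COMET_API_KEY")
    if not api_key:
        # Fail here instead of passing api_key=None to Comet.
        raise RuntimeError("COMET_API_KEY is not set; run `export COMET_API_KEY=<your key>` first")
    settings = {"api_key": api_key}
    project = env.get("COMET_PROJECT_NAME")
    if project:
        # Only pass project_name when it is set; otherwise Comet applies its own default.
        settings["project_name"] = project
    return settings
```

With the helper in place, the experiment could be created as `experiment = comet_ml.Experiment(**comet_settings_from_env())`.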

Experiments are the core data structure Comet uses to organize information. You can read more about them in Comet's Experiment documentation.

The rest of this page shows how to use Comet's Metaflow integration to automate the creation and reporting of data to Comet Experiment objects.

2. Write a Flow using the Comet Integration

The script below shows how to:

  • Log in to Comet before running the script.
    • The init() call in the script's __main__ block sets up the connection to Comet. It will try to read the value of the COMET_API_KEY environment variable if you have it set. You can read more about configuring Comet in a Python environment in the Comet documentation.
  • Create a set of Comet Experiment objects to track both the individual tasks and the state of the flow as a whole.
  • Log parameters and metrics with Comet from the flow runtime.
    • Look at the train_model step and notice that self.comet_experiment is available automatically because of the @comet_flow decorator.
track_with_comet_integration.py
from comet_ml import init
from comet_ml.integration.metaflow import comet_flow
from metaflow import FlowSpec, step


# @comet_flow creates the Comet Experiments for this flow and its tasks.
@comet_flow(project_name="comet-metaflow")
class CometFlow(FlowSpec):

    @step
    def start(self):
        import plotly.express as px
        from sklearn.model_selection import train_test_split

        # Load the tips dataset and use total_bill as the single feature.
        self.input_df = px.data.tips()
        # [:, None] turns the 1-D array into the 2-D (n_samples, 1) shape scikit-learn expects.
        self.X = self.input_df.total_bill.values[:, None]
        self.X_train, self.X_test, self.Y_train, self.Y_test = train_test_split(
            self.X, self.input_df.tip, random_state=42
        )
        self.next(self.train_model)

    @step
    def train_model(self):
        from sklearn import linear_model

        self.model = linear_model.LinearRegression()
        self.model.fit(self.X_train, self.Y_train)
        # R^2 of the fitted model on the held-out split.
        self.score = self.model.score(self.X_test, self.Y_test)
        # self.comet_experiment is provided by the @comet_flow decorator.
        self.comet_experiment.log_parameter("model", self.model)
        self.comet_experiment.log_metric("score", self.score)
        self.next(self.end)

    @step
    def end(self):
        pass


if __name__ == "__main__":
    init()  # set up the Comet connection (reads COMET_API_KEY if set)
    CometFlow()
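The `score` that `train_model` logs comes from scikit-learn's `LinearRegression.score`, which returns the coefficient of determination R² = 1 - SS_res / SS_tot on the held-out data. The sketch below shows what that number means for this one-feature model (tip predicted from total bill) without any dependencies. It fits ordinary least squares in closed form and computes R² by hand. The function names are illustrative, and the toy numbers are made up. They are not taken from the tips dataset.

```python
from statistics import fmean


def fit_simple_ols(xs, ys):
    """Closed-form least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def r2_score(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = fmean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up (total_bill, tip) pairs standing in for the train/test split made in `start`.
train_x, train_y = [10.0, 20.0, 30.0, 40.0], [1.5, 3.0, 4.0, 6.0]
test_x, test_y = [15.0, 25.0, 35.0], [2.5, 3.5, 5.0]

intercept, slope = fit_simple_ols(train_x, train_y)
preds = [intercept + slope * x for x in test_x]
score = r2_score(test_y, preds)  # analogous to self.model.score(self.X_test, self.Y_test); ≈ 0.96
```

An R² of 1.0 means the predictions match the held-out targets exactly. Values near 0 mean the model does no better than predicting the mean tip.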

3. Run the Flow

Now that you have configured Comet to track Experiments for this flow, you can run it from the command line in the normal Metaflow way.

python track_with_comet_integration.py run
Workflow starting (run-id 1665870683555031):
[1665870683555031/start/1 (pid 28379)] Task is starting.
[1665870683555031/start/1 (pid 28379)]
[1665870683555031/start/1 (pid 28379)]
[1665870683555031/start/1 (pid 28379)] Task finished successfully.
[1665870683555031/train_model/2 (pid 28385)] Task is starting.
[1665870683555031/train_model/2 (pid 28385)]
[1665870683555031/train_model/2 (pid 28385)] Task finished successfully.
[1665870683555031/end/3 (pid 28391)] Task is starting.
[1665870683555031/end/3 (pid 28391)]
[1665870683555031/end/3 (pid 28391)]
[1665870683555031/end/3 (pid 28391)] Task finished successfully.
Done!
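Each task line in the run output starts with the task's pathspec, `run-id/step-name/task-id`, followed by the process id. The integration tracks individual tasks, so this pathspec is a convenient key for telling the tasks in the log apart. The sketch below parses those prefixes with a regular expression. The pattern is inferred from the sample output above, not from a documented log format, so treat it as an assumption. `parse_task_line` and `TaskLine` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern, inferred from lines such as
# "[1665870683555031/train_model/2 (pid 28385)] Task is starting."
_PREFIX = re.compile(
    r"^\[(?P<run_id>[^/\]]+)/(?P<step>[^/\]]+)/(?P<task_id>[^ \]]+) "
    r"\(pid (?P<pid>\d+)\)\]\s?(?P<message>.*)$"
)


class TaskLine(NamedTuple):
    run_id: str
    step: str
    task_id: str
    pid: int
    message: str


def parse_task_line(line: str) -> Optional[TaskLine]:
    """Return the parsed prefix, or None for lines without one (e.g. 'Done!')."""
    match = _PREFIX.match(line.strip())
    if match is None:
        return None
    return TaskLine(match["run_id"], match["step"], match["task_id"],
                    int(match["pid"]), match["message"])


# Usage: list the steps whose tasks reported success.
log = [
    "[1665870683555031/start/1 (pid 28379)] Task is starting.",
    "[1665870683555031/train_model/2 (pid 28385)] Task finished successfully.",
    "Done!",
]
finished = [t.step for t in map(parse_task_line, log)
            if t and t.message == "Task finished successfully."]
```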

Further Reading