ML Lifecycle Management: MLflow Platform Integration

MLflow Introduction

MLflow is an open-source platform for managing the machine learning development lifecycle from start to finish. For this purpose it provides four main modules:

  • Tracking of experiments to record and compare parameters and results (MLflow Tracking).

  • Packaging the ML code in a reusable and reproducible form for sharing with other data scientists and deploying it into production (MLflow Projects).

  • Management and deployment of models from a wide variety of ML libraries to a variety of serving platforms (MLflow Models).

  • A central model repository to collaboratively manage the entire lifecycle of an MLflow model, including model versioning, stage transitions, and annotations (MLflow Model Registry).

Platform Integration

  • Users with the Analytics or Administrator role will see a new menu option under ANALYTICS TOOLS.

  • From this menu item you can access both the Experiments that have been created and the Models.

Use from the command console

This tool can be used from any client terminal that has access to the platform. For Python, it is necessary to install two libraries:

  • MLflow → pip install mlflow

  • MLflow onesaitplatform plugin (uses the platform's file upload tool to work with ML project files) → pip install mlflow-onesaitplatform-plugin

Next, we have to set the URL of the onesaitplatform MLflow tracking server. There are two ways to do this:

  • By means of an environment variable called MLFLOW_TRACKING_URI, set to the environment's /controlpanel/mlflow endpoint.

  • Through Python code, using the mlflow.set_tracking_uri() method (see the sketch after this list).
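
A minimal sketch of both options in Python (the URL is a placeholder for the environment's Control Panel MLflow endpoint):

    import os

    import mlflow

    TRACKING_URI = "https://<environment>/controlpanel/mlflow"  # placeholder URL

    # Option 1: environment variable (it can also be exported in the shell
    # before starting Python: export MLFLOW_TRACKING_URI=<url>)
    os.environ["MLFLOW_TRACKING_URI"] = TRACKING_URI

    # Option 2: set it programmatically through the MLflow API
    mlflow.set_tracking_uri(TRACKING_URI)

    # Check which tracking server is active
    print(mlflow.get_tracking_uri())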

Once the URL is set, we can start working with experiments that will appear in the onesaitplatform interface. A typical project, for example, would contain the following files:

  • train.py, where the algorithm is trained and the relevant parameters and metrics are logged to the platform's model manager (a minimal sketch is shown after this list).

  • conda.yaml, which declares the project's dependencies.

  • The MLproject definition file itself.
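
A minimal sketch of what train.py could look like (the scikit-learn model, hyperparameters and metric below are purely illustrative, not something imposed by the platform):

    # train.py -- illustrative training script
    import sys

    import mlflow
    import mlflow.sklearn
    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import ElasticNet
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split

    # Hyperparameters, optionally passed on the command line (e.g. by "mlflow run")
    alpha = float(sys.argv[1]) if len(sys.argv) > 1 else 0.5
    l1_ratio = float(sys.argv[2]) if len(sys.argv) > 2 else 0.1

    X, y = load_diabetes(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    with mlflow.start_run():
        model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio)
        model.fit(X_train, y_train)
        rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5

        # Everything logged here becomes visible in the platform's model manager
        mlflow.log_param("alpha", alpha)
        mlflow.log_param("l1_ratio", l1_ratio)
        mlflow.log_metric("rmse", rmse)
        mlflow.sklearn.log_model(model, "model")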

This structure is not strictly necessary in order to log parameters, but having it allows us to launch the project directly with MLflow commands and to use it as a single reproducible piece (see the example below).
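
As an illustration, and assuming the hypothetical train.py sketched above, the MLproject and conda.yaml files could look like the following (project name, environment name and versions are placeholders):

    # MLproject (plain text file, no extension)
    name: example-project
    conda_env: conda.yaml
    entry_points:
      main:
        parameters:
          alpha: {type: float, default: 0.5}
          l1_ratio: {type: float, default: 0.1}
        command: "python train.py {alpha} {l1_ratio}"

    # conda.yaml -- the project's dependencies
    name: example-env
    channels:
      - defaults
    dependencies:
      - python=3.8
      - pip
      - pip:
          - mlflow
          - scikit-learn

With this in place, the whole project can be launched as a single piece from its root directory, for example with mlflow run . -P alpha=0.7, and the resulting run will appear in the platform interface.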

Use from platform notebooks

When working from the platform notebooks, it is not necessary to set the tracking server variable or to install these libraries, since everything is loaded by default.

With this, simply importing mlflow in our Python code is enough to start working, as in the sketch below. We can also work in a shared mode: develop the code locally, pull it into the notebook through Git, and launch it there with the full computing power of the server.
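
For example, a notebook cell like the following is enough to start logging (the experiment name and the logged values are placeholders):

    import mlflow

    # The tracking URI is already configured in the notebook environment,
    # so we only select (or create) an experiment and start a run
    mlflow.set_experiment("notebook-demo")  # hypothetical experiment name

    with mlflow.start_run():
        mlflow.log_param("learning_rate", 0.01)
        mlflow.log_metric("accuracy", 0.93)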

Next steps

  • Full integration with Platform security: each user will see their own experiments and will be able to share them with other users.

  • Serving models as Platform APIs

  • Deployment of models as containers