ML Lifecycle management: MLflow integration in the Platform
Introduction to MLflow
MLflow is an open-source platform for managing the machine learning development lifecycle from start to finish. It includes four main modules:
Experiment tracking to record and compare parameters and results (MLflow Tracking).
Packaging of ML code in a reusable and reproducible way to share it with other data scientists and deploy it in production (MLflow Projects).
Management and deployment of models from a wide variety of ML libraries to a variety of serving platforms (MLflow Models).
A central model repository to collaboratively manage the entire lifecycle of an MLflow model, including model versioning, stage transitions, and annotations (MLflow Model Registry).
Platform Integration
Users with the Analytics or Administrator role will have a new menu option in ANALYTICS TOOLS.
From this menu option, they can access the experiments that have been created, as well as the models.
Use from the command console
It is possible to use this new tool from any client terminal that has access to the platform. For Python, it will be necessary to install two libraries:
MLflow → pip install mlflow
The onesaitplatform MLflow plugin (which uses the file upload tool to work with ML project files) → pip install mlflow-onesaitplatform-plugin
Next, we have to set the URL of the onesaitplatform MLflow tracking server. There are two ways to do it:
Through an environment variable called MLFLOW_TRACKING_URI, set to <environment>/controlpanel/mlflow, where <environment> is the URL of the platform environment.
Through Python code using the method:
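MLflow's standard call for setting the tracking server from code is mlflow.set_tracking_uri, and the sketch below assumes that is the method meant here. It shows both options side by side. The base URL is a hypothetical placeholder and the helper tracking_uri is introduced only for illustration; the mlflow import is guarded so the snippet also runs where MLflow is not yet installed.

```python
import os

# Hypothetical platform URL (assumption): replace with your own environment.
PLATFORM_URL = "https://my-environment.example.com"


def tracking_uri(platform_url: str) -> str:
    """Build the tracking server URL: <environment>/controlpanel/mlflow."""
    return platform_url.rstrip("/") + "/controlpanel/mlflow"


uri = tracking_uri(PLATFORM_URL)

# Option 1: environment variable, picked up by MLflow when no URI is set in code.
os.environ["MLFLOW_TRACKING_URI"] = uri

# Option 2: set the URI explicitly from Python.
try:
    import mlflow

    mlflow.set_tracking_uri(uri)
except ImportError:
    # MLflow is not installed here; only the environment variable was set.
    pass
```

Either option on its own is enough. The environment variable is convenient when the same code has to run unchanged against several environments.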
Once the URL is set, we can start working with experiments, which will then appear in the onesaitplatform interface. A typical project consists of a training script, an environment file with its dependencies, and a project definition file.
The core of the project is the train.py code, where the algorithm is trained and the different parameters are logged in the platform's model manager.
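The original train.py is not reproduced here, so the following is only a minimal sketch of what such a script might look like. The toy data, the l2 parameter and the helpers fit_line and rmse are all illustrative assumptions; only mlflow.start_run, mlflow.log_param and mlflow.log_metric are actual MLflow calls. The import is guarded so the sketch also runs where MLflow is absent.

```python
# train.py (illustrative sketch, not the platform's original file)
import math

try:
    import mlflow  # installed with: pip install mlflow
except ImportError:
    mlflow = None  # lets the sketch run where MLflow is not installed


def fit_line(xs, ys, l2=0.0):
    """Least-squares fit of y = a*x + b; l2 > 0 shrinks the slope (ridge-style)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / (sxx + l2)
    b = mean_y - a * mean_x
    return a, b


def rmse(xs, ys, a, b):
    """Root mean squared error of the fitted line on the given points."""
    return math.sqrt(sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


def main(l2=0.0):
    # Toy data (assumption): points on the line y = 2x + 1.
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [1.0, 3.0, 5.0, 7.0]
    a, b = fit_line(xs, ys, l2)
    error = rmse(xs, ys, a, b)
    if mlflow is not None:
        # Each run records its parameters and metrics on the tracking server.
        with mlflow.start_run():
            mlflow.log_param("l2", l2)
            mlflow.log_metric("rmse", error)
    return a, b, error


if __name__ == "__main__":
    print(main())
```

Each execution creates one run, so launching the script with different values of l2 would produce runs that can be compared in the experiments view.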
We will also have the conda.yaml file, which lists the project's dependencies.
Finally, there is the project definition file itself, MLproject.
This structure is not required in order to log parameters, but having it allows us to launch the project directly with the MLflow commands and to use it as a single unit.
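As a rough illustration of the two supporting files, the fragments below follow the standard MLflow project layout. The project name, environment name and Python version are assumptions chosen for the example, not values taken from the platform.

```yaml
# conda.yaml (illustrative; names and versions are assumptions)
name: mlflow-example
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pip
  - pip:
      - mlflow
      - mlflow-onesaitplatform-plugin
```

```yaml
# MLproject (illustrative; the project name is an assumption)
name: example_project
conda_env: conda.yaml
entry_points:
  main:
    command: "python train.py"
```

With files like these next to train.py, the project can typically be launched from its directory with the MLflow command mlflow run . so that the environment is created and the entry point executed in one step.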
Use from the platform’s notebooks
In the platform's notebooks there is no need to set the tracking server variable or install the libraries, since everything is loaded by default.
Simply importing MLflow in the Python code is enough to start working. We can also work in a shared mode: we develop the code locally, pull it into the notebook through git, and launch it with all the computing capacity of the server.
Next steps
Complete integration with Platform security: each user will see their experiments and will be able to share them with other users.
Serving of models as Platform APIs.
Deployment of models as containers.