Training and deployment of models with BaseModelService

Available in Python client version 1.4

1 What is BaseModelService?
2 What is needed in OSP?
3 Creation of a BaseModelService child class: SentimentAnalysisModelService
- 3.1 Override the init method
- 3.2 Override the load_model method
- 3.3 Override the train method
- 3.4 Override the predict method
4 Creating an object, training and predicting
5 Activating the preferred model using Query Tool

This guide explains, step by step, how to train and run a sentiment analysis model on a Onesait Platform (hereafter OSP) instance using BaseModelService. The guide assumes that the OSP instance deployed at https://lab.onesaitplatform.com will be used. You will need to register following the instructions in the sign-up link if you have not done it before, then log in to the application.

The sentiment analysis model will be trained on this small dataset, which is a selection taken from the following dataset. The model will be a binary classifier that assigns a value of 1 to texts with positive sentiment and 0 to texts with negative sentiment.

What is BaseModelService?

BaseModelService is a Python class distributed as part of the Python client for OSP. The code for this client is maintained by the OSP community on GitHub, and it can be installed via pip:

pip install onesaitplatform-client-services

BaseModelService allows:

Training models.
Retraining models to generate new versions of these models.
Deploying the trained models.
Making inference with the deployed models.

All this is done by taking advantage of the tools that OSP provides for:

The management of training datasets.
The storage of the trained models.
The control of their different versions.
The deployment via microservices.

BaseModelService abstracts the model developers from the management of this whole functionality, allowing them to make use of it in a simple way. As its name suggests, BaseModelService is a parent class from which the model developer will create a child class that inherits from it.

The child class will contain the specific code to train a particular model, save it to a local path, load the saved version back from a local path, and use it in inference. The model developers can use any Python library (scikit-learn, TensorFlow, PyTorch, etc.). They may also use the model saving and loading mechanisms of their choice.

The rest of the OSP interaction tasks have already been defined in the BaseModelService parent class: downloading the dataset from a file in the File Repository, or from an ontology; saving the trained models in the File Repository, downloading these models from the File Repository, checking the different versions of the same model and selecting the preferred version.

What is needed in OSP?

OSP provides support for managing and storing datasets and models. To do so, the following must be configured:

A dataset in the File Repository. This will be the dataset that will be used to train the model.
Alternatively, an ontology in which the dataset in question is stored.
An ontology in which the different versions of the model are registered.
A Digital Client to which the previous ontologies are associated and that allows access to them.

Uploading a dataset to the File Repository

First, the training dataset of the sentiment analysis model will be uploaded to the File Repository of your OSP instance. The dataset in question can be downloaded here. Assuming that you have downloaded the file to a local path, you can use the following code to upload it to the File Repository:

from onesaitplatform.files import FileManager

HOST = "lab.onesaitplatform.com"
USER_TOKEN = "Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJwcmluY2lwYWwiOiJianJhbWlyZXoiLCJjbGllbnRJZCI6Im9uZXNhaXRwbGF0Zm9ybSIsInVzZXJfbmFtZSI6ImJqcmFtaXJleiIsInNjb3BlIjpbIm9wZW5pZCJdLCJuYW1lIjoiYmpyYW1pcmV6IiwiZXhwIjoxNjE3ODI2NjkzLCJncmFudFR5cGUiOiJwYXNzd29yZCIsInBhcmFtZXRlcnMiOnsidmVydGljYWwiOm51bGwsImdyYW50X3R5cGUiOiJwYXNzd29yZCIsInVzZXJuYW1lIjoiYmpyYW1pcmV6In0sImF1dGhvcml0aWVzIjpbIlJPTEVfREFUQVNDSUVOVElTVCJdLCJqdGkiOiJmNGM2NDUzZC0xYTEyLTRkMGUtYTVlNy05ZmNlMDY4OTY1NDYiLCJjbGllbnRfaWQiOiJvbmVzYWl0cGxhdGZvcm0ifQ.Nz5cDvMjh361z4r6MMD2jUOpYSmUKVLkMThHDK0sg6o"
file_manager = FileManager(host=HOST, user_token=USER_TOKEN)
file_manager.protocol = "https"
uploaded, info = file_manager.upload_file("sentiment_analysis_dataset.csv", "./sentiment_analysis_dataset.csv")
print("Uploaded file: {}".format(uploaded))
print("Information: {}".format(info))

Note that the Python client is used to upload to the OSP. You can check the corresponding documentation here. For authentication, you must use the user token provided in the OSP control panel. It can be found by clicking on “APIs”, in the upper right corner of the control panel. This is a temporary token that will have a limited duration.
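
Because the user token is temporary, it can be handy to check when it expires before launching a long training. The following is a minimal sketch under the assumption that the user token is a JWT carrying a standard exp claim; the helpers token_expiry and make_demo_token are hypothetical names, and the demo token is made up and unsigned:

```python
import base64
import json
from datetime import datetime, timezone


def token_expiry(user_token):
    """Return the expiry time (UTC) encoded in a JWT-style bearer token."""
    # Drop the optional "Bearer " prefix and keep the middle (payload) segment
    jwt = user_token.split(" ")[-1]
    payload_b64 = jwt.split(".")[1]
    # JWT segments are base64url-encoded without padding; restore the padding
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return datetime.fromtimestamp(payload["exp"], tz=timezone.utc)


def make_demo_token(exp):
    """Build an unsigned, made-up token for demonstration purposes only."""
    def encode(obj):
        raw = json.dumps(obj).encode("utf-8")
        return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")
    return "Bearer {}.{}.{}".format(encode({"alg": "none"}), encode({"exp": exp}), "signature")


demo_token = make_demo_token(1617826693)
expires_at = token_expiry(demo_token)
print("Token expires at: {}".format(expires_at.isoformat()))
print("Expired: {}".format(expires_at < datetime.now(timezone.utc)))
```

The sketch only reads the payload and does not verify the signature, so it is suitable for a quick local check and nothing more.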

The identifier that the File Repository has assigned to the uploaded file will be printed in the output of the code above. You will have to provide this identifier later to launch the training of the model from this dataset.

Creating an ontology for model version registration

Next, let’s create an ontology called SentimentAnalysisModels, which is necessary to keep track of the model versions that are being trained. To create the ontology, you must use the menu on the left margin of the control panel: Development > My Ontologies.

In the upper right margin of the resulting screen, click on CREATE.

And in the next screen, click on Creation Step by Step.

Once you do that, the form for the ontology creation opens. Fill in the mandatory data (NAME, DESCRIPTION and META-INFORMATION). As mentioned above, NAME will be SentimentAnalysisModels:

In the lower section of the same screen, select the GENERAL ontology template:

And within GENERAL, click on the last option, EMPTY BASE:

This will enable, in the same screen, the following:

Click on UPDATE SCHEMA. This will open an editor like the following one:

Switch the editor from Tree to Text mode, then replace the contents with the following schema:

{
    "$schema": "http://json-schema.org/draft-04/schema#",
    "title": "SentimentAnalysisModels",
    "type": "object",
    "required": [
        "SentimentAnalysisModels"
    ],
    "properties": {
        "SentimentAnalysisModels": {
            "type": "string",
            "$ref": "#/datos"
        }
    },
    "datos": {
        "description": "Info SentimentAnalysisModels",
        "type": "object",
        "required": [
            "name",
            "description",
            "asset",
            "version",
            "metrics",
            "hyperparameters",
            "model_path",
            "date",
            "active"
        ],
        "properties": {
            "name": {
                "type": "string"
            },
            "description": {
                "type": "string"
            },
            "asset": {
                "type": "string"
            },
            "version": {
                "type": "string"
            },
            "metrics": {
                "type": "array",
                "items": {
                    "type": "object",
                    "required": [
                        "name",
                        "value"
                    ],
                    "properties": {
                        "name": {
                            "type": "string"
                        },
                        "value": {
                            "type": "string"
                        },
                        "dtype": {
                            "type": "string"
                        }
                    },
                    "additionalProperties": false
                },
                "minItems": 0
            },
            "hyperparameters": {
                "type": "array",
                "items": {
                    "type": "object",
                    "required": [
                        "name",
                        "value"
                    ],
                    "properties": {
                        "name": {
                            "type": "string"
                        },
                        "value": {
                            "type": "string"
                        },
                        "dtype": {
                            "type": "string"
                        }
                    },
                    "additionalProperties": false
                },
                "minItems": 0
            },
            "model_path": {
                "type": "string"
            },
            "date": {
                "type": "string",
                "format": "date-time"
            },
            "dataset_path": {
                "type": "string"
            },
            "active": {
                "type": "boolean"
            },
            "ontology_dataset": {
                "type": "string"
            }
        }
    },
    "description": "Definition of trained models",
    "additionalProperties": true
}

Once the change has been made, in the lower right margin of the screen, create the ontology by clicking on NEW:

With this, you have created the ontology to register the different versions of the sentiment analysis model that are created. The ontology fields, as shown above, are as follows:

name: model name.
description: model description.
asset: name of the asset in which the model is framed.
version: model version.
metrics: list of evaluation criteria of the model. Each criterion in the list consists of a field name, name of the criterion; a field value, value for that criterion; and dtype, data type of the value.
hyperparameters: list of hyperparameters with which the model has been trained. Each hyperparameter in the list consists of a name field, name of the hyperparameter; a value field, value of the hyperparameter; and dtype, data type of the value.
model_path: identifier of the file corresponding to the model stored in the File Repository.
date: date and time at which the model is created.
dataset_path: identifier of the file corresponding to the training dataset used for training in the File Repository.
active: Boolean denoting whether a version of the model is active. Usually, only one version of the model will be active, and it will be the one loaded as the serviced model.
ontology_dataset: name of the ontology in which the dataset is stored.
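
To make the field list concrete, here is a minimal sketch of what one record of this ontology could look like when built in Python. BaseModelService creates these records for you; the helper names, the sample values, the file identifier and the way dtype is derived from the Python type are all assumptions made for illustration:

```python
from datetime import datetime, timezone

# Required fields of the "datos" definition in the schema above
REQUIRED_FIELDS = ["name", "description", "asset", "version", "metrics",
                   "hyperparameters", "model_path", "date", "active"]


def to_name_value_list(values):
    """Convert a dict into the list of {name, value, dtype} items used by the schema."""
    return [
        {"name": key, "value": str(value), "dtype": type(value).__name__}
        for key, value in values.items()
    ]


def build_model_record(name, description, asset, version, metrics,
                       hyperparameters, model_path):
    """Assemble a record shaped like the SentimentAnalysisModels schema."""
    info = {
        "name": name,
        "description": description,
        "asset": asset,
        "version": version,
        "metrics": to_name_value_list(metrics),
        "hyperparameters": to_name_value_list(hyperparameters),
        "model_path": model_path,
        # The schema declares this field with format date-time
        "date": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        # New versions are registered as inactive
        "active": False,
    }
    return {"SentimentAnalysisModels": info}


record = build_model_record(
    name="sentiment_analysis",
    description="First version of the model for sentiment analysis",
    asset="demo_asset",                      # made-up asset name
    version="0",
    metrics={"loss": 0.35, "accuracy": 0.87},  # made-up values
    hyperparameters={"NUM_WORDS": 10000, "EPOCHS": 10},
    model_path="hypothetical-file-id",       # made-up File Repository identifier
)
info = record["SentimentAnalysisModels"]
missing = [field for field in REQUIRED_FIELDS if field not in info]
print("Missing required fields: {}".format(missing))
```

Note that the schema types every metric and hyperparameter value as a string, which is why the sketch converts the values with str and keeps the original type name in dtype.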

Creation of a Digital Client

Once you have a SentimentAnalysisModels ontology, you have to create a Digital Client with which BaseModelService can access the ontology in question. To do this, click on Clients & Digital Twins > My Digital Clients in the menu on the left margin of the control panel:

Once again, click on CREATE in the resulting screen:

This takes you to the Digital Client creation form. You must fill in the IDENTIFICATION field with the name SentimentAnalysisDigitalClient and add a description under DESCRIPTION. Under ONTOLOGIES, select the previously-created ontology (SentimentAnalysisModels) and give it ACCESS LEVEL ALL. Click on ADD ONTOLOGY:

Bear in mind that the token that appears under this form will be necessary from now on to access the ontology through this Digital Client. For the creation of the Digital Client to be effective, click on the NEW button on the lower right margin of this screen:

This has created the Digital Client SentimentAnalysisDigitalClient, which will be used to access the model's version registration ontology.

Creation of a BaseModelService child class: SentimentAnalysisModelService

To create an object that manages the training, saving, loading, deployment and usage of a particular model, you must create a Python class that inherits from BaseModelService. In this tutorial, you will create a class that manages sentiment analysis models. It will be thus called SentimentAnalysisModelService:

from onesaitplatform.model import BaseModelService

class SentimentAnalysisModelService(BaseModelService):
    """Service for models of Sentiment Analysis"""

    def __init__(self, **kargs):
        """
        YOUR CODE HERE
        """
        super().__init__(**kargs)
        
    def load_model(self, model_path=None, hyperparameters=None):
        """Loads a previously trained model and saves it in one or more object attributes"""

        """
        YOUR CODE HERE
        """

    def train(self, dataset_path=None, hyperparameters=None, model_path=None):
        """
        Trains a model given a dataset and saves it in a local path.
        Returns a dictionary with the obtained metrics
        """

        """
        YOUR CODE HERE
        """

        return metrics

    def predict(self, inputs=None):
        """Predicts given a model and an array of inputs"""

        """
        YOUR CODE HERE
        """

        return results

As you can see, the child class has to override the BaseModelService’s init, load_model, train and predict methods.

Place and execute the code in a Notebook in your OSP instance at https://lab.onesaitplatform.com. To do this, click on MY ANALYTICS TOOLS > MY NOTEBOOKS in the menu on the left margin of the control panel.

This opens a screen. In its upper right corner, click on NEW NOTEBOOK:

This opens a form in which you can name the Notebook:

Once you click OK, an Apache Zeppelin notebook opens. In it, you are going to write the code that follows.

Specifically, you will create a SentimentAnalysisModelService class to manage sentiment analysis models on Spanish data. It will be a binary text classifier: the output will be 0 for texts with negative sentiment and 1 for texts with positive sentiment. TensorFlow 2.x will be used for this purpose. A perceptron will be built whose input is a bag of words weighted with tf-idf. These are toy models: the aim here is not to build good models, only to show how to develop them in a simple way. The model will be saved using h5 and pickle: one file with the weights resulting from the training and another file with the tokenizer trained to preprocess the text.

Then, import the libraries and classes needed to build the model as described above. Along with them, import the BaseModelService class:

import pickle

import numpy as np
import pandas as pd
from tensorflow import keras
from tensorflow.keras.preprocessing import text
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Dropout
from onesaitplatform.model import BaseModelService

Override the init method

In the init method, you will initialize the attributes that will later be used in other methods to reference the model. Specifically, for the sentiment model to be developed in this tutorial, two attributes will be enabled: one to store the model itself (model), the neural network that will return 1 for texts with positive sentiment and 0 for texts with negative sentiment; and another for the text preprocessor (preprocessor):

class SentimentAnalysisModelService(BaseModelService):
    """Service for models of Sentiment Analysis"""

    def __init__(self, **kargs):
        self.model = None
        self.preprocessor = None
        super().__init__(**kargs)

Override the load_model method

The load_model method is in charge of building the model to be serviced from the file or files in which it has been previously saved. This method is executed when the object is created. The object constructor will search in the corresponding OSP ontology for the appropriate model and download it from the OSP File Repository to a local directory. This download will contain exactly the files and/or directories that were created at the time the model was saved (see train method).

The following two parameters are passed to the load_model method:

model_path: is the path to the local directory where the files and/or directories needed to load the model are located. The developer assumes that in that path she will find all the files and/or directories she created at the time she saved the model to be loaded. Therefore, she can now rebuild the model from those elements.
hyperparameters: this is a dictionary with all the hyperparameters that were used to train the model. They may be necessary for its reconstruction. In this example they will not be used.

Specifically, for the SentimentAnalysisModelService class, we can assume that the models are stored in two files: an h5 with the Tensorflow neural network and a pickle with the tokenizer object that preprocesses the text. Therefore, assume that these two files must be provided within the model_path directory:

model.h5
tokenizer.pkl

The neural network is stored in the model attribute, while the tokenizer is stored in the preprocessor attribute (both previously initialized in the init method). This is the code:

class SentimentAnalysisModelService(BaseModelService):
    """Service for models of Sentiment Analysis"""
    
    ...
    
    def load_model(self, model_path=None, hyperparameters=None):
        """Loads a previously trained model and saves it in one or more object attributes"""
        # Both files live inside the directory given in model_path
        nn_path = model_path + '/model.h5'
        preprocessor_path = model_path + '/tokenizer.pkl'
        model = keras.models.load_model(nn_path)
        with open(preprocessor_path, 'rb') as f:
            preprocessor = pickle.load(f)
        self.model = model
        self.preprocessor = preprocessor

Override the train method

The train method is in charge of training the model. It is executed internally when the developer executes one of these methods, implemented in BaseModelService:

train_from_file_system: launches the training of a model from a dataset previously saved in the OSP File Repository.
train_from_ontology: launches the training of a model from a dataset stored in an OSP ontology.

The train method receives the following parameters:

dataset_path: is the local path to the file in which the training dataset is provided. This file can have its origin in a file previously stored in the File Repository. In such a case, it will have exactly the format of the saved file. If the origin of the file is an ontology, it will have been converted to a CSV with "," as delimiter and as many columns as there are fields in the ontology records.
hyperparameters: is a dictionary with the hyperparameters that were passed to the train_from_file_system or train_from_ontology methods at the time of launching the training.
model_path: is the path to the local directory where the system will store the files or directories in which the model will be saved once trained.

The developer must read the dataset from the local file provided in dataset_path. This will feed the training process. Once it is finished, the developer will have to save the resulting model in the directory indicated in model_path. Besides, the train method must return a dictionary with the model evaluation metrics that the developer considers necessary.
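
The contract described above (read from dataset_path, save into model_path, return a metrics dictionary, and later rebuild the model in load_model) can be tried out without OSP or TensorFlow. The following minimal sketch uses a hypothetical stand-in class, ToySentimentService, which does not inherit from BaseModelService; the word-count "model", the model.pkl file name and the two-row dataset are made up for illustration:

```python
import csv
import os
import pickle
import tempfile
from collections import Counter


class ToySentimentService:
    """Stand-in that mimics the train/load_model contract (not the real BaseModelService)."""

    def __init__(self):
        self.model = None

    def train(self, dataset_path=None, hyperparameters=None, model_path=None):
        # 1. Read the dataset from the local file provided in dataset_path
        with open(dataset_path, newline="", encoding="utf-8") as f:
            rows = list(csv.DictReader(f))
        # 2. "Train": count how often each word appears under each label
        counts = {0: Counter(), 1: Counter()}
        for row in rows:
            counts[int(row["label"])].update(row["text"].lower().split())
        self.model = counts
        # 3. Save the model inside the directory indicated in model_path
        with open(os.path.join(model_path, "model.pkl"), "wb") as f:
            pickle.dump(counts, f)
        # 4. Return a dictionary with the evaluation metrics
        hits = sum(self._classify(r["text"]) == int(r["label"]) for r in rows)
        return {"accuracy": hits / len(rows)}

    def load_model(self, model_path=None, hyperparameters=None):
        # Rebuild the model from the files that train saved earlier
        with open(os.path.join(model_path, "model.pkl"), "rb") as f:
            self.model = pickle.load(f)

    def _classify(self, sentence):
        words = sentence.lower().split()
        positive = sum(self.model[1][w] for w in words)
        negative = sum(self.model[0][w] for w in words)
        return 1 if positive >= negative else 0


with tempfile.TemporaryDirectory() as tmp:
    dataset_path = os.path.join(tmp, "dataset.csv")
    with open(dataset_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(
            [["text", "label"], ["great product", 1], ["awful service", 0]]
        )
    metrics = ToySentimentService().train(dataset_path=dataset_path, model_path=tmp)
    restored = ToySentimentService()
    restored.load_model(model_path=tmp)
    saved_files = sorted(os.listdir(tmp))

print(metrics, saved_files)
```

In the real class, the parent takes care of placing the dataset in dataset_path before train is called and of uploading whatever train leaves in model_path afterwards.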

In the case of SentimentAnalysisModelService, the training dataset will be assumed to be a CSV with "," as delimiter. This dataset will contain two columns:

text: with the texts containing the opinions.
label: with a 1 for texts with positive opinion and a 0 for those with negative opinion.

We train a model with TensorFlow 2.x. For the preprocessing of the texts we use the Keras tokenizer, which converts each text to a vector of n positions representing a bag of words with tf-idf: each position of the vector denotes a word (always the same) with a non-negative numeric weight denoting how relevant that word is in the text in question. See the code:

class SentimentAnalysisModelService(BaseModelService):
    """Service for models of Sentiment Analysis"""
    
    ...
    
    def train(self, dataset_path=None, hyperparameters=None, model_path=None):
        """Trains a model given a dataset"""
        
        NUM_WORDS = hyperparameters['NUM_WORDS']
        BATCH_SIZE = hyperparameters['BATCH_SIZE']
        EPOCHS = hyperparameters['EPOCHS']
        
        # The dataset is a CSV with "," as delimiter and the columns text and label
        dataset = pd.read_csv(dataset_path, sep=',')
        texts = dataset["text"].tolist()
        labels = dataset["label"].tolist()
        
        # Convert each text to a tf-idf bag of words with NUM_WORDS positions
        tokenizer = text.Tokenizer(num_words=NUM_WORDS)
        tokenizer.fit_on_texts(texts)
        X = tokenizer.texts_to_matrix(texts, mode="tfidf")
        y = np.array(labels)
        
        # Shuffle the examples and hold out 20% of them for the final evaluation
        indices = np.random.permutation(len(y))
        n_test = max(1, int(0.2 * len(y)))
        x_test, y_test = X[indices[:n_test]], y[indices[:n_test]]
        x_train, y_train = X[indices[n_test:]], y[indices[n_test:]]
        
        model = Sequential()
        model.add(Dense(250, input_shape=(NUM_WORDS,)))
        model.add(Activation("relu"))
        model.add(Dropout(0.2))
        model.add(Dense(1))
        model.add(Activation("sigmoid"))
        
        model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
        model.summary()
        model.fit(x_train, y_train, batch_size=BATCH_SIZE, epochs=EPOCHS, validation_split=0.25, verbose=2)
        
        evaluation = model.evaluate(x_test, y_test)
        metrics = {'loss': float(evaluation[0]), 'accuracy': float(evaluation[1])}
        
        # Save the network and the tokenizer in the directory given in model_path
        nn_path = model_path + '/model.h5'
        preprocessor_path = model_path + '/tokenizer.pkl'
        model.save(nn_path)
        with open(preprocessor_path, 'wb') as f:
            pickle.dump(tokenizer, f, protocol=pickle.HIGHEST_PROTOCOL)
        
        return metrics

Override the predict method

The predict method receives a parameter (inputs) with the list of inputs for which inference is to be made, computes the output according to the model, and returns the results, one per input. Specifically, for SentimentAnalysisModelService, the input is assumed to be a list of texts. The texts are first converted with the text preprocessor held in the preprocessor attribute and then passed to the network held in the model attribute (both initialized in init and instantiated in load_model). The results are returned.

class SentimentAnalysisModelService(BaseModelService):
    """Service for models of Sentiment Analysis"""
    
    ...
  
    def predict(self, inputs=None):
        """Predicts given a model and an array of inputs"""
        X = self.preprocessor.texts_to_matrix(inputs, mode='tfidf')
        y = self.model.predict(X)
        return y

Creating an object, training and predicting

Let's assume that the following items have been created in the OSP deployment of https://lab.onesaitplatform.com/:

A dataset as a CSV file with "," as a separator in the File Repository. The dataset will have two columns: text (with the texts) and label (with value 1 or 0, where 1 denotes that the text has positive sentiment and 0 denotes that the text has negative sentiment).
An ontology called SentimentAnalysisModels with the structure shown above.
A Digital Client associated to the previous ontology, called SentimentAnalysisDigitalClient.
The previously-described SentimentAnalysisModelService class.

With all this, the object sentiment_analysis_model_service of class SentimentAnalysisModelService is going to be created:

PARAMETERS = {
    'PLATFORM_HOST': "lab.onesaitplatform.com",
    'PLATFORM_PORT': 443,
    'PLATFORM_DIGITAL_CLIENT': "SentimentAnalysisDigitalClient",
    'PLATFORM_DIGITAL_CLIENT_TOKEN': "534f2eb845c746bd9a50cfab30273317",
    'PLATFORM_DIGITAL_CLIENT_PROTOCOL': "https",
    'PLATFORM_DIGITAL_CLIENT_AVOID_SSL_CERTIFICATE': True,
    'PLATFORM_ONTOLOGY_MODELS': "SentimentAnalysisModels",
    'PLATFORM_USER_TOKEN': "Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJwcmluY2lwYWwiOiJianJhbWlyZXoiLCJjbGllbnRJZCI6Im9uZXNhaXRwbGF0Zm9ybSIsInVzZXJfbmFtZSI6ImJqcmFtaXJleiIsInNjb3BlIjpbIm9wZW5pZCJdLCJuYW1lIjoiYmpyYW1pcmV6IiwiZXhwIjoxNjE3ODI2NjkzLCJncmFudFR5cGUiOiJwYXNzd29yZCIsInBhcmFtZXRlcnMiOnsidmVydGljYWwiOm51bGwsImdyYW50X3R5cGUiOiJwYXNzd29yZCIsInVzZXJuYW1lIjoiYmpyYW1pcmV6In0sImF1dGhvcml0aWVzIjpbIlJPTEVfREFUQVNDSUVOVElTVCJdLCJqdGkiOiJmNGM2NDUzZC0xYTEyLTRkMGUtYTVlNy05ZmNlMDY4OTY1NDYiLCJjbGllbnRfaWQiOiJvbmVzYWl0cGxhdGZvcm0ifQ.Nz5cDvMjh361z4r6MMD2jUOpYSmUKVLkMThHDK0sg6o",
    'TMP_FOLDER': '/tmp/',
    'NAME': "SentimentAnalysis"
}

sentiment_analysis_model_service = SentimentAnalysisModelService(config=PARAMETERS)

The parameters passed to the object are the following ones:

PLATFORM_HOST: Host of the OSP deployment on which we will work. In this case, lab.onesaitplatform.com.
PLATFORM_PORT: Port where the OSP is served.
PLATFORM_DIGITAL_CLIENT: Name of the Digital Client created in OSP to give access to the ontologies.
PLATFORM_DIGITAL_CLIENT_TOKEN: Authentication token corresponding to the Digital Client.
PLATFORM_DIGITAL_CLIENT_PROTOCOL: Protocol under which communications with OSP will be established.
PLATFORM_DIGITAL_CLIENT_AVOID_SSL_CERTIFICATE: True if connections without certificate are to be established.
PLATFORM_ONTOLOGY_MODELS: Name of the ontology where the different versions of the model created will be registered.
PLATFORM_USER_TOKEN: Authentication token of an OSP user.
TMP_FOLDER: Local directory to which the OSP File Repository elements will be temporarily downloaded, and in which the models will be temporarily stored before being uploaded to the File Repository.
NAME: Name of the model service.

Once the sentiment_analysis_model_service object has been created, it is ready to train versions of the sentiment analysis model as defined in the SentimentAnalysisModelService class. Besides, at the time the object is created, if the OSP ontology referenced in PLATFORM_ONTOLOGY_MODELS already contains any model in active state (active True), it will be loaded into memory and made available for use via the predict method.
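
The selection of the active version described above can be pictured with a minimal sketch. It is not the client's actual implementation: the record layout follows the ontology schema shown earlier, while the helper name select_active_model, the tie-break by most recent date and the sample records are assumptions made for illustration:

```python
def select_active_model(records):
    """Return the info of the record flagged as active, or None if there is none."""
    active = [
        record["SentimentAnalysisModels"]
        for record in records
        if record["SentimentAnalysisModels"].get("active")
    ]
    if not active:
        return None
    # Assumption: if several versions are active, prefer the most recent one.
    # ISO 8601 timestamps in the same format sort correctly as plain strings.
    return max(active, key=lambda info: info["date"])


# Made-up records shaped like the SentimentAnalysisModels ontology
sample_records = [
    {"SentimentAnalysisModels": {"version": "0", "active": False,
                                 "date": "2021-03-18T10:00:00Z",
                                 "model_path": "hypothetical-id-0"}},
    {"SentimentAnalysisModels": {"version": "1", "active": True,
                                 "date": "2021-03-19T10:00:00Z",
                                 "model_path": "hypothetical-id-1"}},
]

chosen = select_active_model(sample_records)
print("Active version: {}".format(chosen["version"] if chosen else None))
```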

To train a version of the model, one of these two methods can be executed:

train_from_file_system: launches the training of a model from a dataset previously stored in the OSP File Repository.
train_from_ontology: launches the training of a model from a dataset stored in an OSP ontology.

The following code launches the training of a model from a dataset previously uploaded to the OSP File Repository:

MODEL_NAME = 'sentiment_analysis'
MODEL_VERSION = '0'
MODEL_DESCRIPTION = 'First version of the model for sentiment analysis'
DATASET_FILE_ID = '605360b7cfb6d70134a3b1a0'
HYPERPARAMETERS = {
    'NUM_WORDS': 10000,
    'BATCH_SIZE': 16,
    'EPOCHS': 10,
    'DROPOUT': 0.2,
    'LEARNING_RATE': 0.001,
}

sentiment_analysis_model_service.train_from_file_system(
    name=MODEL_NAME, version=MODEL_VERSION, description=MODEL_DESCRIPTION,
    dataset_file_id=DATASET_FILE_ID, hyperparameters=HYPERPARAMETERS
)

Bear in mind that the value of DATASET_FILE_ID is the identifier of the file containing the dataset in the OSP File Repository.

Once a model version has been successfully trained, it will be stored in the OSP File Repository, and will be registered in the SentimentAnalysisModels ontology. Bear in mind that model versions are saved as active False. One of the versions must be activated in order to be available when creating a new SentimentAnalysisModelService object.

Once there is an active version of the model in SentimentAnalysisModels, whenever a new instance of SentimentAnalysisModelService is created, it will have this model loaded and available to be used in inference by means of the predict method:

sequences = ['This is a very good opinion', 'This is a very bad opinion']
results = sentiment_analysis_model_service.predict(inputs=sequences)
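
Since the last layer of the network is a sigmoid, the values in results are scores between 0 and 1 rather than the final 0 or 1 labels. A minimal sketch of the conversion follows; the helper name to_labels, the 0.5 threshold and the example scores are assumptions made for illustration:

```python
def to_labels(scores, threshold=0.5):
    """Map sigmoid scores (one row per input) to 1 (positive) or 0 (negative)."""
    return [1 if row[0] >= threshold else 0 for row in scores]


# Made-up scores with the same layout as the model output: one row per text
example_scores = [[0.91], [0.08]]
labels = to_labels(example_scores)
print(labels)
```

With the real service, the same helper could be applied directly to the array returned by predict, since each of its rows holds a single score.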

Activating the preferred model using Query Tool

As explained above, the training process registers the trained model, in this case in the SentimentAnalysisModels ontology. Initially, the record is registered with active False. The model that is loaded for inference is the record in SentimentAnalysisModels that is registered with active True. To manage which version of the model is currently active, you can use the OSP Query Tool. This tool is accessed from the control panel, in the menu on the left margin:

You will see the form by means of which you have to select your SentimentAnalysisModels ontology:

Once the ontology is selected, a tool opens that allows you to make SQL queries to the ontology. This tool allows you to observe the available model versions and to update one of them as active True.
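
As a rough guide, the kind of statements involved can be sketched as follows. The exact SQL dialect accepted by the Query Tool and the way nested fields are addressed (here, the ontology name as a prefix) are assumptions, so the generated statements may need adapting; the helper names are hypothetical:

```python
ONTOLOGY = "SentimentAnalysisModels"


def list_versions_query(ontology=ONTOLOGY):
    """Statement to inspect the registered model versions."""
    return "SELECT * FROM {0}".format(ontology)


def activate_version_queries(version, ontology=ONTOLOGY):
    """Statements to leave exactly one version flagged as active."""
    # First deactivate whatever is active, then activate the chosen version
    deactivate = "UPDATE {0} SET {0}.active = false WHERE {0}.active = true".format(ontology)
    activate = "UPDATE {0} SET {0}.active = true WHERE {0}.version = '{1}'".format(ontology, version)
    return [deactivate, activate]


print(list_versions_query())
for statement in activate_version_queries("0"):
    print(statement)
```

Deactivating the current version before activating the new one keeps a single record with active True, which matches the expectation that only one version is served.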

Once the desired model version is active True, when creating a new SentimentAnalysisModelService object, it will load this model’s version in memory, making it available for inference through the predict method.