Available from Platform Release 5.3.0 (Ultimate)

Goal

This feature lets you monitor the resource consumption of notebooks, both individually and overall. You can check the status of each notebook, see its running processes, control its state, and so on.

Operation of the Notebooks

To understand how monitoring works, you first need to know a few notebook concepts.

Notebook execution modes

Platform notebooks are based on Apache Zeppelin and run their code through interpreters. Each interpreter can be configured differently, so a notebook can run interpreters in different modes.

Interpreters can run in three modes:

Shared: a single interpreter process is shared by all notebooks, so this interpreter cannot run in parallel across several notebooks, and all of them use the same manager. Because the interpreter is not tied to any notebook and can switch from one to another, there is no simple way to tell which notebook ran it; to find out, you must cross-reference the resources entity with the executions entity.
Per notebook:
- Scoped: the interpreter process is still common to all notebooks, so a single manager handles executions from several notebooks.
- Isolated: each notebook gets its own interpreter process, so the manager handles only one notebook. The interpreter is therefore associated with a notebook, and its name always tells you which notebook it belongs to. For paragraph-level detail, you still need to cross-reference the resources entity with the executions entity.

There are also Kubernetes (k8s) execution modes in which each notebook's execution is delegated to its own pod. The manager then runs inside that pod and controls the different types of execution.

In every case, the manager (the RemoteInterpreterServer process) is responsible for reporting metrics and execution information to the platform, wherever it runs.
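To make the practical difference between the three modes concrete, the following Python sketch answers the question "which notebook does this interpreter belong to?" for each mode. The source only states that an isolated interpreter's name identifies its notebook; the naming pattern assumed here ("<interpreter>-<notebookId>", for example spark-2ABCDEFGH), the InterpreterMode enum and the notebook_for_interpreter function are all hypothetical illustrations, not part of the platform.

```python
from enum import Enum
from typing import Optional


class InterpreterMode(Enum):
    """The three interpreter execution modes described above."""
    SHARED = "shared"
    SCOPED = "scoped"
    ISOLATED = "isolated"


def notebook_for_interpreter(mode: InterpreterMode, interpreter_name: str) -> Optional[str]:
    """Return the notebook id when it can be read from the interpreter name.

    Only isolated interpreters have a process dedicated to one notebook.
    Shared and scoped processes serve several notebooks, so None is returned
    and the executions entity has to be consulted instead.
    ASSUMPTION (hypothetical): isolated names look like '<interpreter>-<notebookId>'.
    """
    if mode is not InterpreterMode.ISOLATED:
        return None
    prefix, sep, notebook_id = interpreter_name.rpartition("-")
    # A missing separator or an empty part means the name does not match the assumed pattern.
    if not sep or not prefix or not notebook_id:
        return None
    return notebook_id


print(notebook_for_interpreter(InterpreterMode.ISOLATED, "spark-2ABCDEFGH"))  # 2ABCDEFGH
print(notebook_for_interpreter(InterpreterMode.SHARED, "spark-shared_process"))  # None
```

The function deliberately returns None rather than guessing, mirroring the text: outside isolated mode the answer can only come from crossing the two metric entities.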

Available metrics

Two metrics have been created, and they complement each other:

Resource metrics

This monitoring data is stored in a TimeSeries entity (notebooks_metrics_resources). For each interpreter, it records the processes, the interpreter mode (shared, scoped, isolated), whether the interpreter is associated with a notebook, and its CPU and RAM consumption.

The data is reported periodically. The interval is configurable at the pod level of the notebooks module and defaults to 10 seconds.

For shared interpreters, you must cross-reference this data with the execution metrics to find out which notebook used the interpreter.
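A minimal sketch of how resource samples might be aggregated per notebook follows. The ResourceSample fields are assumptions modelled on the attributes listed above (process, interpreter mode, associated notebook, CPU and RAM), not the actual schema of notebooks_metrics_resources, and the peak_usage_by_notebook function is hypothetical. Samples with no associated notebook, as happens with shared interpreters, are set aside because they can only be attributed by crossing them with the execution metrics.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ResourceSample:
    """One hypothetical notebooks_metrics_resources row (field names are assumptions)."""
    timestamp: float            # seconds since the epoch
    interpreter: str            # e.g. "python", "spark"
    pid: int                    # interpreter process id
    mode: str                   # "shared", "scoped" or "isolated"
    notebook_id: Optional[str]  # None when the interpreter is not tied to a notebook
    cpu_percent: float
    ram_mb: float


def peak_usage_by_notebook(
    samples: List[ResourceSample],
) -> Tuple[Dict[str, Tuple[float, float]], List[ResourceSample]]:
    """Return ({notebook_id: (peak CPU %, peak RAM MB)}, unattributed samples)."""
    peaks: Dict[str, Tuple[float, float]] = {}
    unattributed: List[ResourceSample] = []
    for s in samples:
        if s.notebook_id is None:
            # Typically a shared interpreter: the execution metrics are needed to attribute it.
            unattributed.append(s)
            continue
        cpu, ram = peaks.get(s.notebook_id, (0.0, 0.0))
        peaks[s.notebook_id] = (max(cpu, s.cpu_percent), max(ram, s.ram_mb))
    return peaks, unattributed


samples = [
    ResourceSample(0.0, "spark", 101, "isolated", "NB1", 20.0, 512.0),
    ResourceSample(10.0, "spark", 101, "isolated", "NB1", 35.0, 480.0),
    ResourceSample(10.0, "python", 202, "shared", None, 5.0, 128.0),
]
peaks, pending = peak_usage_by_notebook(samples)
print(peaks)         # {'NB1': (35.0, 512.0)}
print(len(pending))  # 1
```

Peaks are used here because summing periodic samples has no direct physical meaning; an average or a CPU-time estimate based on the reporting interval would be equally valid choices.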

Execution metrics

These metrics (notebooks_metrics_executions) give the execution details of each paragraph the user runs: user, notebook, paragraph, interpreter...

This monitoring acts as an execution history. It is stored in its own entity and can be disabled if it is not needed.

Cross-referencing this data with the resource metrics gives the actual consumption of each paragraph.
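The cross-reference can be sketched as a time-window join: for each paragraph execution, take the resource samples of the same interpreter whose timestamps fall between the paragraph's start and end. The record layouts, field names and the paragraph_usage function below are hypothetical, not the platform's schema. Two caveats follow from this design: with the default 10-second interval a short paragraph may match no sample at all, and on a scoped interpreter handling several notebooks at once the samples cover every concurrent execution, not just one.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ResourceSample:
    """Simplified hypothetical notebooks_metrics_resources row."""
    timestamp: float  # seconds since the epoch
    interpreter: str
    cpu_percent: float
    ram_mb: float


@dataclass
class Execution:
    """Hypothetical notebooks_metrics_executions row."""
    user: str
    notebook_id: str
    paragraph_id: str
    interpreter: str
    start: float  # seconds since the epoch
    end: float


def paragraph_usage(execution: Execution, samples: List[ResourceSample]) -> dict:
    """Average CPU and peak RAM of the interpreter while the paragraph was running."""
    # Join on interpreter name and on the [start, end] time window.
    window = [
        s for s in samples
        if s.interpreter == execution.interpreter
        and execution.start <= s.timestamp <= execution.end
    ]
    if not window:
        # Paragraph shorter than the reporting interval: nothing to attribute.
        return {"paragraph": execution.paragraph_id, "samples": 0,
                "avg_cpu": None, "peak_ram_mb": None}
    return {
        "paragraph": execution.paragraph_id,
        "samples": len(window),
        "avg_cpu": sum(s.cpu_percent for s in window) / len(window),
        "peak_ram_mb": max(s.ram_mb for s in window),
    }


samples = [
    ResourceSample(100.0, "spark", 40.0, 900.0),
    ResourceSample(110.0, "spark", 60.0, 1100.0),
    ResourceSample(120.0, "spark", 10.0, 700.0),
    ResourceSample(110.0, "python", 90.0, 300.0),
]
run = Execution("alice", "NB1", "p1", "spark", start=95.0, end=115.0)
print(paragraph_usage(run, samples))
# {'paragraph': 'p1', 'samples': 2, 'avg_cpu': 50.0, 'peak_ram_mb': 1100.0}
```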

Metrics Reporting

Metrics can be reported in two ways:

Push reporting from the interpreter → the following environment variables (included in zeppelin-env.sh) configure access, through a digital client, to the two entities into which the metrics above are inserted:

#### Monitor reporter zeppelin onesait platform ####
export ZEPPELIN_INTERPRETER_MONITORREPORTER_ENABLE=true
export ZEPPELIN_INTERPRETER_MONITORREPORTER_DIGITALCLIENT_HOST=https://development.onesaitplatform.com/iot-broker
export ZEPPELIN_INTERPRETER_MONITORREPORTER_DIGITALCLIENT_NAME=notebook_metrics_client
export ZEPPELIN_INTERPRETER_MONITORREPORTER_DIGITALCLIENT_INSTANCE=notebook_metrics_client_interpreter
export ZEPPELIN_INTERPRETER_MONITORREPORTER_DIGITALCLIENT_TOKEN=XXXXXXX
export ZEPPELIN_INTERPRETER_MONITORREPORTER_ENTITY_RESOURCES=notebook_metrics_resources
export ZEPPELIN_INTERPRETER_MONITORREPORTER_ENTITY_EXECUTIONS=notebook_metrics_executions
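As an illustration of what each variable controls, this sketch loads them into a typed configuration object. It is not the platform's own loader: the MonitorReporterConfig class and load_monitor_config function are hypothetical, and treating any ENABLE value other than "true" as disabled, as well as raising KeyError for a missing setting, are assumptions of this example.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Common prefix of the variables set in zeppelin-env.sh.
PREFIX = "ZEPPELIN_INTERPRETER_MONITORREPORTER_"


@dataclass
class MonitorReporterConfig:
    host: str               # DIGITALCLIENT_HOST: broker URL
    client_name: str        # DIGITALCLIENT_NAME
    client_instance: str    # DIGITALCLIENT_INSTANCE
    token: str              # DIGITALCLIENT_TOKEN
    resources_entity: str   # ENTITY_RESOURCES
    executions_entity: str  # ENTITY_EXECUTIONS


def _require(env: Mapping[str, str], suffix: str) -> str:
    """Return a non-empty variable or raise KeyError naming it (assumed behaviour)."""
    key = PREFIX + suffix
    value = env.get(key, "")
    if not value:
        raise KeyError(key)
    return value


def load_monitor_config(env: Mapping[str, str] = os.environ) -> Optional[MonitorReporterConfig]:
    """Return the reporter settings, or None when reporting is not enabled."""
    # Assumption: anything other than "true" (case-insensitive) disables reporting.
    if env.get(PREFIX + "ENABLE", "").lower() != "true":
        return None
    return MonitorReporterConfig(
        host=_require(env, "DIGITALCLIENT_HOST"),
        client_name=_require(env, "DIGITALCLIENT_NAME"),
        client_instance=_require(env, "DIGITALCLIENT_INSTANCE"),
        token=_require(env, "DIGITALCLIENT_TOKEN"),
        resources_entity=_require(env, "ENTITY_RESOURCES"),
        executions_entity=_require(env, "ENTITY_EXECUTIONS"),
    )


# Example with the values from the listing above (token left as a placeholder).
example_env = {
    PREFIX + "ENABLE": "true",
    PREFIX + "DIGITALCLIENT_HOST": "https://development.onesaitplatform.com/iot-broker",
    PREFIX + "DIGITALCLIENT_NAME": "notebook_metrics_client",
    PREFIX + "DIGITALCLIENT_INSTANCE": "notebook_metrics_client_interpreter",
    PREFIX + "DIGITALCLIENT_TOKEN": "XXXXXXX",
    PREFIX + "ENTITY_RESOURCES": "notebook_metrics_resources",
    PREFIX + "ENTITY_EXECUTIONS": "notebook_metrics_executions",
}
config = load_monitor_config(example_env)
print(config.resources_entity)  # notebook_metrics_resources
```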

Reporting from the Zeppelin REST API → a new actuator-style API exposes the consumption of all interpreters (the resource metrics). Execution metrics cannot be obtained this way, because they are inherently tied to the moment each execution takes place.

The following endpoints are available:

/api/interpreter/metrics/all → returns the resources of all Zeppelin interpreters, with their status and consumption

/api/interpreter/metrics/running → returns the resources of all running Zeppelin interpreters, with their status and consumption

/api/interpreter/metrics/notebook/{notebookId} → returns the resources of all Zeppelin interpreters for the given notebook, with their status and consumption

/api/interpreter/metrics/running/notebook/{notebookId} → returns the resources of all running Zeppelin interpreters for the given notebook, with their status and consumption

/api/interpreter/metrics/interpreter/{interpreterId} → returns the resources of the interpreter with the given id (python, spark, onesaitplatform, …), with its status and consumption
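The endpoints above share a common prefix, so a small client only needs to build the right path. In this sketch the base URL http://localhost:8080 is an assumption (Zeppelin's default port), as are the absence of authentication and a JSON response body; a secured Zeppelin server would also need credentials or a session cookie. The metrics_url and fetch_metrics helpers are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Optional

# ASSUMPTION: Zeppelin server base URL; adjust to your deployment.
BASE_URL = "http://localhost:8080"


def metrics_url(base_url: str, *, running: bool = False,
                notebook_id: Optional[str] = None,
                interpreter_id: Optional[str] = None) -> str:
    """Build one of the /api/interpreter/metrics endpoints listed above."""
    path = "/api/interpreter/metrics"
    if interpreter_id is not None:
        if running or notebook_id is not None:
            raise ValueError("the interpreter endpoint takes no other filter")
        path += "/interpreter/" + urllib.parse.quote(interpreter_id, safe="")
    elif notebook_id is not None:
        path += ("/running" if running else "") + "/notebook/" + urllib.parse.quote(notebook_id, safe="")
    else:
        path += "/running" if running else "/all"
    return base_url.rstrip("/") + path


def fetch_metrics(url: str, timeout: float = 10.0) -> Any:
    """GET an endpoint and decode its body, assumed to be JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


print(metrics_url(BASE_URL, running=True, notebook_id="2ABCDEFGH"))
# http://localhost:8080/api/interpreter/metrics/running/notebook/2ABCDEFGH

# Against a reachable server:
# metrics = fetch_metrics(metrics_url(BASE_URL, running=True))
```

Keeping URL construction separate from the HTTP call makes the path logic testable without a running Zeppelin instance.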

Next steps

Platform controls (notebooks UI) → use the elements above in the notebooks UI to see which interpreters are active, stop them easily, etc.
A dashboard for simple visualization of the metrics
Limit notebook processes by both RAM and CPU usage

Monitoring in Notebooks Engine