This tutorial explains how monitoring works for FlowEngine domains.
Three functionalities have been added to help us monitor and recover our domains in case of error.
Monitoring endpoint (healthcheck).
A healthcheck endpoint has been created for each domain, which reports the domain's state. Call the endpoint as follows:
Internal URL (inside the CaaS cluster): http://flowengineservice:5050/<domain_name>/health
External URL (outside the CaaS cluster): http://<url_instalación>/nodered/<domain_name>/health
The healthcheck service returns the following information in JSON format:
{
  "cpu": 1.4831932773109242,
  "memory": 117186560,
  "sockets": [
    "node 12884 rtvachet 11u IPv6 263833 0t0 TCP *:28001(LISTEN)",
    "node 12884 rtvachet 12u IPv6 263870 0t0 TCP localhost:28001->localhost:58338 (ESTABLISHED)",
    "node 12884 rtvachet 13u IPv6 262758 0t0 TCP localhost:28001->localhost:59326 (ESTABLISHED)"
  ]
}
CPU use
Memory use
Information about the state of each socket
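As an illustration of how a client might consume this response, the following minimal Python sketch builds the healthcheck URL, fetches the JSON, and counts sockets per state. The helper names (health_url, fetch_health, socket_states) are hypothetical and not part of the platform. It assumes the socket list is returned under the "sockets" key and that each socket line ends with its state in parentheses, as in the sample above.

```python
import json
import re
from collections import Counter
from urllib.request import urlopen

# Internal base URL taken from the text above; use the external one outside the cluster.
INTERNAL_BASE = "http://flowengineservice:5050"


def health_url(domain: str, base: str = INTERNAL_BASE) -> str:
    """Build the healthcheck URL for a domain (hypothetical helper)."""
    return f"{base.rstrip('/')}/{domain}/health"


def fetch_health(domain: str, base: str = INTERNAL_BASE, timeout: float = 5.0) -> dict:
    """Call the endpoint and decode the JSON body. Requires network access."""
    with urlopen(health_url(domain, base), timeout=timeout) as resp:
        return json.load(resp)


# Assumption: every socket line ends with its TCP state in parentheses.
_STATE = re.compile(r"\(([A-Z_0-9]+)\)\s*$")


def socket_states(health: dict) -> Counter:
    """Count sockets per state; lines without a recognizable state count as UNKNOWN."""
    counts = Counter()
    for line in health.get("sockets", []):
        match = _STATE.search(line)
        counts[match.group(1) if match else "UNKNOWN"] += 1
    return counts
```

For example, socket_states(fetch_health("my_domain")) would return the per-state counts for a domain named my_domain (a placeholder name).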
Automatic recovery in case of error.
A new property has been added to the ControlPanel that detects when a domain has stopped running and restarts it automatically.
To activate it, follow these steps:
Select the "My Digital Flows" option from the DEVELOPMENT menu:
Select the new option, “edit”:
The new property “Reboot on failure” will now appear:
When this property is checked, the platform reboots the domain every time it stops running due to a failure. On average, the domain takes 30 seconds to become operative again after a failure.
A control that counts the number of reboots within a given time window has also been added. If the number of reboots during this window is higher than the specified threshold, the domain will remain inactive and the automatic reboot option will be disabled. The window size and the reboot threshold are set by the following platform properties:
onesaitplatform.flowengine.reboot.count.monitor.sec: Size of the time window (in seconds) in which reboots are counted. The default value is 30 minutes.
onesaitplatform.flowengine.reboot.count.monitor.max: Maximum number of reboots allowed during the time window defined by the previous property. The default value is 10 reboots.
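The reboot-count control can be pictured with a short Python sketch. This is not the platform's implementation: the class name RebootWindow is hypothetical, and a sliding window is assumed (the text does not say whether the window slides or resets). The defaults mirror the documented ones, 30 minutes (1800 seconds) and 10 reboots.

```python
from collections import deque


class RebootWindow:
    """Illustrative counter of reboots inside a sliding time window."""

    def __init__(self, window_sec: float = 1800, max_reboots: int = 10):
        self.window_sec = window_sec      # analogous to ...reboot.count.monitor.sec
        self.max_reboots = max_reboots    # analogous to ...reboot.count.monitor.max
        self._stamps = deque()            # timestamps of recent reboots, oldest first

    def record_reboot(self, now: float) -> bool:
        """Register a reboot at time `now` (in seconds).

        Returns True while automatic reboot may stay enabled, and False once
        the number of reboots in the window is higher than the threshold.
        """
        self._stamps.append(now)
        # Discard reboots that have fallen out of the window.
        while self._stamps and now - self._stamps[0] > self.window_sec:
            self._stamps.popleft()
        return len(self._stamps) <= self.max_reboots
```

With the default values, the eleventh reboot inside any 30-minute span would make record_reboot return False, which corresponds to the domain staying inactive with automatic reboot disabled.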
Automatic control of the domain, based on the sockets and its states.
Sometimes a domain is active (running) but its performance is not as good as it should be. To monitor domains more precisely, a set of controls over the number of sockets and their states has been added. On the same domain edit screen, we can set the maximum number of sockets allowed, either in total or in a specific state.
Each limit is only active when its checkbox is checked. Whenever the number of sockets in a given state is higher than the configured limit, the domain will be automatically stopped.
If automatic reboot has also been selected, the domain will reboot after 30 to 60 seconds. This delay is necessary to end the processes in the safest way possible.
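The socket limit check described in this section can be sketched in Python as follows. The function name exceeded_limits, the use of None for an unchecked checkbox, and the "TOTAL" key for the overall socket count are all assumptions made for illustration; the platform's internal logic may differ.

```python
from typing import Mapping, Optional


def exceeded_limits(
    counts: Mapping[str, int],
    limits: Mapping[str, Optional[int]],
) -> list:
    """Return the states whose socket count is higher than their active limit.

    A limit of None stands for an unchecked checkbox and is skipped.
    """
    exceeded = []
    for state, limit in limits.items():
        if limit is None:
            continue  # limit not active
        # "TOTAL" is a hypothetical key for the sum over all states.
        current = sum(counts.values()) if state == "TOTAL" else counts.get(state, 0)
        if current > limit:
            exceeded.append(state)
    return exceeded
```

Under these assumptions, a non-empty result would be the condition for stopping the domain. For instance, with counts {"LISTEN": 1, "ESTABLISHED": 2} and limits {"ESTABLISHED": 1, "LISTEN": None}, only ESTABLISHED is reported.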