In this tutorial, we are going to explain how the FlowEngine domains' monitoring works.
Three functionalities have been added to help us monitor and recover our domains in case of error.
Monitoring endpoint (healthcheck).
A healthcheck endpoint has been created for each domain, which shows the domain's state. The endpoint should be called as follows:
Internal URL (inside the CaaS cluster): http://flowengineservice:5050/<domain_name>/health
External URL (outside the CaaS cluster): http://<url_instalación>/nodered/<domain_name>/health
This healthcheck service will show us the following information in JSON format:
{
"cpu": 1.4831932773109242,
"memory": 117186560,
"sockets": [
"node 12884 rtvachet 11u IPv6 263833 0t0 TCP *:28001(LISTEN)",
"node 12884 rtvachet 12u IPv6 263870 0t0 TCP localhost:28001->localhost:58338 (ESTABLISHED)",
"node 12884 rtvachet 13u IPv6 262758 0t0 TCP localhost:28001->localhost:59326 (ESTABLISHED)"
]
}
CPU use
Memory use
Information about the state of each socket
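As an illustration of how a client could consume this endpoint, here is a minimal Python sketch. The function names (`health_url`, `summarize_health`, `fetch_health`) are hypothetical and not part of the platform. The sketch also assumes that the "memory" value is expressed in bytes, which the response itself does not state.

```python
import json
from urllib.request import urlopen


def health_url(base_url: str, domain_name: str) -> str:
    """Build the health endpoint URL for a domain from a base URL."""
    return f"{base_url.rstrip('/')}/{domain_name}/health"


def summarize_health(payload: str) -> dict:
    """Parse the JSON body of the health endpoint into a small summary."""
    data = json.loads(payload)
    return {
        "cpu": data["cpu"],
        # Assumption: "memory" is reported in bytes; convert it to MiB.
        "memory_mib": data["memory"] / (1024 * 1024),
        "socket_count": len(data.get("sockets", [])),
    }


def fetch_health(base_url: str, domain_name: str, timeout: float = 5.0) -> dict:
    """Call the health endpoint over HTTP and return the summary."""
    with urlopen(health_url(base_url, domain_name), timeout=timeout) as response:
        return summarize_health(response.read().decode("utf-8"))
```

From inside the CaaS cluster, a call such as `fetch_health("http://flowengineservice:5050", "<domain_name>")` would query the internal URL shown above, with the placeholder replaced by a real domain name.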
Automatic recovery in case of error.
A new property has been added to the domain creation/edition form in the ControlPanel. It activates a process that detects when a domain has stopped running and, if so, restarts it automatically.
To activate it, we need to follow these steps:
Select the "My Digital Flows" option from the DEVELOPMENT menu:
Select the “edit” option:
The new property “Reboot on failure” will now appear:
By checking this property, we cause the platform to reboot this domain every time it stops running due to a failure. On average, the domain takes 30 seconds to become operative again after a failure. A control that counts the number of reboots within a specific time window has also been added. If the number of reboots during this time interval is higher than the specified threshold, the domain will remain inactive and the automatic reboot option will be disabled. The size of the window and the reboot threshold are provided by the following platform properties:
onesaitplatform.flowengine.reboot.count.monitor.sec: Size of the time window (in seconds) in which the number of reboots will be counted. The default value is 30 minutes.
onesaitplatform.flowengine.reboot.count.monitor.max: Maximum amount of reboots allowed during the period of time defined in the previous property. The default value is 10 reboots.
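To make the interaction between the window size and the reboot threshold easier to follow, the sketch below models one possible reading of this behaviour as a sliding window. It is not the platform's implementation: the `RebootMonitor` class and its method names are hypothetical, and the defaults simply mirror the values stated above (30 minutes taken as 1800 seconds, and 10 reboots).

```python
from collections import deque


class RebootMonitor:
    """Hypothetical model of the reboot counter described above."""

    def __init__(self, window_sec: int = 1800, max_reboots: int = 10):
        self.window_sec = window_sec    # size of the time window, in seconds
        self.max_reboots = max_reboots  # reboots allowed inside the window
        self.auto_reboot_enabled = True
        self._reboots = deque()         # timestamps of counted reboots

    def record_failure(self, now: float) -> bool:
        """Return True if the domain should be rebooted after this failure."""
        if not self.auto_reboot_enabled:
            return False
        # Forget reboots that have fallen out of the time window.
        while self._reboots and now - self._reboots[0] >= self.window_sec:
            self._reboots.popleft()
        self._reboots.append(now)
        # Above the threshold: leave the domain inactive and stop rebooting.
        if len(self._reboots) > self.max_reboots:
            self.auto_reboot_enabled = False
            return False
        return True
```

In this sketch, with the default values, the eleventh failure within 30 minutes is the one that disables the automatic reboot. Failures spaced far enough apart never reach the threshold, because old timestamps are discarded.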
Automatic control of the domain, based on the sockets and their states.
Sometimes a domain is active (running) but its performance is not as good as it should be. To monitor domains more precisely, new controls over the number of sockets and their states have been added. On the same domain edition screen we can specify the maximum allowed number of sockets, either in total or in a specific state.
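The following Python sketch shows how socket limits of this kind could be checked against the "sockets" list returned by the healthcheck endpoint. It assumes that each entry ends with the socket state in parentheses, as in the sample response. The function names and the "TOTAL" key are hypothetical, chosen only for this illustration; the platform's own check may work differently.

```python
import re
from collections import Counter

# Assumption: the state is the last parenthesised token, e.g. "(LISTEN)".
STATE_PATTERN = re.compile(r"\(([A-Z_0-9]+)\)\s*$")


def count_socket_states(sockets: list) -> Counter:
    """Count how many sockets are in each state."""
    counts = Counter()
    for line in sockets:
        match = STATE_PATTERN.search(line)
        counts[match.group(1) if match else "UNKNOWN"] += 1
    return counts


def exceeded_limits(sockets: list, limits: dict) -> dict:
    """Return the observed count for every limit that is exceeded.

    `limits` maps a state name to its maximum; the hypothetical key
    "TOTAL" stands for the total number of sockets.
    """
    observed = dict(count_socket_states(sockets))
    observed["TOTAL"] = len(sockets)
    return {
        name: observed.get(name, 0)
        for name, limit in limits.items()
        if observed.get(name, 0) > limit
    }
```

For the three sockets in the sample response, a limit of one socket in the ESTABLISHED state would be reported as exceeded, because two sockets are in that state.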
...