In this tutorial, we explain how monitoring of FlowEngine domains works. Three features have been added to help us monitor our domains and recover them in case of error.
Monitoring endpoint (healthcheck).
A healthcheck endpoint has been created for each domain, which shows the domain's state. The endpoint is called as follows:
(internal) http://flowengineservice:5050/<domain_name>/health
(external) http://<instalation_url>/nodered/<domain_name>/health
This healthcheck service returns the following information in JSON format:
{
"cpu": 1.4831932773109242,
"memory": 117186560,
"sockets": [
"node 12884 rtvachet 11u IPv6 263833 0t0 TCP *:28001 (LISTEN)",
"node 12884 rtvachet 12u IPv6 263870 0t0 TCP localhost:28001->localhost:58338 (ESTABLISHED)",
"node 12884 rtvachet 13u IPv6 262758 0t0 TCP localhost:28001->localhost:59326 (ESTABLISHED)"
]
}
cpu: CPU usage
memory: Memory usage
sockets: Information about the state of each socket
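As an illustration of how a client might consume this endpoint, here is a minimal Python sketch. It is not part of the platform: the function names `parse_health` and `fetch_health` are hypothetical, and the conversion of `memory` to MiB assumes the value is reported in bytes, which the response itself does not state.

```python
import json
import urllib.request


def parse_health(payload: str) -> dict:
    """Parse the healthcheck JSON text into a small summary dictionary."""
    raw = json.loads(payload)
    # Tolerate stray whitespace around key names.
    data = {key.strip(): value for key, value in raw.items()}
    return {
        "cpu": float(data["cpu"]),
        # Assumption: "memory" is expressed in bytes.
        "memory_mib": data["memory"] / (1024 * 1024),
        "socket_count": len(data.get("sockets", [])),
    }


def fetch_health(base_url: str, domain_name: str, timeout: float = 5.0) -> dict:
    """Call <base_url>/<domain_name>/health and return the parsed summary."""
    url = f"{base_url.rstrip('/')}/{domain_name}/health"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_health(response.read().decode("utf-8"))
```

From inside the deployment, a call could look like `fetch_health("http://flowengineservice:5050", domain_name)`, where `domain_name` is the name of the domain to inspect.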
Automatic recovery in case of error.
A new property has been added to the domain creation/editing form. It lets users activate a process that detects whether a domain has stopped running; if so, the domain is automatically restarted. To activate it, follow these steps:
Select the "My Digital Flows" option from the DEVELOPMENT menu:
Select the “edit” option
The property “Reboot on failure” will now appear:
By checking this property, we tell the platform to reboot this domain whenever it stops running because of a failure. On average, the domain takes 30 seconds to become operational again after a failure.
Two new properties have also been added to control the number of reboots within a given time window. If the number of reboots during this interval is higher than the specified limit, the domain will remain inactive and the automatic reboot option will be disabled. The window size and the reboot threshold are set by the following platform properties:
onesaitplatform.flowengine.reboot.count.monitor.sec: Size of the time window (in seconds) in which reboots are counted. The default value is 30 minutes (1800 seconds).
onesaitplatform.flowengine.reboot.count.monitor.max: Maximum number of reboots allowed during the period defined in the previous property. The default value is 10.
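To make the interaction between the two properties concrete, the following Python sketch models a reboot counter over a sliding time window. It is only an illustration under stated assumptions, not the platform's actual implementation: the class `RebootLimiter` is hypothetical, and the exact boundary behaviour (here, the failure that would exceed the maximum is the one that disables automatic reboot) is assumed.

```python
from collections import deque


class RebootLimiter:
    """Counts reboots inside a sliding time window (illustrative only)."""

    def __init__(self, window_sec: int = 1800, max_reboots: int = 10):
        self.window_sec = window_sec      # mirrors ...reboot.count.monitor.sec
        self.max_reboots = max_reboots    # mirrors ...reboot.count.monitor.max
        self.auto_reboot_enabled = True
        self._timestamps = deque()

    def record_failure(self, now: float) -> bool:
        """Return True if the domain may be rebooted at time `now` (seconds)."""
        if not self.auto_reboot_enabled:
            return False
        # Forget reboots that have fallen out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_sec:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_reboots:
            # Limit exceeded: keep the domain stopped and disable auto reboot.
            self.auto_reboot_enabled = False
            return False
        self._timestamps.append(now)
        return True
```

With the default values, up to 10 reboots within any 30-minute span are accepted; one more failure inside that span leaves the domain inactive until automatic reboot is enabled again.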
Automatic control of the domain, based on the sockets and their states.
Sometimes a domain is active (running) but does not perform as well as it should. New controls over the number of sockets and their states have been added so that a domain can be monitored more precisely. On the same domain editing screen, we can specify the maximum number of sockets allowed in each state.
Each limit is only active when its checkbox is checked. Whenever the number of sockets in a given state is higher than the given limit, the domain is automatically stopped.
If automatic reboot has also been selected, the domain will reboot after 30 to 60 seconds. This delay is necessary to end the processes as safely as possible.
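The socket-limit check described above can be sketched in Python as follows. This is a hedged illustration rather than the platform's code: it assumes each entry of the healthcheck `sockets` array ends with the socket state in parentheses, as in the sample response, and the `limits` dictionary and function names are hypothetical.

```python
import re
from collections import Counter

# The state is assumed to be the last parenthesised token, e.g. "(LISTEN)".
STATE_PATTERN = re.compile(r"\(([A-Z_0-9]+)\)\s*$")


def count_socket_states(socket_lines):
    """Count how many sockets are in each state."""
    counts = Counter()
    for line in socket_lines:
        match = STATE_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def exceeded_limits(counts, limits):
    """Return the states whose socket count is higher than the configured limit."""
    return sorted(
        state for state, limit in limits.items() if counts.get(state, 0) > limit
    )
```

A state that is absent from `limits` is never reported, which corresponds to leaving its checkbox unchecked; a non-empty result from `exceeded_limits` is the condition under which the domain would be stopped.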