Introduction

In this tutorial, we are going to explain how the domain monitoring that has been added to FlowEngine works.

Three new features have been added that will help you monitor and recover your domains in case of error.

Endpoint monitoring (healthcheck)

A new monitoring endpoint has been created in each domain, through which the domain’s status can be consulted. The endpoint call is made like this:

  • Internal URL (in the CaaS cluster): http://flowengineservice:5050/<domain_name>/health

  • External URL (outside the CaaS cluster): http://<url_instalación>/nodered/<domain_name>/health

This healthcheck service will show you the following information in JSON format:

  • CPU usage.

  • Memory usage.

  • Information about the status of the different sockets.

{
  "cpu": 1.4831932773109242,
  "memory": 117186560,
  "sockets": [
    "node 12884 rtvachet 11u IPv6 263833 0t0 TCP *:28001(LISTEN)",
    "node 12884 rtvachet 12u IPv6 263870 0t0 TCP localhost:28001->localhost:58338 (ESTABLISHED)",
    "node 12884 rtvachet 13u IPv6 262758 0t0 TCP localhost:28001->localhost:59326 (ESTABLISHED)"
  ]
}
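
As an illustration of how a client might consume this endpoint, here is a minimal Python sketch using only the standard library. The helper names (`health_url`, `parse_health`, `fetch_health`) are hypothetical and not part of the platform. The sketch assumes the response has exactly the three fields shown in the sample.

```python
import json
from urllib.request import urlopen


def health_url(base_url: str, domain_name: str) -> str:
    """Build the health endpoint URL for a domain (hypothetical helper)."""
    return f"{base_url.rstrip('/')}/{domain_name}/health"


def parse_health(payload: str) -> dict:
    """Parse the JSON body returned by the health endpoint."""
    data = json.loads(payload)
    return {
        "cpu": float(data["cpu"]),
        "memory": int(data["memory"]),
        # Assumption: a missing "sockets" field is treated as an empty list.
        "sockets": list(data.get("sockets", [])),
    }


def fetch_health(base_url: str, domain_name: str, timeout: float = 5.0) -> dict:
    """Call the endpoint and return the parsed status (requires network access)."""
    with urlopen(health_url(base_url, domain_name), timeout=timeout) as response:
        return parse_health(response.read().decode("utf-8"))
```

From inside the CaaS cluster, a call could look like `fetch_health("http://flowengineservice:5050", "<domain_name>")`, with the placeholder replaced by a real domain name.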

Automatic recovery in case of error

A new property has been added to the ControlPanel that detects when a domain has stopped running. Whenever that happens, the domain will be automatically restarted.

To activate it, follow these steps:

  1. Select the "My Digital Flows" option from the DEVELOPMENT menu:

     

  2. Select the new option, “edit”:

  3. The new property “Reboot on failure” will now appear:

Checking this box will cause the domain to reboot if at any time it stops running due to any failure. The average time from domain failure to its recovery is about 30 seconds.

Additionally, a control has been added that counts the number of reboots in a given time window. If the number of reboots in a domain exceeds the threshold established for that time window, the domain will remain stopped, and the check will be automatically deactivated. The window size and restart threshold are defined in the following platform-level properties:

  • onesaitplatform.flowengine.reboot.count.monitor.sec: Size of the time window (in seconds) in which the number of reboots will be counted. The default value is 30 minutes.

  • onesaitplatform.flowengine.reboot.count.monitor.max: Maximum number of reboots allowed during the time window defined in the previous property. The default value is 10 reboots.
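
To make the interaction between the two properties concrete, here is a minimal Python sketch of a reboot counter over a sliding time window. It is not the platform's implementation: the class name `RebootMonitor`, the sliding-window strategy, and the exact comparison (strictly more than the maximum) are assumptions made for illustration. The default of 1800 seconds corresponds to the 30 minutes stated for the window.

```python
from collections import deque


class RebootMonitor:
    """Hypothetical sketch of reboot counting within a time window."""

    def __init__(self, window_sec: int = 1800, max_reboots: int = 10):
        self.window_sec = window_sec    # mirrors ...reboot.count.monitor.sec
        self.max_reboots = max_reboots  # mirrors ...reboot.count.monitor.max
        self.reboot_times = deque()
        self.check_enabled = True

    def register_reboot(self, now: float) -> bool:
        """Record a reboot at time `now` (in seconds).

        Returns True if the domain may be restarted, False if it must stay stopped.
        """
        if not self.check_enabled:
            return False
        # Forget reboots that are older than the time window.
        while self.reboot_times and now - self.reboot_times[0] > self.window_sec:
            self.reboot_times.popleft()
        self.reboot_times.append(now)
        if len(self.reboot_times) > self.max_reboots:
            # Threshold exceeded: leave the domain stopped and disable the check.
            self.check_enabled = False
            return False
        return True
```

With the default values, an eleventh reboot inside the same 30 minutes would return False, and the check would stay deactivated from then on.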

Automatic domain control based on the number of sockets in a state

Sometimes a domain is active (running) but its performance may not be as good as desired. In order to monitor domains more accurately, a number of controls have been added on the number of sockets and their states. From the domain editing screen itself, you can select the maximum number of sockets, either in total or in a specific state.

The filters will be active only if you select the checkbox associated with each state. If at any time the number of sockets in the indicated state exceeds the established limit, the domain will be automatically stopped.

If automatic restart has also been selected, the domain will restart after 30 to 60 seconds. This period of time is necessary for the correct closure of the processes.
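
The kind of check described in this section can be sketched in Python from the socket lines returned by the health endpoint. This is only an illustration, not the platform's own logic. It assumes each socket line ends with its state in parentheses, as in the sample response. The function names and the `TOTAL` key are hypothetical.

```python
import re
from collections import Counter

# Assumption: the socket state is the last parenthesised token on the line.
STATE_PATTERN = re.compile(r"\(([A-Z_0-9]+)\)\s*$")


def count_socket_states(sockets):
    """Count sockets per state, e.g. LISTEN or ESTABLISHED."""
    counts = Counter()
    for line in sockets:
        match = STATE_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def exceeded_limits(sockets, limits):
    """Return the states whose socket count is above the configured limit.

    `limits` maps a state name to its maximum; the hypothetical key
    "TOTAL" applies to the overall number of sockets.
    """
    counts = count_socket_states(sockets)
    counts["TOTAL"] = len(sockets)
    return {state: counts[state] for state, limit in limits.items() if counts[state] > limit}
```

A non-empty result corresponds to the situation in which the domain would be stopped. States without an entry in `limits` are ignored, matching the behaviour of an unchecked checkbox.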