Demonstrator of ingestion, correlation and action on systems
Introduction
The objective of this demonstrator is to detect, report and resolve problems in multiple remote systems arranged in a topological network (modeled as a graph entity/ontology), using the information provided by those systems' raw logs.
Platform modules used
This demonstrator involves several platform elements:
The Dataflow module → carries a large share of the logic and is in charge of several tasks:
Actively ingest logs, normalize them and standardize a common output for them, so that they can be treated homogeneously regardless of the source system.
Detect problems in the systems based on their flow of information (processed logs). This detection generates an alert for each problem found.
Correlate the source of information with the rest of the systems through the topological network.
Act on the topological network itself, through log events, so that changes in the systems are reflected in it.
Kafka on Onesait Platform → combined with Dataflow, it makes it easy to scale loads and parallelize processing, as well as providing the basis for a robust, production-grade ingestion system.
Graph entities/ontologies → thanks to the ability to store entities in a graph database (Nebula Graph), the topological system has been modeled as just another entity, which can be queried (visualizing the network as a graph) or written to (loading and updating the virtualized network in real time).
FlowEngine → will be in charge of managing the logical flows, alarms and events, with functions such as:
Opening and updating tickets in third-party systems.
Releasing fixes for remote systems.
Dashboard Engine/Web Projects → where the complete application is assembled from reusable components, both for viewing and for managing alarms, logs, topological viewers, correctors, etc.
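To make the "common output" idea concrete, here is a minimal sketch of what a normalization step could produce. The field names (source_system, severity, raw, ingested_at) are illustrative assumptions, not the demonstrator's actual schema.

```python
import json
from datetime import datetime, timezone

# Hypothetical sketch: one possible homogeneous record for normalized logs,
# so that downstream flows can treat every source system the same way.
def normalize(source_system: str, raw_line: str) -> dict:
    """Wrap a raw log line in a common, source-independent record."""
    return {
        "source_system": source_system,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "raw": raw_line,
        # Naive severity guess for the sketch; real flows would parse this.
        "severity": "ERROR" if "ERROR" in raw_line else "INFO",
    }

record = normalize("router-01", "2024-05-01 ERROR link down on eth0")
print(json.dumps(record, indent=2))
```

Whatever the real schema looks like, the point is the same: every detector downstream reads one shape of record, not one per source system.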
Flow architecture
The dataflow architecture can be seen in the following image. Note that there is a common treatment of alarm information, supported by the system's standardization of the raw data and by its event detection capabilities.
The system logs (the sources shown on the left) pass through different information flows that first standardize them and then apply different problem detectors. For example, the following flow (consisting of two dataflows communicating via Kafka) detects excessive-load problems in a system based on log ratios, and generates the corresponding alerts.
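The ratio-based detection described above can be sketched as a sliding-window count of log events per system. The window size and threshold below are invented for illustration; the real flow computes its ratios inside Dataflow.

```python
from collections import deque

# Hypothetical sketch of a log-ratio check: count log events in a sliding
# time window and flag the system when the rate exceeds a threshold.
class LoadDetector:
    def __init__(self, window_seconds: float = 60, max_logs_per_window: int = 100):
        self.window = window_seconds
        self.limit = max_logs_per_window
        self.events = deque()  # timestamps of recent log events

    def observe(self, timestamp: float) -> bool:
        """Record one log event; return True if the rate is excessive."""
        self.events.append(timestamp)
        # Drop events that fell out of the window.
        while self.events and self.events[0] <= timestamp - self.window:
            self.events.popleft()
        return len(self.events) > self.limit

detector = LoadDetector(window_seconds=60, max_logs_per_window=3)
alerts = [detector.observe(t) for t in (0, 10, 20, 30)]
print(alerts)  # the fourth event exceeds the 3-per-minute limit
```

In the demonstrator, the first dataflow would publish the standardized events to Kafka and the second would consume them, applying a check of this kind and emitting alerts.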
Focusing on the first of these dataflows, we can see how the logs are processed with different regular expressions that extract and structure the raw content.
We can also see the incident detector itself, which generates alarms that are centralized in a dedicated alarm topic.
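The shape of those alarm messages might look like the following sketch. The field names and the topic name are assumptions; only the idea of serializing each detected problem into one centralized alarm topic comes from the demonstrator.

```python
import json
import time
import uuid

# Assumed name for the centralized alarm topic.
ALARM_TOPIC = "system-alarms"

def build_alarm(source_system: str, problem: str, detail: str) -> bytes:
    """Serialize a detected problem as a message for the alarm topic."""
    alarm = {
        "id": str(uuid.uuid4()),
        "source_system": source_system,
        "problem": problem,
        "detail": detail,
        "detected_at": time.time(),
    }
    return json.dumps(alarm).encode("utf-8")

message = build_alarm("router-01", "EXCESSIVE_LOAD", "412 logs in the last minute")
# A real producer would then publish it, e.g. producer.send(ALARM_TOPIC, message)
print(json.loads(message)["problem"])
```

Centralizing every detector's output in one topic is what lets the alarm-treatment dataflow consume all problems uniformly, whatever detector produced them.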
Another noteworthy dataflow handles the treatment of alarms; among other things, it correlates the different alarms with the topological network in order to extract the maximum information from the system.
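The correlation step can be sketched as a traversal of the topology from the alarming system. This is a minimal in-memory sketch: in the demonstrator the topology lives in Nebula Graph and would be fetched by querying the graph entity, and the example topology below is invented.

```python
from collections import deque

# Invented topology: which systems sit directly downstream of each system.
TOPOLOGY = {
    "router-01": ["switch-01", "switch-02"],
    "switch-01": ["server-01"],
    "switch-02": ["server-02", "server-03"],
}

def impacted_systems(alarm_source: str) -> set:
    """Correlate an alarm with the topology: every system reachable
    downstream of the alarming one is potentially affected."""
    seen, queue = set(), deque([alarm_source])
    while queue:
        node = queue.popleft()
        for neighbour in TOPOLOGY.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen

print(sorted(impacted_systems("router-01")))
```

Enriching each alarm with its impacted systems is what allows the web application to show, next to an alert, the part of the network it may affect.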
Web Application
Within the web application, different tools allow the administration and complete visualization of all the elements of the system, using the web project tools and the platform's own dashboards.
Log viewers:
Alarm managers, with the possibility of running correctors and complete topological vision:
Trace analyzers: