Technical Characteristics: Scalability, Robustness and Performance
The platform is designed and built to be scalable and robust and to offer high performance at both the processing and storage layers.
Scalability
The platform uses horizontal scalability as its main scaling mechanism.
Horizontal scaling lets the platform grow incrementally. The entire infrastructure does not have to be sized up front, which reduces acquisition and operating costs, because only the capacity needed at each point in time has to be paid for.
Scaling is achieved by deploying the platform in Docker containers managed by Kubernetes, which enables dynamic scaling when a load peak is detected.
The aim is for performance to remain stable as the overall load grows, using all available machine capacity without one module affecting another.
This horizontal scaling can be performed dynamically according to the detected workload, so that the system response remains stable during a workload peak.
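As an illustration of how such dynamic scaling decisions are typically made, the sketch below applies the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric), with no change when the ratio is within a tolerance of 1. The function name, the CPU target and the minimum and maximum bounds are assumptions chosen for the example; they are not taken from the platform's actual configuration.

```python
import math


def desired_replicas(current_replicas: int,
                     current_cpu_pct: float,
                     target_cpu_pct: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Replica count following the Horizontal Pod Autoscaler rule.

    All default values are illustrative assumptions.
    """
    ratio = current_cpu_pct / target_cpu_pct
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        wanted = current_replicas
    else:
        wanted = math.ceil(current_replicas * ratio)
    # Never go below or above the configured bounds.
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # A load peak pushes average CPU to 90% against a 60% target.
    print(desired_replicas(current_replicas=4, current_cpu_pct=90, target_cpu_pct=60))
```

With 4 replicas at 90% CPU and a 60% target, the rule asks for 6 replicas. When the load falls back, the same rule reduces the count again, which is what keeps the cost proportional to the load.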
The machines added horizontally do not need high-end specifications, so this growth can be achieved with what is known as "commodity hardware" (i.e., basic, low-cost hardware). Several modest machines can be acquired whose combined capacity gives the system better performance and a much higher ROI than continuously upgrading existing machines.
Robustness
The platform is able to perform satisfactorily at all times, withstanding occasional high load scenarios without suffering any damage and ensuring its stability to provide continuous service.
To demonstrate these capabilities, examples of stress tests performed on the platform are included to give an idea of how it behaves in different scenarios.
High Load Scenario: This test loads the platform heavily within a short time interval in order to identify the limiting component, that is, the one most sensitive to scaling.
The test shows that the CPU is the bottleneck: response time starts to degrade at the same point where the CPUs reach 100% utilization. This limitation is considered the lowest-risk one, because CPU is usually easier to scale than any other bottleneck. The CPU utilization result is shown in the following graph:
Stability Scenario: In this scenario, a steady workload is applied over a period of 8 hours with 50 simultaneous threads.
CPU utilization remains stable throughout the stability test run, with the load holding constant at 20%.
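A stability run of this kind can be sketched with a small harness such as the one below. It is a minimal sketch, not the tool used for the tests described above: `run_stability_test` and `request_fn` are hypothetical names, and `request_fn` stands for whatever call exercises the platform (for example, an HTTP request to an ingestion endpoint). The defaults mirror the scenario: 50 simultaneous threads for 8 hours.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def run_stability_test(request_fn, threads=50, duration_s=8 * 3600):
    """Call request_fn in a loop from `threads` workers for `duration_s` seconds.

    request_fn is a placeholder for the operation under test (an assumption).
    """
    deadline = time.monotonic() + duration_s

    def worker():
        latencies = []
        while time.monotonic() < deadline:
            start = time.perf_counter()
            request_fn()
            latencies.append(time.perf_counter() - start)
        return latencies

    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(worker) for _ in range(threads)]
        all_latencies = [lat for f in futures for lat in f.result()]

    return {
        "requests": len(all_latencies),
        "mean_latency_s": mean(all_latencies) if all_latencies else 0.0,
        "max_latency_s": max(all_latencies, default=0.0),
    }


if __name__ == "__main__":
    # Short demonstration run with a simulated 1 ms request.
    print(run_stability_test(lambda: time.sleep(0.001), threads=5, duration_s=1))
```

CPU utilization itself would be read from the monitoring of the nodes under test while the harness runs; the harness only generates the steady load and records response times.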
Performance
The platform has been designed in a modular way, so that each component can scale horizontally (adding new nodes in parallel) distributing the load among all of them.
This capability is implemented in all layers of the platform: the acquisition modules exposed to the outside, which enable massive information ingestion; the streaming engines, which enable massive real-time processing; and the persistence layer, which uses its replication capabilities to distribute the processing load among its nodes and so improve response time.
The platform ensures performance in several ways:
At the architecture and design level, the platform is built on technologies, frameworks and software that are widely used in critical environments (e.g. Spring, Kafka, Hazelcast, MongoDB).
This choice is complemented by performance tests on the parts that are critical to platform execution.
Alongside the deployment of the platform, a performance test environment is deployed and started at specific times to certify the performance of the platform.
The platform is sized according to the system load forecast. In addition, because the deployment is based on containers and Kubernetes, scaling the components in response to load is immediate: it only requires provisioning more infrastructure on which to run the containers (horizontal scaling) or increasing the capacity of the existing infrastructure (vertical scaling).
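Sizing from a load forecast can be illustrated with a simple capacity calculation. The sketch below is an assumption about how such an estimate might be made, not the platform's own sizing procedure: it divides the forecast peak throughput by the throughput one replica sustains (as measured in a performance test), keeping each replica below a target utilization so that there is headroom for peaks. All names and figures are hypothetical.

```python
import math


def replicas_for_forecast(peak_rps: float,
                          rps_per_replica: float,
                          target_utilization: float = 0.7) -> int:
    """Estimate how many replicas are needed for a forecast peak load.

    peak_rps:           forecast peak requests per second (hypothetical input)
    rps_per_replica:    throughput one replica sustains in a performance test
    target_utilization: fraction of that throughput to plan for (headroom)
    """
    if rps_per_replica <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("rps_per_replica must be > 0 and 0 < target_utilization <= 1")
    usable_rps = rps_per_replica * target_utilization
    # Always keep at least one replica running.
    return max(1, math.ceil(peak_rps / usable_rps))


if __name__ == "__main__":
    # Hypothetical figures: 1000 req/s forecast, 200 req/s per replica, 50% target.
    print(replicas_for_forecast(1000, 200, target_utilization=0.5))
```

With these example figures the estimate is 10 replicas. The result gives the initial size of the deployment, and dynamic scaling then absorbs deviations from the forecast.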