Distributed Tracing

Available from version 5.1.0 of Onesait Platform (Survivor).

Goal

To extend Onesait Platform's support for building microservices architectures (MSA), distributed tracing has been added to the Platform.

This functionality allows a request to be traced from the moment it is generated until it completes. This is important in an architecture of this type, where a single request can pass through several microservices and modules.

In addition, a UI is included for viewing the complete request, which helps to diagnose problems and to spot bottlenecks, slow operations and similar issues.

How has this been supported on the Platform?

The image shows an example of distributed tracing involving two microservices and several Platform components.

As shown in the image, the solution includes:

  • OpenTelemetry Collector, to automatically collect traces from all components.

  • Jaeger Collector, to convert the collected data into traces that can be queried and explored.

  • OpenSearch database, to store the traces.

  • Jaeger UI, to visualize the traces.

For both external and internal Platform elements, the OpenTelemetry agent or SDK is used for instrumentation: it obtains the traces and sends them to the OpenTelemetry (OTel) Collector.
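To make the instrumentation step more concrete, the sketch below shows how a trace can follow a request from one service to the next. It assumes the W3C Trace Context `traceparent` header, which is OpenTelemetry's default propagation format; whether a given Platform deployment uses that default is an assumption. The `inject` and `extract` helpers are hypothetical stand-ins for what the agent or SDK does automatically, not the OpenTelemetry API itself.

```python
import re
import secrets
from typing import Dict, Optional, Tuple

# W3C Trace Context header: version-traceid-parentid-flags, lowercase hex.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_span_id() -> str:
    """Return a random 16-hex-character span ID."""
    return secrets.token_hex(8)


def inject(headers: Dict[str, str], trace_id: str, span_id: str) -> None:
    """Write the current trace context into the outgoing request headers."""
    headers["traceparent"] = f"00-{trace_id}-{span_id}-01"  # 01 = sampled


def extract(headers: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Read (trace_id, parent_span_id) from incoming headers, if valid."""
    match = TRACEPARENT_RE.match(headers.get("traceparent", ""))
    if match is None:
        return None
    return match.group(1), match.group(2)


# Service A (hypothetical) starts a trace and calls service B.
trace_id = secrets.token_hex(16)
span_a = new_span_id()
outgoing: Dict[str, str] = {}
inject(outgoing, trace_id, span_a)

# Service B (hypothetical) continues the same trace with its own span.
incoming = extract(outgoing)
assert incoming is not None
b_trace_id, b_parent_id = incoming
span_b = new_span_id()
```

Because service B reuses the trace ID and records service A's span as its parent, the back end can later stitch both spans into one trace.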

Basic concepts

  • Spans: Individual units of work. These are closed time intervals, for example a call to a service or to a database.

  • Traces: Sets of spans in a temporal sequence, triggered by an initial action.

  • Scope: Formalizes where each span starts and ends.

  • Tags: Key-value pairs with information used to query and filter traces.
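The concepts above can be sketched as a toy in-memory tracer. This is only an illustration of how spans, traces, scopes and tags relate to each other; the `Span` and `Tracer` classes and the service names are hypothetical and are not part of OpenTelemetry or of the Platform.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Span:
    """One unit of work: a closed time interval with a name and tags."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    name: str
    tags: Dict[str, str] = field(default_factory=dict)
    start: float = 0.0
    end: float = 0.0


class Tracer:
    """Hypothetical toy tracer that keeps finished spans in memory."""

    def __init__(self) -> None:
        self.finished: List[Span] = []
        self._stack: List[Span] = []  # currently open spans, innermost last

    @contextmanager
    def scope(self, name: str, **tags: str):
        # The scope fixes where the span starts and where it ends.
        parent = self._stack[-1] if self._stack else None
        span = Span(
            trace_id=parent.trace_id if parent else uuid.uuid4().hex,
            span_id=uuid.uuid4().hex[:16],
            parent_id=parent.span_id if parent else None,
            name=name,
            tags=dict(tags),
            start=time.monotonic(),
        )
        self._stack.append(span)
        try:
            yield span
        finally:
            span.end = time.monotonic()
            self._stack.pop()
            self.finished.append(span)


tracer = Tracer()
with tracer.scope("GET /orders", service="api-gateway") as root:
    with tracer.scope("SELECT orders", service="orders-db") as child:
        pass  # the actual work would happen here

# Both spans share one trace ID, so together they form a single trace.
trace = [s for s in tracer.finished if s.trace_id == root.trace_id]
```

The inner span closes first, so it is the first one recorded as finished, and its interval lies entirely inside the interval of the root span.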

OpenTelemetry

OpenTelemetry (https://opentelemetry.io/) is a standard in this field. It offers a set of standardized, vendor-neutral SDKs, APIs, and tools to ingest, transform, and send data to an observability back end.

Jaeger

Jaeger collects the OpenTelemetry traces through its collector, stores them in OpenSearch and makes them available through the Jaeger UI, which is integrated into the Control Panel.

From the UI, you can search by the service that initiated the call to see the entire trace, check which components the request passed through, see the elapsed time, and inspect the information on each span.
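As a rough illustration of what such a search involves, the sketch below filters stored spans by the service that owns the root span and then summarizes the matching trace. The `SpanRecord` layout, the function names and the sample data are all hypothetical; they are not Jaeger's storage schema or query API.

```python
from typing import Dict, List, Optional, TypedDict


class SpanRecord(TypedDict):
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    service: str
    operation: str
    start_ms: int
    duration_ms: int


def find_traces(spans: List[SpanRecord], root_service: str) -> Dict[str, List[SpanRecord]]:
    """Group spans by trace, keeping traces whose root span belongs to root_service."""
    by_trace: Dict[str, List[SpanRecord]] = {}
    for span in spans:
        by_trace.setdefault(span["trace_id"], []).append(span)
    return {
        tid: sorted(group, key=lambda s: s["start_ms"])
        for tid, group in by_trace.items()
        if any(s["parent_id"] is None and s["service"] == root_service for s in group)
    }


def summarize(trace: List[SpanRecord]) -> dict:
    """Return total elapsed time and the services visited, in order of first appearance."""
    start = min(s["start_ms"] for s in trace)
    end = max(s["start_ms"] + s["duration_ms"] for s in trace)
    services = list(dict.fromkeys(s["service"] for s in trace))
    return {"elapsed_ms": end - start, "services": services}


# Hypothetical sample data: one request through three services, plus an unrelated trace.
spans: List[SpanRecord] = [
    {"trace_id": "t1", "span_id": "a1", "parent_id": None, "service": "frontend",
     "operation": "GET /checkout", "start_ms": 0, "duration_ms": 120},
    {"trace_id": "t1", "span_id": "b1", "parent_id": "a1", "service": "orders",
     "operation": "createOrder", "start_ms": 10, "duration_ms": 90},
    {"trace_id": "t1", "span_id": "c1", "parent_id": "b1", "service": "orders-db",
     "operation": "INSERT", "start_ms": 20, "duration_ms": 60},
    {"trace_id": "t2", "span_id": "d1", "parent_id": None, "service": "batch",
     "operation": "nightly-job", "start_ms": 5, "duration_ms": 30},
]

found = find_traces(spans, "frontend")
summary = summarize(found["t1"])
```

Here `summary` reports an elapsed time of 120 ms and the path frontend, orders, orders-db, which is the kind of information the UI presents visually for each trace.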