Saving Entity data using S3

Introduction

In many projects it is common to need to archive historical data stored in Entities so that the size of the data kept in the databases can later be reduced. A typical example is reading data from Entities and storing it as files in S3, which also gives you backups of the Entity data in S3.

In this post we are going to look at useful examples of how to implement these use cases with the Onesait Platform Dataflow module.

The goal is not to give a step-by-step guide on how to replicate each example; instead, we will highlight the most relevant configuration in each pipeline. Let’s start with how to save data from Entities to S3.

Examples

Writing to S3

Let’s start with the simplest example: reading the data and writing it directly to S3.

image-20250220-175635.png

In this first approach, we define a connector with Onesait Platform as the origin. This connector executes a query that reads the data you want to store; for example, all the data prior to a certain date.

Once the data has been read, we use the Amazon S3 component to store it via S3. Although the component is called Amazon S3, you can use any other storage that supports the S3 API; specifically, in this example we use the MinIO integration available in Onesait Platform.

To configure the Onesait Platform connector, we mainly have to take the following into account:

  • In the Connection tab, we have to fill in the ‘Token’ and ‘IoT Client’ values with those of the Digital Client we are going to use, bearing in mind that this Digital Client must have permissions on the Entity from which we will take the data.

  • In Ontology we have to enter the name of the Entity chosen for this process, as shown in the previous image.

  • In the Configuration tab we must write the query that will be used to obtain the data from the Entity (see the example after this list).
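
As a rough illustration, the query configured in this tab could look like the one below. The Entity name (SensorData), the field name (timestamp) and the cut-off date are purely illustrative; adapt them to your own Entity and query language.

    select * from SensorData as s where s.timestamp < '2024-01-01T00:00:00Z'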

To configure the S3 destination, the most relevant settings are the following:

image-20250220-175651.png
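
Although the Dataflow stage takes care of the upload, the following sketch shows, outside the pipeline, what the destination essentially does against an S3-compatible endpoint. The endpoint URL, credentials, bucket and object key are assumptions for illustration; with MinIO you simply point the S3 client at the MinIO endpoint.

    import boto3

    # All connection values below are illustrative; replace them with the ones
    # configured for your MinIO / S3-compatible storage.
    s3 = boto3.client(
        "s3",
        endpoint_url="https://minio.example.com",  # MinIO endpoint (assumption)
        aws_access_key_id="ACCESS_KEY",
        aws_secret_access_key="SECRET_KEY",
    )

    # Each batch of records read from the Entity ends up as a single object,
    # which is what the Amazon S3 destination does with this configuration.
    batch_as_json_lines = b'{"id": 1}\n{"id": 2}\n'
    s3.put_object(
        Bucket="entity-backups",            # bucket name (assumption)
        Key="SensorData/batch-0001.json",   # object key (assumption)
        Body=batch_as_json_lines,
    )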

One thing to keep in mind with this example is that, because of the way S3 works, with this configuration a new file is created in S3 for every batch of data read from the source Entity. If the Entity contains little data, you could read everything in a single batch and generate one file, or use a batch size large enough to produce an acceptable number of files in S3. However, this approach is quite limited when you have a large amount of data: for instance, 10 million records read in batches of 1,000 would leave 10,000 objects in S3.

This leads us to the natural evolution of the example: adding an intermediate step that creates files of the size we are interested in before storing them in S3, regardless of the batch size. This even allows us to work in streaming if necessary.

To do this, we split the data flow into two pipelines. The first one reads from Onesait Platform and creates temporary files with the data. The second one reads the temporary files and saves them in S3; in addition, this second flow deletes each file once it is in S3.

The first stream will look like this:

Without going into the configuration details, what matters here is to choose the size of the temporary files according to the file size we want to end up with in S3. The local file system destination lets you define this limit both as a number of records and as a file size. You can also see that an event flow has been added to stop the pipeline. This is not mandatory and depends on the use case; in this example it is done so that the pipeline stops automatically when it finishes executing the configured query, instead of waiting for new records that may arrive later.
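
As a conceptual sketch of what this first pipeline does (not of how the Dataflow stage is implemented internally), the fragment below writes incoming records to a local file as JSON lines and rolls over to a new file when a maximum number of records or a maximum size is reached. The thresholds and the staging directory are assumptions.

    import json
    import os
    import uuid

    MAX_RECORDS_PER_FILE = 100_000          # illustrative threshold
    MAX_BYTES_PER_FILE = 64 * 1024 * 1024   # illustrative threshold (64 MB)
    TMP_DIR = "/tmp/entity-export"          # local staging directory (assumption)

    def write_in_rolled_files(records):
        """Write records as JSON lines, starting a new file when a limit is reached."""
        os.makedirs(TMP_DIR, exist_ok=True)
        current_file, record_count, byte_count = None, 0, 0
        for record in records:
            if current_file is None:
                path = os.path.join(TMP_DIR, f"{uuid.uuid4()}.json")
                current_file = open(path, "w", encoding="utf-8")
                record_count, byte_count = 0, 0
            line = json.dumps(record) + "\n"
            current_file.write(line)
            record_count += 1
            byte_count += len(line.encode("utf-8"))
            # Roll over to a new file once either limit is reached.
            if record_count >= MAX_RECORDS_PER_FILE or byte_count >= MAX_BYTES_PER_FILE:
                current_file.close()
                current_file = None
        if current_file is not None:
            current_file.close()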

The second data stream is similar to the original, except that it reads from a local directory.

The most relevant point in this case is that, in all the previous examples, we were working at record level using the JSON format. Here that is no longer necessary: the temporary files in the local directory are already formatted exactly as we want to upload them to S3, so there is no need to parse every line again. To avoid this unnecessary processing, the whole-file format is used both to read from the local directory and to write to S3.

In addition, to avoid having to manage the temporary files manually, the post-processing option of the directory origin has been configured to delete the files that have already been processed.
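
A minimal sketch of what this second pipeline does, assuming the same illustrative staging directory, endpoint and bucket as before: each temporary file is uploaded to S3 as a whole object, without parsing its records, and is deleted once the upload succeeds.

    import os
    import boto3

    TMP_DIR = "/tmp/entity-export"   # same staging directory as the first pipeline
    BUCKET = "entity-backups"        # bucket name (assumption)

    s3 = boto3.client(
        "s3",
        endpoint_url="https://minio.example.com",  # MinIO endpoint (assumption)
        aws_access_key_id="ACCESS_KEY",
        aws_secret_access_key="SECRET_KEY",
    )

    for name in os.listdir(TMP_DIR):
        path = os.path.join(TMP_DIR, name)
        # Whole-file transfer: the file is streamed to S3 as-is, no record parsing.
        s3.upload_file(path, BUCKET, f"SensorData/{name}")
        # Equivalent of the post-processing "delete" option on the directory origin.
        os.remove(path)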

Reading from S3 and writing to Onesait Platform

This is a fairly simple example. The most important thing is to configure the Amazon S3 origin to read the files you want and to configure an Onesait Platform destination. This could be used, for example, to restore an Entity from files stored in S3.
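
As a sketch of this reverse direction, assuming the same illustrative bucket: the objects are listed and downloaded through the S3 API, and each JSON record would then be inserted into the Entity through the Digital Client. The insert call is left as a placeholder, since the exact call depends on how your platform is deployed and on the protocol your Digital Client uses.

    import json
    import boto3

    BUCKET = "entity-backups"   # bucket name (assumption)
    PREFIX = "SensorData/"      # prefix of the objects to restore (assumption)

    s3 = boto3.client(
        "s3",
        endpoint_url="https://minio.example.com",  # MinIO endpoint (assumption)
        aws_access_key_id="ACCESS_KEY",
        aws_secret_access_key="SECRET_KEY",
    )

    def insert_into_entity(record):
        # Placeholder: insert the record into the Entity using your Digital Client,
        # which is what the Onesait Platform destination does in the pipeline.
        raise NotImplementedError

    listing = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)
    for obj in listing.get("Contents", []):
        body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
        for line in body.decode("utf-8").splitlines():
            if line.strip():
                insert_into_entity(json.loads(line))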
