TimeSeries modeling on MongoDB
Introduction
Time series are managed in the platform at the ontology level because they are a special type of data: a set of measurements or signals is grouped into one or more time windows, so that they can be queried and managed together.
In a Timeseries-type ontology, each insertion in the platform can be considered as a signal or measurement, and it must be accompanied by a timestamp, which allows grouping the value of the signal or measurement in its corresponding time window.
A TimeSeries-type ontology is characterised by the fact that its schema describes the structure of the instances received by the IoTBroker, but not the structure of the instances stored in the database engine. In other words, and as we will see later, the platform manages the received data internally to build a more efficient grouping of the different signals.
Concepts
When working with TimeSeries in the platform, you must bear in mind the following concepts when creating an ontology:
Timestamp: The exact moment in time of the measurement or signal. It is mandatory in all TimeSeries-type ontologies. In fact, the platform itself adds it transparently by default, so it is not necessary to add a Timestamp field.
Tags: These are the descriptor properties of a signal or measurement, but not the measurement itself. For example, in a TimeSeries-type ontology that receives data from a meter, the meter's identifier and its owner will be tags.
Fields: These are the properties that represent the value of a signal or measurement at a given time.
Auxiliary Data: In a TimeSeries-type ontology, there is also information that, although it is stored in the TimeSeries structure, is costly to retrieve in processing time, due to the structure with which that information is stored. For example, retrieving the last received value, which could be a commonly-requested piece of data, has a certain computational cost in a TimeSeries structure. To this end, the Auxiliary Data option is provided. It creates another ontology with the same identifier given to the TimeSeries-type ontology, along with the suffix _stats, so that, by accessing the <identifierOntology>_stats ontology, immediate access to such data is available.
Database Engine: The database that is used as a support to store the ontology time series is a very important aspect to consider, as the storage structure depends on it. In principle, the platform supports MongoDB, which already proposes a structure for this type of data that will be described in later points.
All these concepts can be found when creating a TimeSeries ontology:
The last important concept is the following one:
Time windows: A TimeSeries-type ontology can have different time windows. A time window is the unit of time in which the received signals or measurements are grouped. It can be:
Per minute.
Hourly.
Daily.
Monthly.
Once the window unit is selected, select the sampling frequency, that is, how often signals or measurements (samples) are stored in the window. The frequency unit must always be smaller than the window unit, and can be:
Seconds.
Minutes.
Hours.
Days.
Months.
You can also specify an aggregate function, that is, the policy to apply when multiple measurements or signals are received for the same time instant in the window.
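To make the idea of grouping by window concrete, the following is a minimal sketch, not the platform's actual implementation. It assumes that each window starts at the beginning of its minute, hour, day, or month, and it uses the labels MINUTES, HOURS, DAYS and MONTHS, which match the windowType values described later in this document. The function name window_start is hypothetical.

```python
from datetime import datetime, timezone


def window_start(ts: datetime, window_type: str) -> datetime:
    """Truncate a sample timestamp to the start of its time window.

    Assumption: a window begins at the start of its minute, hour,
    day or month. The platform's exact rule is not shown in this document.
    """
    if window_type == "MINUTES":
        return ts.replace(second=0, microsecond=0)
    if window_type == "HOURS":
        return ts.replace(minute=0, second=0, microsecond=0)
    if window_type == "DAYS":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if window_type == "MONTHS":
        return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unknown window type: {window_type}")


# Every sample received within the same hour maps to the same hourly window.
sample = datetime(2019, 6, 12, 1, 23, 45, tzinfo=timezone.utc)
for window_type in ("MINUTES", "HOURS", "DAYS", "MONTHS"):
    print(window_type, window_start(sample, window_type).isoformat())
```

Two samples belong to the same window exactly when this function returns the same value for both of them.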
Persistence in MongoDB
MongoDB has been chosen as the reference database to store TimeSeries, although other database management systems may be added in the future.
A TimeSeries-type ontology whose physical support is MongoDB has the following features:
A collection is created in MongoDB to support data storage.
Whenever an instance of a TimeSeries ontology is received, its timestamp is extracted and, for each time window defined in the platform, the platform checks whether a document already exists in MongoDB for that signal and timestamp. To do this, it runs a query using the received values of all the tags and the timestamp of each window. If a document exists, the entry corresponding to the timestamp of the received signal is updated. If it does not exist, a document is created with every time slot set to null, except for the slot to which the signal belongs.
Each instance is stored in the collection as a document with the following structure:
Where:
windowType: Indicates the type of window the document represents. Depending on the windows created from the control panel, it can take the following values:
MINUTES
HOURS
DAYS
MONTHS
windowFrecuency: Indicates the sampling frequency (1 measurement or signal every X time).
windowFrecuencyUnit: Indicates the sampling frequency unit. It is applied together with windowFrecuency. E.g.: 1 measurement or signal every 5 (windowFrecuency) seconds (windowFrecuencyUnit). May take the following values:
SECONDS
MINUTES
HOURS
DAYS
MONTHS
timestamp: Represents the window's grouping date. Depending on the value of windowType, it will have one of the following formats:
MINUTES: yyyy-mm-dd HH:mm:00:00Z
HOURS: yyyy-mm-dd HH:00:00:00Z
DAYS: yyyy-mm-dd HH:00:00:00Z
MONTHS: yyyy-mm-01 00:00:00:00Z
owner and assetId: Fields that were declared as Tags in the ontology (See image of the previous point).
values: Structure that contains each of the signals or measurements received in the time window that groups the document. The structure depends on the chosen time window, the frequency unit and the frequency with which signals or samples are received.
The first-level element of values is an object with the property "v", whose value is another object structured so that a value can be easily retrieved and updated when its timestamp is known. To achieve this, a structure of nested objects is built according to the window type and the sampling frequency: the first level is determined by the window type and the following levels by the sampling frequency, and the keys are the day, hour, minute or second number that applies.
For example, for a daily window with samples every second, this will be the structure:
The first level contains the 24 hours of the day, with keys from 0 to 23. Each hour contains its 60 minutes, with keys from 0 to 59, and each minute contains its 60 seconds, each one with its associated value.
This structure has the advantage of allowing easy access to any value by chaining: <day_number>.<hour_number>.<minute_number>.<second_number>.
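The nested layout described above can be sketched in a few lines. This is an illustration only: it assumes that the nested object hangs from values.v, as the description suggests, and that keys are stored as strings, as is usual for document field names. The helper names empty_daily_values, dotted_path and get_by_path are hypothetical.

```python
from datetime import datetime, timezone
from functools import reduce


def empty_daily_values() -> dict:
    """Null-filled structure for a daily window sampled every second:
    hour (0-23) -> minute (0-59) -> second (0-59) -> value."""
    return {
        "v": {
            str(h): {
                str(m): {str(s): None for s in range(60)} for m in range(60)
            }
            for h in range(24)
        }
    }


def dotted_path(ts: datetime) -> str:
    """Chain hour, minute and second into a single dotted path."""
    return f"values.v.{ts.hour}.{ts.minute}.{ts.second}"


def get_by_path(doc: dict, path: str):
    """Walk a nested dict following a dotted path."""
    return reduce(lambda node, key: node[key], path.split("."), doc)


doc = {"values": empty_daily_values()}
ts = datetime(2019, 6, 12, 13, 5, 30, tzinfo=timezone.utc)

# Store one sample in the slot that matches its timestamp.
doc["values"]["v"][str(ts.hour)][str(ts.minute)][str(ts.second)] = 2.5

print(dotted_path(ts))                    # values.v.13.5.30
print(get_by_path(doc, dotted_path(ts)))  # 2.5
```

Because the path is derived directly from the timestamp, reading or updating a single sample does not require scanning the rest of the document.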
In the collection, you will have a document for each combination of:
Tags whose value is equal.
Unique field.
Time window.
So each document contains all the tags, but only one field. If an ontology defines more than one field, you will see that as many documents are created as there are fields of this type.
Also, if instances are received where one or more tags change between them, each unique combination of tags is considered to belong to a different document.
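As a rough illustration of this rule, the sketch below enumerates the documents expected within a single period of each window, one per combination of tag values, field and window. It is not platform code. The tag, field and window names are borrowed from the MeterBox01 example that follows, and the function name expected_documents is hypothetical.

```python
from itertools import product


def expected_documents(instances, tags, fields, windows):
    """Return the distinct (tag values, field, window) combinations.

    Each combination stands for one document within a single window period.
    A new hour or day would open further documents for that window.
    """
    tag_combos = {tuple(inst[t] for t in tags) for inst in instances}
    return set(product(tag_combos, fields, windows))


tags = ("assetId", "subassetId")
fields = ("power", "intensity")
windows = ("HOURS", "DAYS")

# Only the tag values matter for counting documents.
instances = [
    {"assetId": "CUPS", "subassetId": "CUPS-1"},
    {"assetId": "CUPS", "subassetId": "CUPS-2"},
]

print(len(expected_documents(instances[:1], tags, fields, windows)))  # 4
print(len(expected_documents(instances, tags, fields, windows)))      # 8
```

One tag combination with two fields and two windows gives 4 documents, and a second tag combination doubles that to 8.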
We will explain this with the following example:
We create a new TimeSeries-type ontology called MeterBox01:
We add two tags to it:
assetId
subassetId
We add two signals (fields):
power
intensity
We create two windows:
Hourly with signals every second.
Daily with signals every minute.
At this point, the platform has created the collection in MongoDB, but it is empty:
We send a signal through the IoT Broker:
We check the collection in MongoDB and see that 4 documents have been created:
This makes sense, because we have two time windows and two fields, so we have:
Daily Window with frequency 1 Minute for the intensity property:
Daily window with frequency 1 Minute for the power property:
Hourly window with frequency 1 Second for the intensity property:
Hourly window with frequency 1 Second for the power property:
We make another insertion, but this time changing one of the tags, specifically the subassetId to CUPS-2:
We query the ontology again and see that 4 more documents have been added, which correspond to the two windows and two properties for the combination assetId="CUPS" and subassetId="CUPS-2":
We then check that for any of the instances the value sent in the request has been entered at the correct time instant. For example, for the first instance:
{"TimeSerie":{ "timestamp":{"$date": "2019-06-12T00:00:00Z"},"assetId":"CUPS","subassetId":"CUPS-1","power":28.6,"intensity":2.5}}
we can see that it was sent for the date 2019-06-12T00:00:00Z.
We look at the hourly window with frequency 1 second for the intensity field:
We can see that the timestamp corresponds to the day 2019-06-12 at hour 00:
If we expand values down to the position of the instance sent, in this case minute 0, second 0, we can see that it holds the value 2.5, which is the one that was sent:
Next, we update it with a new signal for the next second:
We check that it has been entered at minute 0, second 1.
Next, we move the time forward and send a signal for the date 2019-06-12T01:00:00Z.
Checking the collection again, we see that there are two new documents:
Because the hour has changed, they correspond to the hourly window with frequency 1 second, for the intensity and power properties with the tag values assetId="CUPS" and subassetId="CUPS-1":
In them, we can see that the timestamp field now belongs to hour 01.
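The whole walk-through can be replayed with a small in-memory sketch. It is only an approximation of the behaviour described in this document, not the platform's code. A Python dict stands in for the MongoDB collection. The slot layout (minute then second for the hourly window, hour then minute for the daily window) is inferred from the description above. When two signals fall in the same slot, the last one is assumed to win. Only the first instance's values (28.6 and 2.5) come from the example; all other values, and the timestamp of the CUPS-2 instance, are made up. The names upsert, empty_values and slot are hypothetical.

```python
from datetime import datetime, timezone

TAGS = ("assetId", "subassetId")
FIELDS = ("power", "intensity")
WINDOWS = ("HOURS", "DAYS")  # hourly/second and daily/minute windows


def window_start(ts, window_type):
    """Start of the hour or day that contains ts (assumed grouping rule)."""
    if window_type == "HOURS":
        return ts.replace(minute=0, second=0, microsecond=0)
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


def empty_values(window_type):
    """Null-filled slots: minute -> second (HOURS) or hour -> minute (DAYS)."""
    if window_type == "HOURS":
        return {str(m): {str(s): None for s in range(60)} for m in range(60)}
    return {str(h): {str(m): None for m in range(60)} for h in range(24)}


def slot(ts, window_type):
    """Keys of the slot that ts falls into inside its window."""
    if window_type == "HOURS":
        return str(ts.minute), str(ts.second)
    return str(ts.hour), str(ts.minute)


def upsert(collection, instance):
    """Create or update one document per (tags, field, window, window start)."""
    ts = instance["timestamp"]
    tag_values = tuple(instance[t] for t in TAGS)
    for window_type in WINDOWS:
        start = window_start(ts, window_type)
        outer, inner = slot(ts, window_type)
        for field in FIELDS:
            key = (tag_values, field, window_type, start)
            # If no document exists yet, create it with all slots set to None.
            doc = collection.setdefault(key, {"v": empty_values(window_type)})
            doc["v"][outer][inner] = instance[field]  # assumed: last value wins


def signal(hour, second, subasset, power, intensity):
    ts = datetime(2019, 6, 12, hour, 0, second, tzinfo=timezone.utc)
    return {"timestamp": ts, "assetId": "CUPS", "subassetId": subasset,
            "power": power, "intensity": intensity}


collection = {}
counts = []
for inst in (
    signal(0, 0, "CUPS-1", 28.6, 2.5),  # first instance from the example
    signal(0, 0, "CUPS-2", 30.0, 3.0),  # made-up values, tag changed
    signal(0, 1, "CUPS-1", 29.1, 2.6),  # made-up values, next second
    signal(1, 0, "CUPS-1", 27.4, 2.4),  # made-up values, next hour
):
    upsert(collection, inst)
    counts.append(len(collection))

print(counts)  # [4, 8, 8, 10]
```

The document counts follow the walk-through: 4 after the first signal, 8 after the tag change, no new documents for the next second, and 2 more once the hour changes, because only the hourly window opens a new period.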