How to load Data into the Platform from an Open Data Portal?
Introduction
In this example you are going to load data into the Platform through the FlowEngine, connecting with a dataset from an Open Data Portal.
How to develop it
- As a data origin, you will use the "London DataStore" (https://data.london.gov.uk/). Search for the total population of the cities with more than 300,000 habitants, "Global City Population Estimates."
To download the historical population dataset: https://data.london.gov.uk/download/global-city-population-estimates/604a6a6f-2162-4d6b-bcd0-bee051703de1/global-city-population-estimates.xls
This is the structure of the downloaded dataset:
An interesting tab is named "CITIES-OVER-300K". For every city, you have a row with the population per year:
The next step will be defining the necessary ontology (data model) to load and validate the information.
2. To create the ontology, go to the ControlPanel (https://www.onesaitplatform.online/controlpanel/) and then create the ontology.
In the creation interface, you can name the ontology and provide a little description:
Now you can use a template. In this case, use an empty template called "EMPTYBASE". You are going to represent the information per year and population instead of using a register per city. This will facilitate the temporal series management and the data insertion in the future. As a result, every register in the Excel file will be transformed into different registers (one per year):
Once this is defined, press "Update Schema" to create the JSON scheme to validate the data.
Below the JSON scheme, by clicking on "Generate Instance", an example of a valid data can be seen.
Once finished, press "NEW" to close and save changes.
3. Accessing to the FlowEngine, you can create the process flow "OpenData CityPopulation" that will convert the DataSet into the required ontology. The flow has four parts:
- External REST service invoke to get the data: opening the "http request" node, you can put into the "URL" field the URL address of the Excel file. It is important to check on the "return a binary buffer" option:
- Data transformation from Excel to JSON: The binary buffer must be converted into an "XSL Book" node that will work with it as an XSL document. The next step will be selecting the worksheet: "CITIES-OVER-300K."
The last step will be selecting the cell set to process and the output format. In the example, use the entire document from A0 (first cell) to the X1693. Also, indicate that you want the output with raw values, to keep the data type from the Excel file in the JSON.
The output of the "sheet to json" will be a JSON with the records of the datasheet.
- Information transformation: Here you will transform the JSON into the records to insert in the ontology. This will be made with the "Split to cities" node by creating a message per every combination of record and year. The code is:
At the end, you will have in msg.payload a JSON with the same structure of the ontology instance.
- "CityPopulation" ontology insertion:
4. Once finished, press "Deploy" to save the flow.
5. To check the data into the ontology, select "TOOLS → Query Tool" in the menu.
Then, select the "CityPopulation" ontology:
To search any record, you can write the query "select * from CityPopulation limit 1":
Each of the inserted values into the defined fields can be seen.
This tutorial continues with the representation of the data at How to visualize data on Dashboards loaded from an Open Data Portal?