- As a data origin, we will use the "London DataStore" (https://data.london.gov.uk/ ). We have searched for the total population of the cities with more than 300,000 habitants, "Global City Population Estimates."
To Download the historical population dataset: https://data.london.gov.uk/download/global-city-population-estimates/604a6a6f-2162-4d6b-bcd0-bee051703de1/global-city-population-estimates.xls
This is the structure of the downloaded dataset:
An interesting tab is titled "CITIES-OVER-300K". For every city, we have a row with the population per year:
The next step will be defining the necessary ontology (data model) to load and validate the information.
2. To create the ontology, go to the ControlPanel (https://www.onesaitplatform.online/controlpanel/) and then create the ontology.
In the creation interface, you can name the ontology and provide a little description:
Now you can use a template. In our case, we will use an empty template called "EMPTYBASE". We are going to represent the information per year and population instead of using a register per city. This will facilitate the temporal series management and the data insertion in the future. As a result, every register in the Excel file will be transformed into different registers (one per year):
Once this is defined, press "Update Schema" to create the JSON scheme to validate the data.
Below the JSON scheme, by clicking on "Generate Instance", an example of a valid data can be seen.
Once finished, press "NEW" to close and save changes.
3. Accessing to the FlowEngine, you can create the process flow "OpenData CityPopulation" that will convert the DataSet into the required ontology. The flow has four parts:
- External REST service invoke to get the data: opening the "http request" node, you can put into the "URL" field the URL address of the Excel file. It is important to check on the "return a binary buffer" option:
- Data transformation from Excel to JSON: The binary buffer must be converted into an "XSL Book" node that will work with it as an XSL document. The next step will be selecting the worksheet: "CITIES-OVER-300K."
The last step will be selecting the cell set to process and the output format. In the example, we will use the entire document from A0 (first cell) to the X1693. We will also indicate that we want the output with raw values, to keep the data type from the Excel file in the JSON.
The output of the "sheet to json" will be a JSON with the registries of the datasheet.
- Information transformation: Here you will transform the JSON into the registries to insert in the ontology. This will be made with the "Split to cities" node by creating a message per every combination of registry and year. The code is:
At the end, we will have in msg.payload a JSON with the same structure of the ontology instance.
- "CityPopulation" ontology insertion:
4. Once finished, press "Deploy" to save the flow.
5. To check the data into the ontology, select "TOOLS → Query Tool" in the menu.
Then select the "CityPopulation" ontology:
To search any register, you can write the query "select * from CityPopulation limit 1":
Each of the inserted values into the defined fields can be seen.
This tutorial continues with the representation of the data at (Dashboards) Visualization of Data loaded into the Platform from an Open Data Portal.