/
How to use the DataRefiner to ingest a file?

How to use the DataRefiner to ingest a file?

Introduction

This platform module can be accessed by the ADMINISTRATOR and ANALYTICS roles, from the option

 

Upload file to DataRefiner

Once I access the module from the main page I will select the option

which will enable me the option to load a file:

For the file upload example I will use this:

https://www.iwight.com/documentlibrary/download/transparency-data-april-2013-csv

 So I'll select this file as URL and hit Next:

Transforming the loaded data

Once the file is loaded I will find a UI like this, from which I can do some transformations prior to loading the file in my project

For example I can ignore the first line if it has the headers or discard blank rows:

Or try transforming the columns to their type ... we are not going to do this because we will do it at a later stage.

We will conclude by creating the project:

Once the project is created I find this UI:

Where I can see the first records of the file / project, in this example we have 11285 records.

We also see that each column has a drop-down menu and that a box appears to the left to be able to go back and see the Facets.

If I go to the Amount column I can see the options I have: Facet, Filters, edit cells, columns, transpose, sort, reconcile.

 I'll start by saying that this column is of type numeric in Edit cells> Common transforms> to number

Now I will see the data formatted as numbers, however this does not mean that they are all numbers,

now I will apply a Facet by selecting Facet> Numeric Facet

In the left part I will see that I have 2244 data that is not numeric:

If I just leave the Non-numeric check in my table I will see 10 of these records. As you can see, the problem is that there is a comma to separate the thousands and the tool cannot transform it to a number:

So I can transform it with the tool and its GREL (Google Refine Expression Language) language. I will go to the Amount column, select Edit cells> Transform and in the transformation I will indicate value.replace (',', '')

If I apply the filter and apply Edit cells> Commons Transforms> To number I can see since all the data in my column is numeric.

I can apply other more advanced transformations, such as deriving a column from another applying transformations, for that I will put myself in the Date column, I will select Edit column> Add column based on this column

In this one I will apply the date format transformation:

value.toDate ('dd.mm.yy'). toString ('yyyy-mm-dd')

And then I'll apply Edit cells> Commons Transformations> To date

I can also combine data, reconcile it, ...

Extract transformations

A very interesting concept is that I can extract the transformations to be able to apply them on other data of the same type (the Excel of the following year in this example), this is done from Undo-redo / Extract

The result is a JSON that I can save:

And that from the Apply option I can paste and apply to other files:

Export data

Finally we will see how to export the data of my project. From the Export option I can export in numerous formats: