A Look at the DataRefiner

Available in version 2.2.0-hyperblast

Introduction

To add new Data Governance capabilities to the Platform, a new module called DataRefiner (also known as DataCleaner) has been included. It is accessible from the ANALYTICS TOOLS menu entry.

The purpose of this component is to "refine" the information that is loaded into or extracted from the platform. To this end:

  • An end user can load data through a UI from various sources, for example from their own PC, from a URL, or from information already residing on the platform itself.

  • Data can be loaded in the main formats, including Excel, XML, JSON and CSV, among others.

  • The user can work with the data through an "Excel-like" interface to perform data profiling, including cleansing, enhancement, restructuring and reconciliation.

  • The "refined" data can be downloaded as files or uploaded to the platform as ontologies.
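
The load, refine and export cycle described above can be pictured with a small Python sketch. It is not DataRefiner's or OpenRefine's API: the `refine_rows` function, the `name`/`city` columns and the specific cleansing rules are hypothetical, chosen only to show typical steps (trimming whitespace, normalizing case, dropping duplicate rows) before the result is serialized as JSON.

```python
import csv
import io
import json


def refine_rows(csv_text: str) -> list[dict[str, str]]:
    """Hypothetical cleansing pass: trim cells, title-case a 'city' column, drop duplicate rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen: set = set()
    refined: list[dict[str, str]] = []
    for row in reader:
        # Trim leading/trailing whitespace in every cell; ignore overflow cells beyond the header
        clean = {
            key.strip(): (value or "").strip()
            for key, value in row.items()
            if key is not None
        }
        # Normalize case in an assumed 'city' column
        if "city" in clean:
            clean["city"] = clean["city"].title()
        # Skip rows identical to one already kept
        fingerprint = tuple(sorted(clean.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        refined.append(clean)
    return refined


raw = "name,city\n Ana , madrid\nAna,Madrid\nLuis,  BARCELONA \n"
# Serialize the refined rows as JSON, as a stand-in for downloading them as a file
print(json.dumps(refine_rows(raw), indent=2))
```

Here the second input row only differs from the first in spacing and case, so after refinement it is recognized as a duplicate and dropped.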

This module is built on OpenRefine, an open-source Java tool (BSD-3 license). More information is available in the article "The technology behind DataRefiner: OpenRefine".

Module capabilities

The module includes:

  • Import of files in various formats and from various sources

  • Export of processed data to different formats

  • Import of data from an Ontology: connect to a Platform instance, select a query and load the resulting data into the tool.

  • Export of already processed data (cleaned, enriched, ...) to an Ontology in a chosen Platform instance, using the Platform's JSON format, or to a local JSON file.

  • (IN ROADMAP) The ability to apply transformations to a file manually and then, through a DataFlow component, automatically apply the same rules to other files (for example, defining the rules on one month of data and then applying them to an annual file).

  • User-level security: each user can see only their projects
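
The roadmap item about reusing manual transformations can be sketched as recording the rules as plain data and replaying them, in order, on any other file. The following Python sketch assumes a hypothetical rule vocabulary (`trim`, `rename`, `upper`) and hypothetical helpers (`OPERATIONS`, `apply_rules`); it is only an illustration of the idea, not how the DataFlow component will actually be implemented.

```python
import json
from typing import Callable

Row = dict[str, str]

# Hypothetical rule vocabulary: each operation takes a row and the rule's parameters
# and returns a new row, leaving the input row untouched.
OPERATIONS: dict[str, Callable[[Row, dict], Row]] = {
    "trim": lambda row, rule: {k: v.strip() for k, v in row.items()},
    "rename": lambda row, rule: {
        (rule["to"] if k == rule["from"] else k): v for k, v in row.items()
    },
    "upper": lambda row, rule: {**row, rule["column"]: row[rule["column"]].upper()},
}


def apply_rules(rows: list[Row], rules: list[dict]) -> list[Row]:
    """Replay a recorded list of rules, in order, on every row."""
    refined = []
    for row in rows:
        for rule in rules:
            row = OPERATIONS[rule["op"]](row, rule)
        refined.append(row)
    return refined


# Rules worked out by hand on one month's data, stored as JSON text.
stored_rules = json.dumps([
    {"op": "trim"},
    {"op": "rename", "from": "cty", "to": "city"},
    {"op": "upper", "column": "city"},
])

january = [{"name": " Ana ", "cty": "madrid "}]
annual = january + [{"name": "Luis", "cty": " bilbao"}]  # stands in for a full-year file

# Load the stored rules and replay them unchanged on both datasets
rules = json.loads(stored_rules)
print(apply_rules(january, rules))
print(apply_rules(annual, rules))
```

Keeping the rules as JSON-serializable dictionaries rather than code is what allows them to be stored after working on one month's data and replayed unchanged on the annual file.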