Data Governance: Data Classes

Introduction

This functionality lets you associate rules with Entities (at a general level) and with their attributes. The rules check the integrity of the data and normalize it at data entry in a simple way. We call this functionality Data Classes.

How to use the Data Class

DATACLASS centralized configuration

Rules are defined in a new type of centralized configuration, DATACLASS, which holds validation rules and data format rules so that preprocessing runs before the data is inserted.

This preprocessing will allow you to define error messages that will be inserted into the user's audit entity, thus facilitating the visualization of failed data and showing load statistics on a dashboard.

To create the rules, the configuration file must follow this format:

Rule types

There are two types of rules: property rules, which relate directly to a single field/property of the entity, and entity rules, which relate different properties of an entity to each other.

Within each rule type, there are two kinds of rules, format change rules and validation rules:

  • Changes. These rules change or correct the format, for example converting to uppercase or lowercase, or correcting a text. Because several changes can apply to the same value, the order of execution must be indicated. The fields to define are:

    • name: name of the format change.

    • order: the order of execution.

    • script: the JavaScript or Groovy code that performs the change. The language used, Groovy or JavaScript, must be stated first. Within the script, value is the name that refers to the value to change.

JavaScript/Groovy code is not required when the change is a simple condition/effect. In that case, fill in the condition and effect fields (and else, if needed) instead of script.
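
As a rough illustration of how ordered format changes might be applied, the following JavaScript sketch runs a list of change rules against a single value. The runChange and applyChanges helpers, the rule objects, and the use of plain functions in place of script text are assumptions made for this example; they are not the platform's engine or its configuration syntax. Only the field names (name, order, script, condition, effect, else) and the variable name value come from the description above.

```javascript
// Hypothetical change rules. Field names follow the text above; modelling
// script/condition/effect/else as functions is an assumption.
const changes = [
  { name: 'toUpperCase', order: 2, script: (value) => value.toUpperCase() },
  { name: 'trim', order: 1, script: (value) => value.trim() },
  {
    name: 'defaultWhenEmpty',
    order: 3,
    condition: (value) => value === '',
    effect: () => 'UNKNOWN',
    // no "else": the value is left untouched when the condition is false
  },
];

// Run one rule: use the script if present, otherwise condition/effect/else.
function runChange(rule, value) {
  if (rule.script) return rule.script(value);
  if (rule.condition(value)) return rule.effect(value);
  return rule.else ? rule.else(value) : value;
}

// Apply every change in ascending order, feeding each result to the next.
function applyChanges(rules, initialValue) {
  return [...rules]
    .sort((a, b) => a.order - b.order)
    .reduce((value, rule) => runChange(rule, value), initialValue);
}

console.log(applyChanges(changes, '  12345678z ')); // prints "12345678Z"
console.log(applyChanges(changes, '   '));          // prints "UNKNOWN"
```

Sorting a copy of the rule list keeps the declared order field authoritative regardless of how the rules are listed.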

  • Validations. These rules will normally return true or false, depending on whether the condition is met or not, so the type of error and the error message should be indicated if the necessary condition is not met. The fields to fill in will be:

    • name: name of the validation.

    • script: the JavaScript or Groovy code that validates the data. The language used, Groovy or JavaScript, must be stated first. Within the script, rawdata is the name that refers to the JSON being inserted.

    • error: the type of error. It can be error if you want to interrupt the insertion of the data, or warning if you want to continue inserting the data even if it fails.

    • errormsg: the message to show when the validation rule is not met. It can include the value being edited or validated: use ${value} for a property rule, or ${rawdata.dni} for an entity rule (where dni stands for whichever property of the insert JSON you want to show).
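
The behaviour described for validations can be sketched in JavaScript as follows. This is a minimal illustration under stated assumptions, not the platform's implementation: the validate and renderMessage helpers, the example rules, the age property, and the script(rawdata, value) signature are all hypothetical. The field names (name, script, error, errormsg), the error/warning distinction, and the ${value} and ${rawdata.dni} placeholders are taken from the text above.

```javascript
// Hypothetical property rule: checks a single value.
const propertyRule = {
  name: 'dniLength',
  script: (rawdata, value) => typeof value === 'string' && value.length === 9,
  error: 'error', // a failure interrupts the insertion
  errormsg: 'Invalid DNI: ${value}',
};

// Hypothetical entity rule: relates several properties of the insert JSON.
const entityRule = {
  name: 'adultHasDni',
  script: (rawdata) => rawdata.age < 18 || Boolean(rawdata.dni),
  error: 'warning', // a failure is reported but the insertion continues
  errormsg: 'Adult without DNI, got: ${rawdata.dni}',
};

// Replace ${value} and ${rawdata.<property>} placeholders in a message.
function renderMessage(template, value, rawdata) {
  return template
    .replace(/\$\{value\}/g, () => String(value))
    .replace(/\$\{rawdata\.(\w+)\}/g, (_, key) => String(rawdata[key]));
}

// propertyRules maps a property name to the rules attached to it.
function validate(rawdata, propertyRules, entityRules) {
  const issues = [];
  const check = (rule, value) => {
    if (!rule.script(rawdata, value)) {
      issues.push({
        type: rule.error,
        message: renderMessage(rule.errormsg, value, rawdata),
      });
    }
  };
  for (const [property, rules] of Object.entries(propertyRules)) {
    rules.forEach((rule) => check(rule, rawdata[property]));
  }
  entityRules.forEach((rule) => check(rule, undefined));
  // Only issues of type "error" block the insertion.
  const insertAllowed = !issues.some((issue) => issue.type === 'error');
  return { insertAllowed, issues };
}

const result = validate({ dni: '123', age: 30 }, { dni: [propertyRule] }, [entityRule]);
console.log(result.insertAllowed);     // prints false
console.log(result.issues[0].message); // prints "Invalid DNI: 123"
```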

How to associate a DataClass to an Entity

To use these rules, you have to associate them with the entity on which you want to run this preprocessing. When an entity is created or edited, several new options are available:

  • A check to enable or disable rule preprocessing.

  • A multiple selector where you can choose the existing rules at the ontology level.

  • A multiple selector for each property, where you can select the property rules that exist.

In this way, when we insert data into the ontology, the format changes will be executed and the validation rules will be checked before proceeding to the insertion.

Audit

All errors that occur when executing a validation rule are inserted into the user's audit ontology (Audit_UserName).

Among all the fields that exist when saving audit errors, we will highlight the following:

  • errorMessage: indicates that the error has occurred during the insertion, and then shows the message that has been entered in the dataclass where that validation is defined.

  • methodName: always dataClassError, which makes it possible to identify all data preprocessing errors raised by a dataclass.

  • type: indicates whether it is an error or a warning.

  • formatedTimeStamp: indicates the date and time when the error occurred.

  • user: the user who inserted the data.

  • ontology: the target ontology where the insertion was being made.
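
To show how these fields could feed load statistics, the sketch below counts dataclass errors and warnings per target ontology. It is only an illustration: the summarizeDataClassIssues helper is hypothetical, every record value is made up, and the exact stored values of type and the timestamp format are assumptions. Only the field names and the dataClassError value of methodName come from the list above.

```javascript
// Made-up audit records; only the field names come from the list above.
const auditRecords = [
  { methodName: 'dataClassError', type: 'error', errorMessage: 'Invalid DNI: 123',
    formatedTimeStamp: '2024-01-01T10:00:00Z', user: 'analyst', ontology: 'Customers' },
  { methodName: 'dataClassError', type: 'warning', errorMessage: 'Adult without DNI',
    formatedTimeStamp: '2024-01-01T10:00:05Z', user: 'analyst', ontology: 'Customers' },
  { methodName: 'insert', type: 'info', errorMessage: '',
    formatedTimeStamp: '2024-01-01T10:00:09Z', user: 'analyst', ontology: 'Orders' },
];

// Count dataclass errors and warnings per ontology, ignoring other records.
function summarizeDataClassIssues(records) {
  const summary = {};
  for (const record of records) {
    if (record.methodName !== 'dataClassError') continue;
    const counts = summary[record.ontology] || { error: 0, warning: 0 };
    if (record.type in counts) counts[record.type] += 1;
    summary[record.ontology] = counts;
  }
  return summary;
}

console.log(JSON.stringify(summarizeDataClassIssues(auditRecords)));
// prints {"Customers":{"error":1,"warning":1}}
```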
