How to use the Onesait CRUD stage

Description

Dataflow's Onesait CRUD stage performs CRUD-like operations on ontologies.

This stage allows, from a source that issues change records (CDC, Change Data Capture), to replicate them in one ontology or in several ontologies. It is based on the MySQL CDC source (MySQL Binary Log), meaning that, for other sources, it may be necessary to adapt the log format to them. Bear in mind that a fundamental part of the CRUD process is the order of operations, with which you will not be able to use execution threads (lest you lose synchrony), and the only operation that can be grouped is the insert, so you must bear this in mind when working with the component, and understand that its working speed is not going to be as high as other components. If more processing speed is needed, the change transfer process (CDC (source) - CRUD (destination)) can be functionally divided into several dataflows, where each one acts on a set of functionally independent tables, and thus can work in parallel about those sets.

The work format will have the following particularities:

  • The type of operation must go in the attribute field of the sdc.operation.type header, which determines what will be done with the current record.

  • The operations allowed are:

    • 1 → Insert: the content of the “/Data” field is taken as the insertion record. The operation of the previous point is required, and the base format would be the following. If there were more fields in the record, those would be ignored.

    • 2 → Delete: A delete query is built and executed based on the content of the “/OldData” field. There are two ways of working:

      • If the destination entity appears in the Entity Key Fields configuration, then that indicated key will be used, using the registry values (content of the “/OldData” field) as the unique key for deletion (The where part of the delete operation is built based on it). This key must be unique and it is recommended that it be an index on the target entity to speed up the query.

      • If the entity does not appear in the Entity Key Fields configuration, all the fields of the record (content of the “/OldData” field) will be used as unique key for deletion (the where part of the delete operation is built based on it). As in the previous point, this key must be unique and it is recommended that it be an index in the destination entity to speed up the query.

    • 3 → Update: builds and executes an update query based on the content of the “/OldData" field (where section) and the “/Data” field (set section). In the set section, all the fields of “/Data" will be used. As in the delete, there are two ways of working:

      • If the destination entity appears in the Entity Key Fields configuration, then that indicated key will be used, using the registry values (content of the “/OldData” field) as the unique key for the update’s where field. This key must be unique and it is recommended that it be an index on the target entity to speed up the query.

      • If the entity does not appear in the Entity Key Fields configuration, all the fields of the record (content of the “/OldData” field) will be used as unique key for the where section. As in the previous point, this key must be unique and it is recommended that it be an index in the destination entity to speed up the query.

  • The destination entity and the root node have several configurations:

    • If the value of ontology is different from $$recordheader, the rules of any other destination node apply for both the destination ontology/entity and the root node.

    • If the value of ontology is $$recordheader, the destination ontology will be given by the value of the op.ontology header attribute, with which each record can act on a different entity and thus be able to receive records from a cdc that generates changes from multiple origins. Bear in mind that it is mandatory for the digital client to have access to all ontologies/entities in writing mode. Besides, the value of the root node must be taken into account, which will be:

      • If the value is “No root node” or “Custom root node“, all the fields of the operations in destination will either have no root node, or its value will be given in “Custom root node name“. This applies to insert, update or delete.

      • If the value is “Add ontology as root node“, all the fields of the operations will be preceded by the name of the ontology/entity of the op.ontology field.

How to use it

Importing to the pipeline

To add it to the pipeline, look for Onesait in the stages palette like this:

Drag and drop in the panel.

Once imported, you can configure it to achieve different behaviors on a case-by-case basis.

available configuration tabs

General

This tab displays general information and an option to display the event spawner point on the stage.

By checking the option "Produce Events", the scenario shows another point to make use of the events.

The triggered events are (entity/ontology and operation are added):

  • document-modified

    • Which sends the records in a list (inserted or changed data).

  • document-not-modified

    • Which sends the uninserted/modified records in a list (uninserted/modified data).

For example, modified data:

 

And unmodified data:

 

Connection

This tab shows a form to complete with the information of the Digital Broker of the Onesait platform.

The root node option can add registry data to a root property before inserting it. The special value of the ontology field $$recordheader is important in this case, since it will indicate that this value comes from the record itself in the header attribute “op.ontology“ that will allow to have a multiple destination.

Configuration

This tab shows the main configuration.

 

  • Root node type: it is defined if the entity/ontology has a custom root node, the one proper to the name of the entity, or does not have it

  • Bulk: number of records per batch (only for insert operations).

  • Generate error when no records are modified: defines whether an error record is generated when the update or delete operation has not generated any change, or not.

  • Entity Key fields: the entities/ontologies are defined on the left, and the key fields for the update and delete operations (which come in the record in “/OldData”) are defined on the right. It is important that they are key and index in the destination entity, to avoid errors and to speed up operations.

Debug mode in operations

In order to debug the operations in detail, you can activate the debug mode (only for tests or development) by adding in the Log Config part (Log4J configuration inside the log viewer):

log4j.logger.com.minsait.onesait.platform.streamsets.destination.OnesaitplatformCRUD=DEBUG