Info: Available from version 3.0.0
In release 3.0.0, a new Dataflow image has been built that updates Streamsets from version 3.10 to 3.18.1. This update brings several improvements to the Streamsets library nodes, as well as bug fixes.
You can check the improvements from version 3.10 to 3.18 at the following link:
This new version includes the orchestrator library, which allows you to schedule the execution of flows in the Dataflow instances you have deployed on the platform. The library consists of the following nodes:
Cron Scheduler: This origin-type node periodically generates a record according to the configured schedule. The schedule is defined by a cron expression, which can be entered manually or generated automatically in the node configuration UI.
Start Pipelines (origin and processor): These nodes allow you to start one or more flows in parallel.
Wait for Pipelines (processor): This processor-type node waits for the flows it receives as input to finish.
...
When upgrading Dataflow from a previous version to the new one, you may need to perform some post-upgrade tasks. Pay special attention to the following:
...
Upgrading of flows using legacy libraries
...
Revision of flows that process data from a MySQL database using JDBC nodes
...
Introduction
For several years, Onesait Platform has relied on Streamsets Data Collector (SDC) as the engine behind the Onesait Platform Dataflow module. SDC was open-source software with a low-code approach to developing and monitoring data flows. During this time we have successfully used this technology in many projects and products. In mid-2021, Streamsets, the company behind SDC, changed its licensing policy, and as of version 4.0 SDC is no longer open source.
Due to this license change, Onesait Platform has forked the SDC open-source repository, starting from version 3.23.0. From this release onward, the Onesait Platform team will take care of both bug fixes and new functionality. This new product, derived from SDC, is called Onesait Platform Dataflow and will continue to be licensed under the Apache License 2.0.
In any case, Onesait Platform will continue to support Streamsets Data Collector for projects that prefer to purchase a Streamsets license. In addition, because pipelines defined in Onesait Platform Dataflow are similar to those defined in SDC 3.23.0, they can be imported into licensed SDC versions using SDC's pipeline upgrade capabilities.
The main motivations for this decision have been:
Avoiding the high cost, for many products, of having to migrate to a new technology.
Continuing with our open-source model without having to add a new license to our implementations, which would make us less competitive.
Gaining greater control to improve integration with the rest of our modules.
What Onesait Platform Dataflow includes
As part of the maintenance of Onesait Platform Dataflow, several functionalities have been developed:
The scripts and descriptors needed to build Onesait Platform Dataflow (OPD) images have been added to the repository.
Dataflow images have been published in the Onesait Platform Docker registry.
A new image has been created that deploys a repository with the component libraries for each OPD version. This makes it possible to keep all the libraries locally in each installation or on a centralized server, without having to download them from the Streamsets servers.
A new documentation site has been created so that there is no dependency on Streamsets' online documentation.
...
Internal calls to Streamsets services, such as user registration and usage activity reporting, have been removed.
The user interface has been adapted.
...
New changes in the future
We are currently working on improving the management of Onesait Platform Dataflow instances in Kubernetes clusters, which is the reference deployment for Onesait Platform Dataflow. This enhancement includes distributed persistence of flows.
Other enhancements are being analyzed for the future:
Improved integration with Onesait Platform users.
Further adaptation of the user interface to Onesait standards.