New version of Onesait Platform Dataflow

Introduction

For several years, Onesait Platform has relied on Streamsets Data Collector (SDC) as the engine behind the Onesait Platform Dataflow module. SDC was open-source software with a low-code approach to developing and monitoring data flows, and we have used it successfully in many projects and products. In mid-2021, Streamsets, the company behind SDC, changed its licensing policy, and as of version 4.0 SDC is no longer open source.

Because of this license change, Onesait Platform has forked the SDC open-source repository, starting from version 3.23.0. From this release onward, the Onesait Platform team will deliver both bug fixes and new features. This new product, derived from SDC, is called Onesait Platform Dataflow and will continue to be licensed under the Apache License 2.0.

Onesait Platform will nevertheless continue to support Streamsets Data Collector for projects that prefer to purchase the Streamsets license. In addition, because pipelines defined in Onesait Platform Dataflow are similar to those defined in SDC 3.23.0, they can be imported into licensed SDC versions using SDC's pipeline upgrade capabilities.

The main motivations for this decision have been:

  • To spare the many products built on SDC the large cost of migrating to a new technology.

  • To continue with our open-source model without adding a new license to implementations, which would make us less competitive.

  • To gain greater control over integration with the rest of our modules.

What Onesait Platform Dataflow includes

As part of the maintenance of Onesait Platform Dataflow, several changes have already been made:

  • The scripts and descriptors needed to build Onesait Platform Dataflow (OPD) images have been included in the repository.

  • Dataflow images have been published in the Onesait Platform Docker registry.

  • A new image has been created that deploys a repository with the component libraries for each OPD version. This makes it possible to host all the libraries locally in each installation or on a centralized server, instead of downloading them from the Streamsets servers.

  • A new documentation site has been created so that there is no dependency on Streamsets' online documentation.

 

  • Internal calls to Streamsets services (user logs, usage activity, and so on) have been removed.

  • The user interface has been adapted.

Upcoming changes

We are currently working on improving the management of Onesait Platform Dataflow instances in Kubernetes clusters, which is our reference deployment for Onesait Platform Dataflow. This enhancement includes distributed persistence of flows.

Other enhancements are being analyzed for the future:

  • Improved integration with Onesait Platform users.

  • Further adaptation of the user interface to Onesait standards.