A Look at the DataFlow

This module lets you visually create and configure data flows between sources and destinations, covering both ETL/ELT-style processes and streaming flows, and it allows transformation and data quality steps to be included in those flows.

 

Let's see a couple of examples:

  • Ingesting the tail of a file into Hadoop, with a field-removal processor:
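
To make this example concrete, here is a minimal Python sketch of the same logic outside DataFlow: tail a file, drop unwanted fields from each record, and hand the result to a Hadoop writer. The file path, field names, and the write_to_hdfs stub are illustrative assumptions, not platform APIs.

    import csv
    import time

    FIELDS_TO_DROP = {"password", "internal_id"}   # assumed fields to eliminate

    def tail(path):
        """Yield lines appended to a file, like `tail -f`."""
        with open(path) as f:
            f.seek(0, 2)                           # start at the end of the file
            while True:
                line = f.readline()
                if not line:
                    time.sleep(0.5)
                    continue
                yield line

    def remove_fields(record):
        """Field-removal step: drop the configured keys from a record."""
        return {k: v for k, v in record.items() if k not in FIELDS_TO_DROP}

    def write_to_hdfs(record):
        """Placeholder for the Hadoop destination (e.g. an HDFS client write)."""
        print("-> HDFS:", record)

    # Assumed column layout of the tailed file.
    COLUMNS = ["ts", "user", "password", "internal_id", "event"]

    for line in tail("/var/log/app/events.csv"):   # assumed source file
        for record in csv.DictReader([line], fieldnames=COLUMNS):
            write_to_hdfs(remove_fields(record))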

  • Ingesting from a REST endpoint and loading into the platform's Semantic DataHub, with a data quality process:
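
In the same spirit, a minimal sketch of this second flow: pull records from a REST endpoint, apply a simple data quality rule, and POST the valid ones onward. Both URLs, the required fields, and the payload format are hypothetical placeholders; the real Semantic DataHub ingest API is defined by the platform.

    import requests

    SOURCE_URL = "https://example.com/api/measurements"          # assumed REST source
    DATAHUB_URL = "https://platform.example.com/datahub/ingest"  # hypothetical ingest URL
    REQUIRED_FIELDS = ("deviceId", "timestamp", "value")         # assumed schema

    def is_valid(record):
        """Data quality rule: required fields present and value numeric."""
        if any(field not in record for field in REQUIRED_FIELDS):
            return False
        return isinstance(record["value"], (int, float))

    records = requests.get(SOURCE_URL, timeout=10).json()
    valid = [r for r in records if is_valid(r)]

    if valid:
        requests.post(DATAHUB_URL, json=valid, timeout=10).raise_for_status()
    print(f"ingested {len(valid)} of {len(records)} records")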

 

  • It also offers a large number of input and output connectors for specific systems, as well as processors (the complete list of connectors is available in the Platform Developer Portal: http://bit.ly/2rwWZ1N).

Among the main DataFlow connectors are Big Data connectors for Hadoop, Spark, FTP, files, REST endpoints, JDBC, NoSQL databases, Kafka, Azure cloud services, AWS, Google, ...

  • The component integrated into the platform is the open-source StreamSets DataFlow software (https://streamsets.com), on top of which several connectors have been built to communicate with the platform:

  • All creation, development, deployment, and monitoring of flows is done from the platform's web console (ControlPanel):

  • List of DataFlows as seen by a user with the Administrator role (who can view other users' flows):

  • DataFlow in development phase:

  • DataFlow running:

  • Debugging a DataFlow:

  • Fully integrated with the main Big Data technologies, including HDFS, Hive, Spark, Kafka, SparkSQL, ... so that they can be handled easily and centrally:
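
As one illustration of this integration, a downstream client can consume the records a DataFlow publishes to a Kafka topic. This sketch uses the kafka-python client; the topic name and broker address are assumptions:

    import json
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "dataflow-output",                   # assumed topic fed by a DataFlow
        bootstrap_servers="localhost:9092",  # assumed broker address
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
        auto_offset_reset="earliest",
    )

    for message in consumer:
        print(message.topic, message.offset, message.value)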

Besides the Big Data connectors, there are also connectors in areas such as IoT (OPC, CoAP, MQTT), social networks, and more.
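
For instance, the kind of stream an MQTT input connector consumes can be reproduced with the paho-mqtt client; the broker host and topic filter below are assumptions:

    import paho.mqtt.client as mqtt

    def on_message(client, userdata, msg):
        # Each message would become one record in the flow; here we just print it.
        print(msg.topic, msg.payload.decode("utf-8"))

    client = mqtt.Client()            # paho-mqtt 1.x style constructor
    client.on_message = on_message
    client.connect("broker.example.com", 1883)   # assumed broker
    client.subscribe("sensors/#")                # assumed topic filter
    client.loop_forever()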