How to test components for Dataflow

Introduction

To extend Dataflow, you can develop custom stages and use them in your Dataflow pipelines in the same way as any of Dataflow's built-in components.

This article describes how to write automated JUnit tests for the custom stages you develop.

The examples come from a project whose full code is available on GitHub. The project contains tests for Dataflow's own JDBC components, so it covers all three stage types: source, target (destination), and processor.

Project Configuration

The example project on GitHub uses Maven for dependency management. Its dependencies are described below.

Versions used:

    <properties>
        <maven.compiler.target>1.8</maven.compiler.target>
        <maven.compiler.source>1.8</maven.compiler.source>
        <sdc.version>3.23.0</sdc.version>
        <junit.version>5.9.3</junit.version>
        <h2.version>2.2.220</h2.version>
    </properties>

The JUnit library. This example uses JUnit 5:

    <dependency>
        <groupId>org.junit.jupiter</groupId>
        <artifactId>junit-jupiter-api</artifactId>
        <version>${junit.version}</version>
        <scope>test</scope>
    </dependency>

In this case, since we are testing the Dataflow JDBC library, we need to declare the library that contains the components we are going to test. In a component development project, this declaration would not be necessary as the tests would be part of the project itself.

    <dependency>
        <groupId>com.streamsets</groupId>
        <artifactId>streamsets-datacollector-stagesupport</artifactId>
        <version>${sdc.version}</version>
        <scope>test</scope>
    </dependency>

Many tests need additional libraries to reach external systems. Whether they are needed depends on the type of test: unit tests with mocks or integration tests against external systems. This example tests the connections directly against a database, and for that purpose we use the H2 in-memory database.

Finally, it is necessary to declare the specific Dataflow dependencies that will be used to execute the stages under test and to verify the results.

Source Test

In this example, to test the JDBC source, we start an H2 database and populate it with some test tables.

Similarly, at the end of the test, we clean up the database.
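The setup and cleanup steps can be sketched with plain JDBC. This is a minimal sketch, not the example project's code: the class name `TestDatabase`, the table `TEST_TABLE`, and its columns are hypothetical. The helper takes an already open `java.sql.Connection`. With the H2 driver on the test classpath, such a connection could be obtained with `DriverManager.getConnection("jdbc:h2:mem:test;DB_CLOSE_DELAY=-1")`, where the database name `test` is also an assumption.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical helper: creates, fills, and drops a test table over JDBC. */
final class TestDatabase {
    // Table and column names are illustrative assumptions.
    static final String TABLE = "TEST_TABLE";

    private TestDatabase() {}

    /** Creates the table and inserts rowCount rows with ids 1..rowCount. */
    static void populate(Connection connection, int rowCount) throws SQLException {
        try (Statement statement = connection.createStatement()) {
            statement.execute(
                "CREATE TABLE " + TABLE + " (ID INT PRIMARY KEY, NAME VARCHAR(64))");
        }
        String insert = "INSERT INTO " + TABLE + " (ID, NAME) VALUES (?, ?)";
        try (PreparedStatement prepared = connection.prepareStatement(insert)) {
            for (int id = 1; id <= rowCount; id++) {
                prepared.setInt(1, id);
                prepared.setString(2, "name-" + id);
                prepared.executeUpdate();
            }
        }
    }

    /** Drops the table so the next test starts from a clean database. */
    static void cleanUp(Connection connection) throws SQLException {
        try (Statement statement = connection.createStatement()) {
            statement.execute("DROP TABLE IF EXISTS " + TABLE);
        }
    }
}
```

In a JUnit 5 test class, `populate` would typically be called from a method annotated with `@BeforeEach` and `cleanUp` from one annotated with `@AfterEach`, so that every test sees the same initial data.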

To conduct the test, the first thing we do is create a JDBC source component.

The initialization values depend on the type of component, so those used in this example are specific to the JDBC source.

Next, we create a SourceRunner object, which will allow us to execute the "reads" from the source.

After this, we simulate a sequence of reads (and writes, if needed) from the database using the source. We also make sure the runner's resources are released once the test has finished.

We do not go through the test in detail, since it is specific to the component tested in the example. The important point is how the SourceRunner object (runner in the example) is used to produce records, and how the values of the records obtained can be checked against the expected ones.
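The read sequence can be sketched as a loop: initialize, request batches while passing the last offset back in, and release resources in a `finally` block. To stay runnable without the Dataflow libraries, this sketch uses a local stand-in interface instead of the real SourceRunner. The names `BatchReader`, `Batch`, and `SourceTestSupport` are hypothetical and not part of any SDK.

```java
import java.util.ArrayList;
import java.util.List;

/** Local stand-in for the runner operations used below; NOT the SDK type. */
interface BatchReader<R> {
    void init() throws Exception;

    /** Reads up to maxBatchSize records after lastOffset (null means "from the start"). */
    Batch<R> produce(String lastOffset, int maxBatchSize) throws Exception;

    void destroy() throws Exception;
}

/** One batch: the records read plus the offset to resume from. */
final class Batch<R> {
    final List<R> records;
    final String newOffset;

    Batch(List<R> records, String newOffset) {
        this.records = records;
        this.newOffset = newOffset;
    }
}

final class SourceTestSupport {
    private SourceTestSupport() {}

    /** Initializes the reader, drains it batch by batch, and always destroys it. */
    static <R> List<R> readAll(BatchReader<R> reader, int maxBatchSize) throws Exception {
        List<R> all = new ArrayList<>();
        reader.init();
        try {
            String offset = null;
            while (true) {
                Batch<R> batch = reader.produce(offset, maxBatchSize);
                if (batch.records.isEmpty()) {
                    break; // an empty batch is treated as "no more data"
                }
                all.addAll(batch.records);
                offset = batch.newOffset; // resume where the last batch ended
            }
        } finally {
            reader.destroy(); // release resources even if a read or assertion fails
        }
        return all;
    }
}
```

Judging by the `com.streamsets` dependencies in the POM, the corresponding SourceRunner calls in the StreamSets SDK are `runInit()`, `runProduce(lastOffset, maxBatchSize)`, and `runDestroy()`. Check the exact signatures against the SDK version you use. The records returned by each batch are what the test then compares with the rows inserted during setup.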

The complete example is on GitHub.

Destination Test

To test writing to the destination, we use a strategy similar to the one used for reading, so we do not repeat the database setup code here. For the complete example, refer to GitHub. The main difference is that the example creates a destination object, JdbcTarget.

A TargetRunner is created to execute the write operation. To write data, you need to create the records that will form the write batch. Finally, the TargetRunner is used to write these records to the destination.

Afterwards, verify that the data reached the destination. In this example, we check that the rows were written to the database.
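One way to make this check is to read the rows back with plain JDBC and compare them with the records that were sent. The sketch below is an assumption about how that could look, not the example project's code: the class `WrittenRows` and the column names `ID` and `NAME` are hypothetical.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: reads back what the destination wrote. */
final class WrittenRows {
    private WrittenRows() {}

    /** Returns the NAME column of every row in the table, ordered by ID. */
    static List<String> readNames(Connection connection, String table) throws SQLException {
        List<String> names = new ArrayList<>();
        // The table name is concatenated, so pass only trusted, test-owned names.
        String query = "SELECT NAME FROM " + table + " ORDER BY ID";
        try (Statement statement = connection.createStatement();
             ResultSet rows = statement.executeQuery(query)) {
            while (rows.next()) {
                names.add(rows.getString(1));
            }
        }
        return names;
    }
}
```

In a JUnit 5 test, the result can be compared with the expected values using `assertEquals` from `org.junit.jupiter.api.Assertions`, for example against the list of names carried by the records in the write batch.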

Processor Test

This case is very similar to the source and destination tests. The main difference is that a processor component and a ProcessorRunner are used.

The remaining checks are similar to those for sources and destinations. The complete example is on GitHub.