I'm using the side output feature to filter malformed events (errors) out of a stream of valid events. I then save the valid events to one BigQuery table, and the errors go to a separate, dedicated table.
Here is the code for outputting error rows:
The problem is that when I run on DirectRunner in batch mode (reading input from a file) and the invalidEventRows PCollection ends up empty (all events are valid, so there are no errors), I get the following error:
When I run the same code and the invalidEventRows PCollection is not empty, there are no errors: the BigQuery table is created and the data is inserted correctly.
Everything also seems to work fine in streaming mode (reading from Pub/Sub) on both DirectRunner and DataflowRunner.
Does this look like a bug?
If so, should I open an issue in the GoogleCloudPlatform/DataflowJavaSDK GitHub project?