Details
-
Bug
-
Status: Resolved
-
P2
-
Resolution: Won't Fix
-
2.34.0, 2.35.0
Description
We have a Java, streaming, pipeline that consumes messages from Kafka and ingests them into BigQuery. We run this pipeline in Dataflow.
Recently, we enabled the `.withDynamicRead()` option in our KafkaIO source to be able to auto-discover new partitions.
As a result, the pipeline won't load in Dataflow with the following (not very informative) error:
"Workflow failed. Causes: Internal Issue (fc4aaa0289f1f666): 62242584:26".
This error happens before the actual job graph can be generated.
Google Support provided us with an internal Dataflow log message that can help to narrow the problem:
"invalid_argument: Duplicate side input name. Side input names must be unique."
Some important pieces of information:
- The error only happens if the destination sink is BigQueryIO. We have tested it by replacing BigQueryIO with TextIO and it works OK.
- The error doesn't happen if pipeline is run by DirectRunner; only DataflowRunner.
- We have reproduced this error in Beam SDK versions 2.34 and 2.35.
I've written a small pipeline that reproduces the error. Please find it attached in the ticket Main.java.