  1. Beam
  2. BEAM-13682

Pipeline fails to start when .withDynamicRead() is activated

Details

    • Type: Bug
    • Status: Resolved
    • Priority: P2
    • Resolution: Won't Fix
    • Affects Version/s: 2.34.0, 2.35.0
    • Fix Version/s: Not applicable
    • Component/s: io-java-gcp

    Description

      We have a streaming Java pipeline that consumes messages from Kafka and ingests them into BigQuery. We run this pipeline on Dataflow.

      Recently, we enabled the `.withDynamicRead()` option on our KafkaIO source so that it auto-discovers new partitions.

      As a result, the pipeline fails to start on Dataflow with the following (not very informative) error:

      "Workflow failed. Causes: Internal Issue (fc4aaa0289f1f666): 62242584:26". 
      

      This error happens before the actual job graph can be generated.

      Google Support provided us with an internal Dataflow log message that helps narrow down the problem:

      "invalid_argument: Duplicate side input name.  Side input names must be unique."
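
      The message suggests that two side inputs in the submitted job ended up with the same name. As a purely hypothetical illustration of that kind of uniqueness check (this is not Dataflow's code; the class, the method and the sample names are invented here), a minimal sketch could look like this:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical illustration only; this is NOT how Dataflow implements the check.
final class SideInputNameCheck {
    private SideInputNameCheck() {}

    /** Returns the first name that occurs more than once, or empty if all names are unique. */
    static Optional<String> firstDuplicate(List<String> sideInputNames) {
        Set<String> seen = new HashSet<>();
        for (String name : sideInputNames) {
            // Set.add returns false when the element is already present.
            if (!seen.add(name)) {
                return Optional.of(name);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up side input names, chosen only to trigger the duplicate case.
        List<String> names = List.of("view0", "view1", "view0");
        firstDuplicate(names).ifPresent(
            n -> System.out.println("Duplicate side input name: " + n));
    }
}
```

      Under this reading, the job would be rejected as soon as a second side input with an already-used name is encountered, which would be consistent with the failure occurring before the job graph is generated.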
      

      Some important pieces of information:

      • The error only happens when the destination sink is BigQueryIO. When we replace BigQueryIO with TextIO, the pipeline starts without problems.
      • The error doesn't happen when the pipeline is run with DirectRunner, only with DataflowRunner.
      • We have reproduced the error with Beam SDK versions 2.34 and 2.35.

      I've written a small pipeline that reproduces the error. It is attached to this ticket as Main.java.

      Attachments

        1. Main.java
          3 kB
          Adrià Arcarons

          People

            Assignee: Unassigned
            Reporter: Adrià Arcarons (adria.arcarons)
            Votes: 0
            Watchers: 3