  1. Beam
  2. BEAM-11154

Missing coder in pipeline components with dataflow runner v2

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: P2
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.26.0
    • Component/s: runner-dataflow
    • Labels:
      None

      Description

      When running pipelines that use the Top combine function on Dataflow Runner v2, the backend complains about a missing coder id, for example a missing BoundedHeapCoder1.

      After some troubleshooting this problem seems more generic:

      The step context translation phase does not recognize an already registered Coder whose hashCode() function is incorrect, and instead tries to assign it a new uniquified name as its pipeline_proto_coder_id.

      Code pointer:
      https://github.com/apache/beam/blob/5675108933de6eb601ca2e4f21870d2ababe0ec7/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java#L268
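
The lookup behavior described above can be illustrated with a minimal sketch. This is not the code in SdkComponents.java; `CoderIdRegistrySketch`, `HeapCoderSketch`, and `LengthOrder` are hypothetical names, and the registry is assumed to be a plain hash map keyed by the coder object. It only shows how a key with identity-based hashCode() and equals() defeats the "already registered" check, so that an equivalent coder receives a second id.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical comparator that, like many user comparators, does not
// override equals() or hashCode(), so both fall back to object identity.
final class LengthOrder implements Comparator<String> {
  @Override
  public int compare(String a, String b) {
    return Integer.compare(a.length(), b.length());
  }
}

// Hypothetical stand-in for a coder whose equality depends on a comparator field.
final class HeapCoderSketch {
  private final int maxSize;
  private final Comparator<String> comparator;

  HeapCoderSketch(int maxSize, Comparator<String> comparator) {
    this.maxSize = maxSize;
    this.comparator = comparator;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof HeapCoderSketch)) {
      return false;
    }
    HeapCoderSketch that = (HeapCoderSketch) o;
    return maxSize == that.maxSize && comparator.equals(that.comparator);
  }

  @Override
  public int hashCode() {
    // Inherits the comparator's identity-based hash code.
    return Objects.hash(maxSize, comparator);
  }
}

// Assumed, simplified model of an id registry: return the known id for a
// coder that is already registered, otherwise hand out a new unique id.
final class CoderIdRegistrySketch {
  private final Map<Object, String> ids = new HashMap<>();
  private int counter = 0;

  String registerCoder(Object coder) {
    String existing = ids.get(coder);
    if (existing != null) {
      return existing;
    }
    counter++;
    String id = coder.getClass().getSimpleName() + counter;
    ids.put(coder, id);
    return id;
  }
}
```

In this sketch, registering `new HeapCoderSketch(10, new LengthOrder())` twice yields two different ids, because the two comparator instances are never equal to each other. Registering the same instance twice returns the same id.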

      In this case, the comparator field in BoundedHeapCoder often does not implement hashCode() and equals(), so BoundedHeapCoder also has a different hashCode() each time a new instance is created. The duplicated coder does not exist in the already translated pipeline proto, which leads to the missing coder id issue described above.
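
One way to avoid identity-based hashing on the user side, shown here only as a sketch and not as the fix that was applied for this issue, is to give the comparator value-based equals() and hashCode(). `ByLengthSketch` and `ValueEqualityDemo` are hypothetical names; the comparator is stateless, so every instance is treated as equal to every other instance of the same class.

```java
import java.io.Serializable;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stateless comparator with value-based equality:
// any two instances compare equal and share one hash code.
final class ByLengthSketch implements Comparator<String>, Serializable {
  private static final long serialVersionUID = 1L;

  @Override
  public int compare(String a, String b) {
    return Integer.compare(a.length(), b.length());
  }

  @Override
  public boolean equals(Object other) {
    return other instanceof ByLengthSketch;
  }

  @Override
  public int hashCode() {
    // Stable across instances because it depends only on the class name.
    return ByLengthSketch.class.getName().hashCode();
  }
}

final class ValueEqualityDemo {
  // Stores an id under one comparator instance and looks it up with a
  // freshly created instance; returns true when the lookup finds the entry.
  static boolean lookupSucceeds() {
    Map<Object, String> registered = new HashMap<>();
    registered.put(new ByLengthSketch(), "coder-1");
    return "coder-1".equals(registered.get(new ByLengthSketch()));
  }
}
```

With equality defined this way, an object that delegates its own equals() and hashCode() to the comparator field is found again in a hash-based lookup even when it is rebuilt from a new comparator instance.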

              People

              • Assignee: Yichi Zhang (yichi)
              • Reporter: Yichi Zhang (yichi)
              • Votes: 0
              • Watchers: 2

                  Time Tracking

                  • Original Estimate: Not Specified
                  • Remaining Estimate: 0h
                  • Time Spent: 1h 10m