Uploaded image for project: 'Beam'
  1. Beam
  2. BEAM-4582

Incorrectly translates apache_beam.runners.dataflow.native_io.streaming_create.DecodeAndEmitDoFn when creating the Dataflow pipeline json description

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: runner-dataflow
    • Labels:

      Description

      When executing against Dataflow, the JSON pipeline description contains the following JSON which doesn't appear in the pipeline proto:

       

          {
            "kind": "ParallelDo", 
            "name": "s2", 
            "properties": {
              "display_data": [
                {
                  "key": "fn", 
                  "label": "Transform Function", 
                  "namespace": "apache_beam.transforms.core.ParDo", 
                  "shortValue": "DecodeAndEmitDoFn", 
                  "type": "STRING", 
                  "value": "apache_beam.runners.dataflow.native_io.streaming_create.DecodeAndEmitDoFn"
                }
              ], 
              "non_parallel_inputs": {}, 
              "output_info": [
                {
                  "encoding": {
                    "@type": "kind:windowed_value", 
                    "component_encodings": [
                      {
                        "@type": "FastPrimitivesCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxuicUlAUWZuZklmWWpxc4gQa5CBs3GQsbaQqZQ/vi0xJycpMTk7Hiw+kJmPEYFZCZn56RCjWABGsFaW8iWVJykBwDlGS3/", 
                        "component_encodings": [
                          {
                            "@type": "FastPrimitivesCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxuicUlAUWZuZklmWWpxc4gQa5CBs3GQsbaQqZQ/vi0xJycpMTk7Hiw+kJmPEYFZCZn56RCjWABGsFaW8iWVJykBwDlGS3/", 
                            "component_encodings": []
                          }, 
                          {
                            "@type": "FastPrimitivesCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxuicUlAUWZuZklmWWpxc4gQa5CBs3GQsbaQqZQ/vi0xJycpMTk7Hiw+kJmPEYFZCZn56RCjWABGsFaW8iWVJykBwDlGS3/", 
                            "component_encodings": []
                          }
                        ], 
                        "is_pair_like": true
                      }, 
                      {
                        "@type": "kind:global_window"
                      }
                    ], 
                    "is_wrapper": true
                  }, 
                  "output_name": "out", 
                  "user_name": "Some Numbers/Decode Values.out"
                }
              ], 
              "parallel_input": {
                "@type": "OutputReference", 
                "output_name": "out", 
                "step_name": "s1"
              }, 
              "serialized_fn": "ref_AppliedPTransform_AppliedPTransform_45", 
              "user_name": "Some Numbers/Decode Values"
            }
          }, 
      

      This causes the DataflowRunner to use a legacy code path and ask the Python SDK harness to execute a transform with a payload ref_AppliedPTransform_AppliedPTransform_45 instead of sending the PTransform proto.

       

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                Unassigned
                Reporter:
                lcwik Luke Cwik
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:

                  Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 1h 20m
                  1h 20m