Beam / BEAM-4582

DataflowRunner incorrectly translates apache_beam.runners.dataflow.native_io.streaming_create.DecodeAndEmitDoFn when creating the Dataflow pipeline JSON description

Details

    Description

      When executing against Dataflow, the JSON pipeline description contains the following step, which does not appear in the pipeline proto:

       

          {
            "kind": "ParallelDo", 
            "name": "s2", 
            "properties": {
              "display_data": [
                {
                  "key": "fn", 
                  "label": "Transform Function", 
                  "namespace": "apache_beam.transforms.core.ParDo", 
                  "shortValue": "DecodeAndEmitDoFn", 
                  "type": "STRING", 
                  "value": "apache_beam.runners.dataflow.native_io.streaming_create.DecodeAndEmitDoFn"
                }
              ], 
              "non_parallel_inputs": {}, 
              "output_info": [
                {
                  "encoding": {
                    "@type": "kind:windowed_value", 
                    "component_encodings": [
                      {
                        "@type": "FastPrimitivesCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxuicUlAUWZuZklmWWpxc4gQa5CBs3GQsbaQqZQ/vi0xJycpMTk7Hiw+kJmPEYFZCZn56RCjWABGsFaW8iWVJykBwDlGS3/", 
                        "component_encodings": [
                          {
                            "@type": "FastPrimitivesCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxuicUlAUWZuZklmWWpxc4gQa5CBs3GQsbaQqZQ/vi0xJycpMTk7Hiw+kJmPEYFZCZn56RCjWABGsFaW8iWVJykBwDlGS3/", 
                            "component_encodings": []
                          }, 
                          {
                            "@type": "FastPrimitivesCoder$eNprYEpOLEhMzkiNT0pNzNVLzk9JLSqGUlxuicUlAUWZuZklmWWpxc4gQa5CBs3GQsbaQqZQ/vi0xJycpMTk7Hiw+kJmPEYFZCZn56RCjWABGsFaW8iWVJykBwDlGS3/", 
                            "component_encodings": []
                          }
                        ], 
                        "is_pair_like": true
                      }, 
                      {
                        "@type": "kind:global_window"
                      }
                    ], 
                    "is_wrapper": true
                  }, 
                  "output_name": "out", 
                  "user_name": "Some Numbers/Decode Values.out"
                }
              ], 
              "parallel_input": {
                "@type": "OutputReference", 
                "output_name": "out", 
                "step_name": "s1"
              }, 
              "serialized_fn": "ref_AppliedPTransform_AppliedPTransform_45", 
              "user_name": "Some Numbers/Decode Values"
            }
          }, 
      

      Because the step is missing from the pipeline proto, the DataflowRunner falls back to a legacy code path: it asks the Python SDK harness to execute a transform with the payload ref_AppliedPTransform_AppliedPTransform_45 instead of sending the PTransform proto.
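
      One way to spot steps like this is to compare each ParallelDo step's serialized_fn against the transform ids in the pipeline proto. The following is a minimal sketch, not code from Beam: find_unmatched_steps is a hypothetical helper, the step is a trimmed copy of the JSON above, and proto_ids is a made-up stand-in for the ids that would be read from the real pipeline proto.

```python
import json


def find_unmatched_steps(job_steps, proto_transform_ids):
    """Return (name, user_name, serialized_fn) for every ParallelDo step
    whose serialized_fn is not among the given pipeline proto transform ids."""
    unmatched = []
    for step in job_steps:
        if step.get("kind") != "ParallelDo":
            continue
        props = step.get("properties", {})
        fn_ref = props.get("serialized_fn")
        if fn_ref not in proto_transform_ids:
            unmatched.append((step.get("name"), props.get("user_name"), fn_ref))
    return unmatched


# Trimmed copy of the step shown in this report (most properties omitted).
steps = json.loads("""
[
  {
    "kind": "ParallelDo",
    "name": "s2",
    "properties": {
      "serialized_fn": "ref_AppliedPTransform_AppliedPTransform_45",
      "user_name": "Some Numbers/Decode Values"
    }
  }
]
""")

# Hypothetical set of transform ids; in practice these would come from the
# pipeline proto that is submitted alongside the job JSON.
proto_ids = {"ref_AppliedPTransform_AppliedPTransform_1"}

for name, user_name, fn_ref in find_unmatched_steps(steps, proto_ids):
    print(f"{name} ({user_name}): {fn_ref} not found in pipeline proto")
```

      A step reported by this check is one the runner would have to execute through the reference payload rather than through a PTransform proto.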

            People

              ibzib Kyle Weaver
              lcwik Luke Cwik

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 20m