Beam / BEAM-1997

Scaling Problem of Beam (size of the serialized JSON representation of the pipeline exceeds the allowable limit)

Details

    • Type: Bug
    • Status: Resolved
    • Priority: P2
    • Resolution: Invalid
    • Affects Version/s: 0.6.0
    • Fix Version/s: 0.6.0
    • Component/s: runner-dataflow
    • Labels: None

    Description

      After switching from Dataflow SDK 1.9 to Apache Beam SDK 0.6, my pipeline no longer runs with 180 output days (BigQuery partitions as sinks), but only with 60 output days. If a larger number is used with Beam, the response from the Cloud Dataflow service reads as follows:

      Failed to create a workflow job: The size of the serialized JSON representation of the pipeline exceeds the allowable limit. For more information, please check the FAQ link below:
      

      This is the pipeline using the Dataflow SDK: https://gist.github.com/james-woods/f84b6784ee6d1b87b617f80f8c7dd59f
      The resulting graph in Dataflow looks like this:
      https://puu.sh/vhWAW/a12f3246a1.png

      This is the same pipeline using Beam: https://gist.github.com/james-woods/c4565db769bffff0494e0bef5e9c334c
      The constructed graph looks somewhat different:
      https://puu.sh/vhWvm/78a40d422d.png

      The methods used are taken from this example: https://gist.github.com/dhalperi/4bbd13021dd5f9998250cff99b155db6
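      The pattern in the linked example creates one BigQuery sink per output day, each addressing a day partition via BigQuery's "$yyyyMMdd" table decorator, so 180 days means 180 distinct sinks serialized into the pipeline graph. A minimal sketch of generating those per-day table references (plain Java, no Beam dependency; the class and method names are hypothetical):

      ```java
      import java.time.LocalDate;
      import java.time.format.DateTimeFormatter;
      import java.util.ArrayList;
      import java.util.List;

      public class PartitionNames {
          // BigQuery addresses a single day partition by appending "$" + yyyyMMdd
          // to the table name, e.g. "dataset.events$20170101".
          static List<String> partitionTables(String table, LocalDate start, int days) {
              DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyyMMdd");
              List<String> names = new ArrayList<>();
              for (int i = 0; i < days; i++) {
                  names.add(table + "$" + start.plusDays(i).format(fmt));
              }
              return names;
          }

          public static void main(String[] args) {
              // 180 daily partitions -> 180 distinct sink table references,
              // each of which becomes a separate write step in the graph.
              List<String> sinks = partitionTables("dataset.events", LocalDate.of(2017, 1, 1), 180);
              System.out.println(sinks.get(0));   // dataset.events$20170101
              System.out.println(sinks.size());   // 180
          }
      }
      ```

      Each such reference ends up as its own write transform, which is why the serialized JSON grows roughly linearly with the number of output days.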

    Attachments

    Activity

    People

      Assignee: dhalperi Dan Halperin
      Reporter: james-woods Tobias Kaymak
      Votes: 0
      Watchers: 2

    Dates

      Created:
      Updated:
      Resolved: