Details
- Type: Bug
- Status: Resolved
- Priority: P2
- Resolution: Invalid
- Affects Version: 0.6.0
- Fix Version: None
Description
After switching from the Dataflow SDK 1.9 to the Apache Beam SDK 0.6, my pipeline no longer runs with 180 output days (BigQuery partitions as sinks), only with 60. When using a larger number of days with Beam, the response from the Cloud Dataflow service reads as follows:
Failed to create a workflow job: The size of the serialized JSON representation of the pipeline exceeds the allowable limit. For more information, please check the FAQ link below:
This is the pipeline in Dataflow: https://gist.github.com/james-woods/f84b6784ee6d1b87b617f80f8c7dd59f
The resulting graph in Dataflow looks like this:
https://puu.sh/vhWAW/a12f3246a1.png
This is the same pipeline in Beam: https://gist.github.com/james-woods/c4565db769bffff0494e0bef5e9c334c
The constructed graph looks somewhat different:
https://puu.sh/vhWvm/78a40d422d.png
The methods used are taken from this example: https://gist.github.com/dhalperi/4bbd13021dd5f9998250cff99b155db6
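To illustrate why the serialized pipeline grows with the number of output days: the per-day-partition pattern attaches one BigQuery sink per day, each addressed via a table partition decorator (`table$YYYYMMDD`), so the pipeline JSON scales linearly with the day count. A minimal sketch of that naming scheme (the table name, dates, and helper are hypothetical, not taken from the gists above):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

public class PartitionSinks {
    // Hypothetical helper: builds one BigQuery partition decorator per
    // output day (table$YYYYMMDD). In the per-day-sink pattern, each of
    // these names backs a separate write transform in the pipeline graph,
    // so 180 days means 180 sinks in the serialized JSON.
    static List<String> partitionTables(String table, LocalDate start, int days) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyyMMdd");
        List<String> tables = new ArrayList<>();
        for (int i = 0; i < days; i++) {
            tables.add(table + "$" + start.plusDays(i).format(fmt));
        }
        return tables;
    }

    public static void main(String[] args) {
        // 180 output days -> 180 distinct partition sinks.
        List<String> sinks =
            partitionTables("dataset.events", LocalDate.of(2017, 1, 1), 180);
        System.out.println(sinks.size());
        System.out.println(sinks.get(0));
    }
}
```

With 60 days the graph stays under the service's serialized-JSON limit, while 180 days triples the number of sink nodes and exceeds it.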