Details
Type: Improvement
Status: Resolved
Priority: P2
Resolution: Resolved
Description
The SDK's PipelineRunner API is a poor fit for the Beam technical vision.
It has technical limitations:
- The user's DAG (even including library expansions) is never explicitly represented, so it cannot be analyzed except incrementally, and cannot necessarily be reconstructed (for example, to display it!).
- The flattened DAG of just primitive transforms isn't well-suited for display or transform override.
- The TransformHierarchy isn't well-suited for optimizations.
- In practice, the user must commit to a runner and its configuration (batch vs. streaming) before graph construction, because the runner modifies the graph as it is built.
- It is fairly language- and SDK-specific.
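To make the first three limitations concrete, here is a minimal sketch of what an explicitly represented graph could offer. It is not Beam code: `ExplicitGraphSketch`, `Node`, `primitives`, and `render` are hypothetical names invented for illustration, the transform names are placeholders, and the sketch models only the composite hierarchy, leaving out the PCollection edges between transforms. The point is that one explicit structure can serve both a hierarchical view (for display) and a flattened view of primitives (for optimization), without a visitor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these types exist in the Beam SDK.
public class ExplicitGraphSketch {

    /** A transform node: a composite has children, a primitive has none. */
    public static final class Node {
        public final String name;
        public final List<Node> children = new ArrayList<>();

        public Node(String name) {
            this.name = name;
        }

        public Node add(Node child) {
            children.add(child);
            return this;
        }

        public boolean isPrimitive() {
            return children.isEmpty();
        }
    }

    /** Flattened view: names of primitive transforms in depth-first order. */
    public static List<String> primitives(Node root) {
        List<String> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(Node node, List<String> out) {
        if (node.isPrimitive()) {
            out.add(node.name);
            return;
        }
        for (Node child : node.children) {
            collect(child, out);
        }
    }

    /** Hierarchical view: an indented outline suitable for display. */
    public static String render(Node root) {
        StringBuilder sb = new StringBuilder();
        render(root, 0, sb);
        return sb.toString();
    }

    private static void render(Node node, int depth, StringBuilder sb) {
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        sb.append(node.name).append('\n');
        for (Node child : node.children) {
            render(child, depth + 1, sb);
        }
    }

    /** Placeholder pipeline: one composite (CountWords) among primitives. */
    public static Node example() {
        Node countWords = new Node("CountWords")
            .add(new Node("ExtractWords"))
            .add(new Node("GroupByKey"))
            .add(new Node("SumCounts"));
        return new Node("Pipeline")
            .add(new Node("Read"))
            .add(countWords)
            .add(new Node("Write"));
    }

    public static void main(String[] args) {
        Node root = example();
        System.out.print(render(root));
        System.out.println(primitives(root));
    }
}
```

Because the structure is plain data, both views are ordinary functions over it, and the same graph could in principle be serialized and inspected outside any one SDK.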
It has usability issues (derived not from intuition, but from actual cases where users failed to use the API as designed):
- The interleaving of apply() methods in PTransform/Pipeline/PipelineRunner is confusing.
- The TransformHierarchy, accessible only via visitor traversals, is cumbersome.
- The separation between construction time and run time is not always obvious.
These are just examples. This ticket tracks designing, reaching consensus on, and building an API that more simply and directly supports the technical vision.
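As one illustration of the direction, and not a proposal for the actual API, the sketch below separates construction time from run time: the pipeline is built into an immutable, runner-independent description, and a runner is chosen only afterwards. Every name here (`DeferredRunnerSketch`, `PipelineSpec`, `Builder`, `Runner`) is hypothetical, and the two "runners" are stand-ins that only format a string.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; none of these types exist in the Beam SDK.
public class DeferredRunnerSketch {

    /** Immutable description produced at construction time. */
    public static final class PipelineSpec {
        public final List<String> steps;

        PipelineSpec(List<String> steps) {
            // Defensive copy so later builder changes cannot leak in.
            this.steps = Collections.unmodifiableList(new ArrayList<>(steps));
        }
    }

    /** Construction-time API: records steps, knows nothing about runners. */
    public static final class Builder {
        private final List<String> steps = new ArrayList<>();

        public Builder apply(String step) {
            steps.add(step);
            return this;
        }

        public PipelineSpec build() {
            return new PipelineSpec(steps);
        }
    }

    /** Run-time API: a runner receives the finished spec and cannot alter it. */
    public interface Runner {
        String run(PipelineSpec spec);
    }

    public static PipelineSpec exampleSpec() {
        return new Builder().apply("Read").apply("CountWords").apply("Write").build();
    }

    public static void main(String[] args) {
        PipelineSpec spec = exampleSpec();

        // Stand-in runners, selected only after the graph is complete.
        Runner batch = s -> "batch:" + String.join(",", s.steps);
        Runner streaming = s -> "streaming:" + String.join(",", s.steps);

        System.out.println(batch.run(spec));
        System.out.println(streaming.run(spec));
    }
}
```

In this arrangement the same `PipelineSpec` is handed to either runner unchanged, so the choice between batch and streaming does not have to be made before the graph is built.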