Details
-
Bug
-
Status: Resolved
-
P2
-
Resolution: Fixed
-
None
Description
beam.Create() should be splittable. This would allow the unintuitive "Reshuffle" step below to be safely omitted:
pipeline = (
beam.Create(range(large_number))
| beam.Reshuffle() # prevent task fusion
| beam.Map(very_expensive_function)
...
)
These sort of pipelines with small inputs to expensive CPU bound tasks arise frequently in scientific computing use-cases.