Details
- Type: New Feature
- Status: Open
- Priority: P3
- Resolution: Unresolved
Description
Users may want to limit the parallelism of a step. Two classic use cases are:
- User wants to produce at most k files, so sets TextIO.Write.withNumShards(k).
- An external API only supports k QPS, so the user sets a limit of k/(expected QPS per step) on the ParDo that makes the API call.
Unfortunately, there is no way to do this effectively within the Beam model. A GroupByKey with exactly k keys guarantees that at most k elements are produced, but runners are free to break fusion afterwards, so the elements may still be processed in parallel later.
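To make the k-keys idea concrete, here is a minimal plain-Java sketch of the grouping step only. It does not use the Beam API: `ShardSketch`, `shardKey`, and `groupIntoShards` are hypothetical names introduced for illustration, and hashing modulo k is just one assumed way to pick the keys. It shows that assigning each element one of k keys and grouping by that key yields at most k groups. It says nothing about how a runner schedules the work after the grouping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShardSketch {
    // Hypothetical helper: map an element to one of k shard keys in [0, k).
    static int shardKey(Object element, int k) {
        return Math.floorMod(element.hashCode(), k);
    }

    // Plain-Java stand-in for "assign a key, then group by key".
    // The result never holds more than k groups, whatever the input size.
    static <T> Map<Integer, List<T>> groupIntoShards(List<T> elements, int k) {
        Map<Integer, List<T>> groups = new TreeMap<>();
        for (T element : elements) {
            groups.computeIfAbsent(shardKey(element, k), key -> new ArrayList<>())
                  .add(element);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "c", "d", "e", "f", "g");
        Map<Integer, List<String>> groups = groupIntoShards(input, 3);
        // Print the number of groups and their contents.
        System.out.println(groups.size());
        System.out.println(groups);
    }
}
```

The bound on the number of groups holds by construction, but it only constrains the grouping itself. Nothing in this sketch, or in the corresponding Beam pattern, prevents later steps from being run with higher parallelism.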
To implement this functionality, I believe we need to add this support to the Beam model.
Issue Links
- is duplicated by BEAM-159 Support fixed number of shards in sinks (Resolved)