Uploaded image for project: 'Beam'
  1. Beam
  2. BEAM-68

Support for limiting parallelism of a step

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: beam-model
    • Labels:
      None

      Description

      Users may want to limit the parallelism of a step. Two classic uses cases are:

      • User wants to produce at most k files, so sets TextIO.Write.withNumShards(k).
      • External API only supports k QPS, so user sets a limit of k/(expected QPS/step) on the ParDo that makes the API call.

      Unfortunately, there is no way to do this effectively within the Beam model. A GroupByKey with exactly k keys will guarantee that only k elements are produced, but runners are free to break fusion in ways that each element may be processed in parallel later.

      To implement this functionaltiy, I believe we need to add this support to the Beam Model.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                Unassigned
                Reporter:
                dhalperi@google.com Daniel Halperin
              • Votes:
                2 Vote for this issue
                Watchers:
                10 Start watching this issue

                Dates

                • Created:
                  Updated: