Uploaded image for project: 'Beam'
  1. Beam
  2. BEAM-68

Support for limiting parallelism of a step

Details

    • New Feature
    • Status: Open
    • P3
    • Resolution: Unresolved
    • None
    • None
    • beam-model
    • None

    Description

      Users may want to limit the parallelism of a step. Two classic uses cases are:

      • User wants to produce at most k files, so sets TextIO.Write.withNumShards(k).
      • External API only supports k QPS, so user sets a limit of k/(expected QPS/step) on the ParDo that makes the API call.

      Unfortunately, there is no way to do this effectively within the Beam model. A GroupByKey with exactly k keys will guarantee that only k elements are produced, but runners are free to break fusion in ways that each element may be processed in parallel later.

      To implement this functionaltiy, I believe we need to add this support to the Beam Model.

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              dhalperi Dan Halperin
              Votes:
              2 Vote for this issue
              Watchers:
              12 Start watching this issue

              Dates

                Created:
                Updated: