BEAM-4591: beam.Create should be splittable

Details

    Description

      beam.Create() should be splittable. This would allow the unintuitive "Reshuffle" step below to be safely omitted:
       
      pipeline = (
          beam.Create(range(large_number))
          | beam.Reshuffle()  # prevent task fusion
          | beam.Map(very_expensive_function)
          ...
      )
       
      Pipelines like this, with small inputs feeding expensive CPU-bound tasks, arise frequently in scientific computing use cases.

          People

            Assignee: Unassigned
            Reporter: Stephan Hoyer (shoyer)