Beam / BEAM-65

SplittableDoFn

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.2.0
    • Component/s: beam-model
    • Labels: None

      Description

      SplittableDoFn is a proposed enhancement to the Beam model that adds support for "dynamically splittable work".

      Among other things, it would allow a unified implementation of bounded and unbounded sources with dynamic work rebalancing, and it would let multiple scalable steps (e.g., glob expansion -> file sizing & parsing -> splitting files into independently processable blocks) be expressed via composition rather than inheritance.

      This would make it much easier to implement many types of sources and to modify and reuse existing ones. It would also improve the scalability of the Beam model by moving work such as splitting a source out of the control plane (where it happens today: glob -> List<FileBasedSource>, sent over service APIs) and into the data plane (PCollection<Glob> -> PCollection<FileName> -> ...).
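
      As a rough illustration of the composition idea described above, the sketch below chains three small steps (glob -> file names -> byte-range blocks) in plain Python. It does not use the Beam API and is not the proposed SplittableDoFn interface; `Block`, `expand_glob`, `split_into_blocks`, and `plan_blocks` are hypothetical names, and an in-memory dict of file sizes stands in for a real file system.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Dict, Iterable, Iterator, List


@dataclass(frozen=True)
class Block:
    """Hypothetical unit of work: the byte range [start, end) of one file."""
    file_name: str
    start: int
    end: int


def expand_glob(glob: str, file_sizes: Dict[str, int]) -> Iterator[str]:
    """Step 1: glob -> file names (the dict stands in for a file system)."""
    for name in sorted(file_sizes):
        if fnmatchcase(name, glob):
            yield name


def split_into_blocks(file_name: str, size: int, block_size: int) -> Iterator[Block]:
    """Step 2: one file -> independently processable byte ranges."""
    for start in range(0, size, block_size):
        yield Block(file_name, start, min(start + block_size, size))


def plan_blocks(globs: Iterable[str], file_sizes: Dict[str, int],
                block_size: int) -> List[Block]:
    """Compose the steps: each stage consumes the previous stage's output."""
    return [
        block
        for glob in globs
        for name in expand_glob(glob, file_sizes)
        for block in split_into_blocks(name, file_sizes[name], block_size)
    ]


if __name__ == "__main__":
    sizes = {"logs/a.txt": 250, "logs/b.txt": 100, "data/c.txt": 500}
    for block in plan_blocks(["logs/*.txt"], sizes, block_size=100):
        print(block)
```

      Each stage here is an ordinary function over elements, so a stage can be swapped or reused without subclassing a source type. In a real runner the stages would presumably run as distributed transforms rather than as nested Python loops.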

              People

              • Assignee: Eugene Kirpichov (jkff)
              • Reporter: Dan Halperin (dhalperi)
              • Votes: 3
              • Watchers: 10
