Details
Type: New Feature
Status: Resolved
Priority: P3
Resolution: Fixed
None
None
Description
SplittableDoFn is a proposed enhancement to the Beam model for "dynamically splittable work".
Among other things, it would enable a unified implementation of bounded and unbounded sources with dynamic work rebalancing, and would make it possible to express multiple scalable steps (e.g., global expansion -> file sizing and parsing -> splitting files into independently processable blocks) via composition rather than inheritance.
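To make "dynamic work rebalancing" concrete, here is a toy, single-process model in plain Python (not the Beam API). A restriction is an offset range, a tracker claims offsets one at a time, and a split hands the unclaimed tail of a range to another worker while the first worker is still running. `OffsetRange`, `OffsetTracker`, `try_claim`, `try_split` and `process` are hypothetical names chosen for this sketch; Beam's actual SDF classes and signatures may differ.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass
class OffsetRange:
    """Hypothetical restriction: the half-open offset range [start, stop)."""
    start: int
    stop: int


class OffsetTracker:
    """Toy tracker: claims offsets in order and can hand off its unclaimed tail."""

    def __init__(self, rng: OffsetRange) -> None:
        self.rng = rng
        self.last_claimed: Optional[int] = None

    def try_claim(self, offset: int) -> bool:
        # A claim fails once the offset reaches the end of the (possibly shrunk)
        # range, which tells the processing loop to stop.
        if offset >= self.rng.stop:
            return False
        self.last_claimed = offset
        return True

    def try_split(self, fraction: float) -> Optional[Tuple[OffsetRange, OffsetRange]]:
        # Split only the not-yet-claimed remainder; keep the primary, return the residual.
        first_unclaimed = self.rng.start if self.last_claimed is None else self.last_claimed + 1
        split_at = first_unclaimed + int((self.rng.stop - first_unclaimed) * fraction)
        if split_at >= self.rng.stop:
            return None  # nothing left to give away
        primary = OffsetRange(self.rng.start, split_at)
        residual = OffsetRange(split_at, self.rng.stop)
        self.rng = primary
        return primary, residual


def process(tracker: OffsetTracker, records: List[int]) -> Iterator[int]:
    """Emit one record per successfully claimed offset."""
    offset = tracker.rng.start
    while tracker.try_claim(offset):
        yield records[offset]
        offset += 1


records = list(range(10))
tracker = OffsetTracker(OffsetRange(0, 10))
worker_a = process(tracker, records)
first = [next(worker_a) for _ in range(3)]   # worker A has claimed offsets 0..2
split = tracker.try_split(0.5)               # rebalance while A is mid-range
assert split is not None
primary, residual = split
first += list(worker_a)                      # A stops at the new boundary
second = list(process(OffsetTracker(residual), records))  # worker B takes the rest
```

In this run every offset is processed exactly once: worker A emits 0 through 5 and worker B emits 6 through 9. The design point the toy illustrates is that a split only ever gives away offsets that have not yet been claimed.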
SplittableDoFn would make it much easier to implement many types of sources and to modify and reuse existing ones. It would also improve the scalability of the Beam model by moving work such as splitting a source out of the control plane (where it happens today: glob -> List<FileBasedSource> sent over service APIs) and into the data plane (PCollection<Glob> -> PCollection<FileName> -> ...).
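As a rough illustration of building a source by composition in the data plane, the sketch below chains three ordinary element-wise steps (glob -> file names -> sized files -> byte blocks) over plain Python iterators. The in-memory `FILES` table, `BLOCK_SIZE`, and all function names are hypothetical; in the proposal each step would be a stage from one PCollection to the next, which this toy does not model.

```python
import fnmatch
from typing import Callable, Dict, Iterable, Iterator, Tuple

# Hypothetical in-memory "filesystem": file name -> size in bytes.
FILES: Dict[str, int] = {"logs/a.txt": 250, "logs/b.txt": 90, "data/c.csv": 400}
BLOCK_SIZE = 100  # hypothetical block size in bytes

Block = Tuple[str, int, int]  # (file name, start offset, end offset)


def expand_glob(pattern: str) -> Iterator[str]:
    """Step 1: one glob pattern -> matching file names."""
    for name in sorted(FILES):
        if fnmatch.fnmatch(name, pattern):
            yield name


def size_file(name: str) -> Iterator[Tuple[str, int]]:
    """Step 2: file name -> (file name, size)."""
    yield name, FILES[name]


def split_into_blocks(sized: Tuple[str, int]) -> Iterator[Block]:
    """Step 3: one sized file -> independently processable byte blocks."""
    name, size = sized
    for start in range(0, size, BLOCK_SIZE):
        yield name, start, min(start + BLOCK_SIZE, size)


def flat_map(step: Callable, elements: Iterable) -> Iterator:
    """Apply one step to every element and flatten its outputs."""
    for elem in elements:
        yield from step(elem)


def compose(*steps: Callable) -> Callable[[Iterable], Iterator]:
    """Chain element-wise steps; each consumes only the previous step's output."""
    def run(elements: Iterable) -> Iterator:
        for step in steps:
            elements = flat_map(step, elements)
        return iter(elements)
    return run


blocks = list(compose(expand_glob, size_file, split_into_blocks)(["logs/*.txt"]))
```

Because each step depends only on the previous step's output, one step (for example, the block-splitting policy) can be swapped or reused without subclassing a monolithic source.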
Issue Links
- incorporates
  - BEAM-1903 Splittable DoFn should report watermarks via ProcessContext (Resolved)
  - BEAM-1904 Remove DoFn.ProcessContinuation (Resolved)
  - BEAM-2301 Standard expansion of SDF should be in runners-core-construction (Resolved)
  - BEAM-1377 Support Splittable DoFn in Dataflow streaming runner (Resolved)
  - BEAM-1855 Support Splittable DoFn in Flink Streaming runner (Resolved)
- is depended upon by
  - BEAM-3301 Go SplittableDoFn support (Resolved)
- is related to
  - BEAM-644 Primitive to shift the watermark while assigning timestamps (Open)
- links to