  1. Beam
  2. BEAM-681

DoFns should be serialized at apply time and deserialized when executing

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: sdk-py-core
    • Labels:

      Description

      1. Serializing DoFns at apply time ensures that modifications to a DoFn's fields after it has been applied cannot accidentally affect execution. This mirrors the approach taken in Java to approximate a lexical closure (e.g., you only need to know the state of the DoFn at the time it was applied, not afterwards, to understand its behavior).

      2. Based on 1, the DirectRunner should also deserialize DoFns before running them. This should also catch other classes of errors, such as using the pipeline object (which is not picklable) within the DoFn.
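
      The two points above can be illustrated with a minimal sketch that uses only the standard library `pickle` module rather than the Beam SDK itself. Everything named here (`AddOffset`, `FakePipeline`, `UsesPipeline`, `apply_fn`, `run_applied`) is hypothetical and is not part of Beam's API; the sketch only assumes that serialization happens when the DoFn is applied and deserialization happens just before execution. In this sketch the unpicklable pipeline is caught as soon as the round trip is attempted.

      ```python
      import pickle
      import threading


      class AddOffset:
          """Hypothetical stand-in for a DoFn: adds an offset to each element."""

          def __init__(self, offset):
              self.offset = offset

          def process(self, element):
              return element + self.offset


      class FakePipeline:
          """Hypothetical stand-in for an unpicklable pipeline object."""

          def __init__(self):
              self._lock = threading.Lock()  # lock objects cannot be pickled


      class UsesPipeline:
          """A DoFn-like object that wrongly holds on to the pipeline."""

          def __init__(self, pipeline):
              self.pipeline = pipeline

          def process(self, element):
              return element


      def apply_fn(fn):
          """Snapshot fn at apply time by serializing it immediately."""
          return pickle.dumps(fn)


      def run_applied(payload, elements):
          """Deserialize a fresh copy at execution time, then process elements."""
          fn = pickle.loads(payload)
          return [fn.process(e) for e in elements]


      # Point 1: a mutation after apply time does not leak into execution.
      fn = AddOffset(offset=1)
      payload = apply_fn(fn)
      fn.offset = 100  # changed after apply; the snapshot still holds offset=1
      result = run_applied(payload, [1, 2, 3])

      # Point 2: holding the (unpicklable) pipeline is detected by the round trip.
      try:
          apply_fn(UsesPipeline(FakePipeline()))
          pipeline_error = None
      except (TypeError, pickle.PicklingError) as exc:
          pipeline_error = exc
      ```

      Because `run_applied` always works on a deserialized copy, a local runner built this way would fail on the same DoFns that a distributed runner would reject, instead of silently accepting them.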

            People

            • Assignee:
              Unassigned
              Reporter:
              bchambers Ben Chambers
            • Votes:
              0
              Watchers:
              2