  1. Beam
  2. BEAM-11099

Go SDK: Custom Pre-Processing of Side Input Data

Details

    • Type: Wish
    • Status: Resolved
    • Priority: P4
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: Missing
    • Component/s: sdk-go
    • Labels: None

    Description

      An idea borrowed from Python: allow users to specify a way to pre-process side input data on first use, and leverage caching so that the work is done only once. This can simplify user DoFns by letting them convert their side input data (mostly lists) into a form better suited to their access pattern.
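
      As a rough illustration of the idea above, here is a minimal plain-Go sketch of converting list-shaped side input data into a map on first use and reusing the result afterwards. It does not use the Beam Go SDK; the names `lazyIndex`, `kv`, and `get` are hypothetical and exist only for this example, and `sync.Once` merely stands in for whatever caching the SDK would provide.

```go
package main

import (
	"fmt"
	"sync"
)

// kv stands in for one element of a list-shaped side input.
// Hypothetical type, for illustration only.
type kv struct {
	Key   string
	Value int
}

// lazyIndex is a hypothetical helper, not part of the Beam Go SDK.
// It converts list-shaped side input data into a map on first use
// and reuses that map for every later lookup.
type lazyIndex struct {
	once  sync.Once
	index map[string]int
}

// get builds the map from the raw side input the first time it is
// called; later calls return the cached map and ignore raw.
func (l *lazyIndex) get(raw []kv) map[string]int {
	l.once.Do(func() {
		l.index = make(map[string]int, len(raw))
		for _, e := range raw {
			l.index[e.Key] = e.Value
		}
	})
	return l.index
}

func main() {
	side := []kv{{"a", 1}, {"b", 2}}
	var idx lazyIndex
	// Each "element" does a map lookup instead of scanning the list.
	for _, word := range []string{"a", "b", "a"} {
		fmt.Println(word, idx.get(side)[word])
	}
}
```

      Without this kind of helper, each DoFn would have to carry the conversion and its caching logic itself, which is the boilerplate this wish aims to remove.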

      It is strongly recommended to add Map Side Inputs (https://issues.apache.org/jira/browse/BEAM-3293) before implementing this suggestion, and caching (https://issues.apache.org/jira/browse/BEAM-11097) must already be implemented. Otherwise very little benefit is achieved.

      See https://issues.apache.org/jira/browse/BEAM-3293 for where code might need to be changed.

      In particular, it would require a mechanism for the SDK to determine that a given unknown type actually represents a side input, and a method by which to pre-process the data associated with it.
      Positional handling is expected to be maintained, so that the types of side inputs can still be identified for pipeline type checking.
      Some "magic method", similar to the structural DoFn methods, is likely the right approach. However, it is an open question how to make this scale properly to more than a single side input. Otherwise, perhaps something that takes in a valid side input form and returns a single value to be used instead?


          People

            Assignee: Unassigned
            Reporter: Robert Burke (lostluck)