Beam / BEAM-3293

Add lazy map side input form

Details

    • Type: Improvement
    • Status: Triage Needed
    • Priority: P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.35.0
    • Component/s: sdk-go
    • Labels: None

    Description

      Add InputKinds LazyMap and LazyMultiMap that allow map lookup without reading everything into memory. They will be accessed through functions such as:

      func(K) func(*V) bool   (a keyed function that returns an iterator)
      func(K) []V             (a keyed function that returns a slice of values)
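
To make the two shapes concrete, here is a minimal sketch in plain Go, with no Beam dependency. The `backingStore` type, its `fetch` method, and the helper names `iterLookup`, `sliceLookup`, and `sumFor` are all hypothetical and stand in for whatever the runner's state access would provide. The point is only that values are fetched per key, on demand, rather than materialized up front.

```go
package main

import "fmt"

// backingStore is a hypothetical stand-in for remote side input state.
// fetches counts how many per-key reads were issued.
type backingStore struct {
	data    map[string][]int
	fetches int
}

// fetch reads the values for a single key only.
func (s *backingStore) fetch(key string) []int {
	s.fetches++
	return s.data[key]
}

// iterLookup returns a lookup of the form func(K) func(*V) bool.
// The values for a key are fetched on the first call to the iterator.
func iterLookup(s *backingStore) func(string) func(*int) bool {
	return func(key string) func(*int) bool {
		var vals []int
		loaded := false
		i := 0
		return func(v *int) bool {
			if !loaded {
				vals = s.fetch(key)
				loaded = true
			}
			if i >= len(vals) {
				return false
			}
			*v = vals[i]
			i++
			return true
		}
	}
}

// sliceLookup returns a lookup of the form func(K) []V.
func sliceLookup(s *backingStore) func(string) []int {
	return func(key string) []int { return s.fetch(key) }
}

// sumFor drains the iterator for one key and sums its values.
func sumFor(lookup func(string) func(*int) bool, key string) int {
	next := lookup(key)
	var v, sum int
	for next(&v) {
		sum += v
	}
	return sum
}

func main() {
	store := &backingStore{data: map[string][]int{"a": {1, 2, 3}, "b": {10}}}
	fmt.Println(sumFor(iterLookup(store), "a")) // prints 6
	fmt.Println(store.fetches)                  // prints 1: only key "a" was read
}
```

In a DoFn, a function of one of these shapes would appear as a side input parameter in place of a fully materialized map.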

      On the execution layer, the new forms would need to be added to exec/sideinput.go:
      https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/exec/sideinput.go
      They would also need to be added to the inputs layer, which provides the actual abstraction using reflection:
      https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/exec/input.go

      The funcx package would need to be updated to detect the new parameter forms
      https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/funcx/fn.go
      https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/funcx/sideinput.go

      as well as the DoFn graph validation code
      https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/graph/fn.go#L566

      They would need to be correctly translated into the pipeline protos:
      https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/graphx/translate.go#L315
      and finally back to the newly created handlers in the exec package.
      https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/exec/translate.go#L402

      If implemented pre-generics, the code generator frontend and backend would need to be updated to detect the new forms and generate efficient map access functions with no reflection overhead:
      https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/util/shimx/generate.go
      https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/util/starcgenx/starcgenx.go

      Unit tests must be added throughout, and integration tests should be added to verify the functionality against portable Beam runners.
      https://github.com/apache/beam/tree/master/sdks/go/test/integration/primitives

      And of course, the user-facing GoDoc should be updated to document the new forms.

      See this lengthy email response for a more in-depth guide to how side inputs operate: https://lists.apache.org/thread.html/ra42dc7ee30842f11740eff33f0afcd63702695878e427127e1268381%40%3Cdev.beam.apache.org%3E

            People

              Assignee: Jack McCluskey (jrmccluskey)
              Reporter: Henning Rohde (herohde)
              Votes: 0
              Watchers: 5

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 5.5h