Beam / BEAM-13628

[Go SDK] Make Side input cache fit resolved semantics.

Details

    • Type: Bug
    • Status: Resolved
    • Priority: P2
    • Resolution: Fixed
    • Affects Version/s: 2.35.0
    • Fix Version/s: 2.36.0
    • Component/s: sdk-go
    • Labels: None

    Description

      It's been determined that the documentation in the proto was a bit buggy with respect to side input semantics. Prior to https://github.com/apache/beam/pull/16474 it said state cache tokens are globally unique; however, in the implementation and the original design they are unique only with respect to their associated StateKeys.

      This means the Go SDK's side input cache is broken as delivered and can cause a correctness issue when there are multiple distinct side inputs of the same type. The mitigation is to not use the side input cache in affected versions (2.35.0). The cache is off by default.

      The correction will use the whole state key (which, for side inputs, includes the (transformID, SideInputID) tuple, plus a user key if it's a multimap side input), along with the runner-provided token.
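
      To illustrate the difference, here is a minimal sketch, not the actual SDK code. The names sideInputKey, sideInputCache, put, and get are hypothetical, the cached values are plain strings, and the key carries only the fields named above (a real state key may carry more). It assumes the runner hands out the same token for two distinct side inputs, which the resolved semantics allow.

```go
package main

import "fmt"

// sideInputKey is a hypothetical composite cache key. It holds the parts of
// the state key named in this issue plus the runner-provided token.
type sideInputKey struct {
	transformID string
	sideInputID string
	userKey     string // assumption: empty unless this is a multimap side input
	token       string
}

// sideInputCache is a hypothetical cache keyed on the composite key.
// Structs of comparable fields can be used directly as Go map keys.
type sideInputCache struct {
	entries map[sideInputKey][]string
}

func newSideInputCache() *sideInputCache {
	return &sideInputCache{entries: make(map[sideInputKey][]string)}
}

// put stores the values for one side input under its full key.
func (c *sideInputCache) put(k sideInputKey, values []string) {
	c.entries[k] = values
}

// get returns the cached values and whether the full key was present.
func (c *sideInputCache) get(k sideInputKey) ([]string, bool) {
	v, ok := c.entries[k]
	return v, ok
}

func main() {
	// Assumption: the runner reuses one token across two side inputs.
	const token = "tok-1"

	// Keying on the token alone: the second write replaces the first,
	// so a lookup for the first side input returns the wrong data.
	tokenOnly := map[string][]string{}
	tokenOnly[token] = []string{"left"}
	tokenOnly[token] = []string{"right"}
	fmt.Println("token-only lookup:", tokenOnly[token])

	// Keying on the state key parts plus the token keeps them apart.
	c := newSideInputCache()
	left := sideInputKey{transformID: "ParDo1", sideInputID: "i0", token: token}
	right := sideInputKey{transformID: "ParDo1", sideInputID: "i1", token: token}
	c.put(left, []string{"left"})
	c.put(right, []string{"right"})
	l, _ := c.get(left)
	r, _ := c.get(right)
	fmt.Println("composite lookup:", l, r)
}
```

      With the token-only map, the entry for the first side input is silently overwritten, which is the kind of correctness issue described above. With the composite key, each side input keeps its own entry even though the token is shared.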

      Since at worst this causes a data correctness issue rather than a pipeline failure, the fix should be part of the 2.36.0 release. We may also wish to backport it to a 2.35.1 patch release, for the Go SDK only, to close the gap.

          People

            jrmccluskey Jack McCluskey
            lostluck Robert Burke


              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 50m