Beam / BEAM-14153

Reshuffled Row Coder PCollection used direct to Side Input breaks Dataflow & PyPortable

Details

    • Type: Bug
    • Status: Resolved
    • Priority: P2
    • Resolution: Fixed
    • Affects Version/s: 2.37.0, 2.38.0
    • Fix Version/s: 2.39.0
    • Component/s: sdk-go
    • Labels: None

    Description

      Since first-class iterable side inputs were implemented, passing a reshuffled PCollection directly as a side input causes a coder mismatch on Dataflow and on Python Portable between encoding the reshuffle output and decoding it. Specifically, the Row values are encoded without a length prefix, but the decoding side then expects a length prefix that was never written.
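
      The mismatch described above can be reproduced in miniature. The following is a minimal Go sketch, not the SDK's actual coder code. It assumes the length prefix is a varint byte count followed by the payload; the helper names (encodeRaw, encodeLengthPrefixed, decodeLengthPrefixed, matchedRoundTrip, mismatchedRoundTrip) and the sample payload are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// encodeRaw writes the payload as-is, with no length prefix.
// This stands in for the encoding side of the bug (hypothetical helper).
func encodeRaw(w *bytes.Buffer, payload []byte) {
	w.Write(payload)
}

// encodeLengthPrefixed writes a varint byte count followed by the payload.
// Assumption: this mirrors the shape of a length-prefixed encoding.
func encodeLengthPrefixed(w *bytes.Buffer, payload []byte) {
	var hdr [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(hdr[:], uint64(len(payload)))
	w.Write(hdr[:n])
	w.Write(payload)
}

// decodeLengthPrefixed reads a varint byte count, then that many bytes.
func decodeLengthPrefixed(r *bytes.Reader) ([]byte, error) {
	n, err := binary.ReadUvarint(r)
	if err != nil {
		return nil, err
	}
	if n > uint64(r.Len()) {
		return nil, errors.New("length prefix exceeds remaining bytes")
	}
	out := make([]byte, n)
	if _, err := io.ReadFull(r, out); err != nil {
		return nil, err
	}
	return out, nil
}

// matchedRoundTrip encodes and decodes with the same (length-prefixed) scheme.
func matchedRoundTrip(payload []byte) ([]byte, error) {
	var buf bytes.Buffer
	encodeLengthPrefixed(&buf, payload)
	return decodeLengthPrefixed(bytes.NewReader(buf.Bytes()))
}

// mismatchedRoundTrip encodes without a prefix but decodes expecting one,
// which is the situation the issue describes.
func mismatchedRoundTrip(payload []byte) ([]byte, error) {
	var buf bytes.Buffer
	encodeRaw(&buf, payload)
	return decodeLengthPrefixed(bytes.NewReader(buf.Bytes()))
}

func main() {
	payload := []byte("row-bytes") // hypothetical stand-in for an encoded Row

	got, err := matchedRoundTrip(payload)
	fmt.Printf("matched:    %q, err=%v\n", got, err)

	got, err = mismatchedRoundTrip(payload)
	fmt.Printf("mismatched: %q, err=%v\n", got, err)
}
```

      With this payload the mismatched path fails outright, because the first payload byte is read as a length larger than the remaining input. With other payloads the first byte could happen to be a plausible length, and the decode would silently return the wrong bytes instead.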

      This is similar to the issue in BEAM-12438, which has been worked around with a hack.

      In this instance it's likely more resilient to always length prefix Row-encoded types, and to make that explicit in the pipeline proto. This should avoid issues with runners behaving oddly with respect to row coders at this time, while not preventing them from introspecting row-encoded values should they choose. This may also make the hack for BEAM-12438 unnecessary, though that needs to be verified independently.
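
      One way to picture "always length prefix, and make it explicit" is as a rewrite over the coder tree before the pipeline is submitted. The sketch below is an illustration under stated assumptions, not the change that was actually merged: coderSpec is a hypothetical stand-in for a coder entry in the pipeline proto, lengthPrefixRows is a hypothetical helper, and the URN strings are intended to match Beam's standard coder URNs but should be checked against the model protos.

```go
package main

import "fmt"

// Assumed URNs for the standard coders involved; verify against the Beam model.
const (
	urnRow          = "beam:coder:row:v1"
	urnLengthPrefix = "beam:coder:length_prefix:v1"
	urnIterable     = "beam:coder:iterable:v1"
)

// coderSpec is a hypothetical, simplified stand-in for a coder in the
// pipeline proto: a URN plus its component coders.
type coderSpec struct {
	URN        string
	Components []*coderSpec
}

// lengthPrefixRows returns a copy of the coder tree in which every row coder
// is wrapped in an explicit length-prefix coder. Coders that are already
// length prefixed are returned unchanged, so the rewrite is idempotent.
func lengthPrefixRows(c *coderSpec) *coderSpec {
	if c.URN == urnLengthPrefix {
		// Already explicit; leave the wrapped coder untouched.
		return c
	}
	if c.URN == urnRow {
		return &coderSpec{URN: urnLengthPrefix, Components: []*coderSpec{c}}
	}
	out := &coderSpec{URN: c.URN}
	for _, comp := range c.Components {
		out.Components = append(out.Components, lengthPrefixRows(comp))
	}
	return out
}

// String renders the tree as urn<component,component,...> for inspection.
func (c *coderSpec) String() string {
	if len(c.Components) == 0 {
		return c.URN
	}
	s := c.URN + "<"
	for i, comp := range c.Components {
		if i > 0 {
			s += ","
		}
		s += comp.String()
	}
	return s + ">"
}

func main() {
	// An iterable of rows, as a side input's element coder might look.
	in := &coderSpec{URN: urnIterable, Components: []*coderSpec{{URN: urnRow}}}
	fmt.Println("before:", in)
	fmt.Println("after: ", lengthPrefixRows(in))
}
```

      Because the wrapper appears in the coder tree itself, both the encoding and decoding sides see the same structure, and a runner that understands row coders can still look inside the length-prefixed value.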

            People

              Assignee: Robert Burke (lostluck)
              Reporter: Robert Burke (lostluck)
              Votes: 0
              Watchers: 2

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 3h 40m