Beam / BEAM-8577

FileSystems may not be initialized during ResourceId deserialization

Details

    • Type: Bug
    • Status: Triage Needed
    • Priority: P2
    • Resolution: Fixed
    • Affects Version/s: 2.16.0
    • Fix Version/s: 2.19.0
    • Component/s: runner-flink
    • Labels: None

    Description

      • FileSystems relies on static registration through the FileSystems#setDefaultPipelineOptions method.
      • #setDefaultPipelineOptions is called either when deserializing SerializablePipelineOptions or when various Beam operators are opened.
      • FileIO#matchAll is expanded using Reshuffle.viaRandomKey().
      • Reshuffle is implemented using .rebalance, which doesn't have a "RichFunction" lifecycle, so we need to find another way to register FileSystems: the deserialization may happen before any other "rich operator" gets executed on a particular task manager.
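
The ordering problem described in the bullets above can be illustrated with a small, self-contained simulation. This is only a sketch: `DemoFileSystems` and `RegistrationOrderDemo` are hypothetical stand-ins that mimic a statically populated registry, they are not Beam or Flink classes, and the scheme names and URI are made up for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a statically registered file system registry (not Beam's class). */
final class DemoFileSystems {
    private static final Map<String, String> SCHEMES = new ConcurrentHashMap<>();

    /** Plays the role that FileSystems#setDefaultPipelineOptions has in the report. */
    static void register() {
        SCHEMES.put("file", "LocalFileSystem");
        SCHEMES.put("hdfs", "HadoopFileSystem");
    }

    /** Simulates a fresh JVM in which nothing has been registered yet. */
    static void reset() {
        SCHEMES.clear();
    }

    /** Resolves a URI against the registry; fails if its scheme is unknown. */
    static String resolve(String uri) {
        String scheme = uri.substring(0, uri.indexOf(':'));
        String fileSystem = SCHEMES.get(scheme);
        if (fileSystem == null) {
            throw new IllegalStateException("No filesystem registered for scheme " + scheme);
        }
        return fileSystem + ":" + uri;
    }
}

final class RegistrationOrderDemo {
    /** Returns true if deserialization succeeded, false if the registry was still empty. */
    static boolean decodeOnFreshWorker(boolean richOperatorOpenedFirst) {
        DemoFileSystems.reset();
        if (richOperatorOpenedFirst) {
            // A "rich" operator was opened on this worker first and triggered registration.
            DemoFileSystems.register();
        }
        try {
            DemoFileSystems.resolve("hdfs://namenode/data/part-0");
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("rich operator opened first: " + decodeOnFreshWorker(true));
        System.out.println("only rebalance on worker:   " + decodeOnFreshWorker(false));
    }
}
```

Whether the lookup succeeds depends only on what happened to run earlier in the same JVM, which is why the behaviour looks random once task assignment varies between runs.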

      This results in random pipeline failures, as the task assignment is not deterministic.

      We can work around this by registering FileSystems during coder deserialization.
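
A minimal sketch of that workaround idea, assuming an idempotent registration hook that the decoding path calls before it resolves anything. `SketchRegistry` and `SketchResourceIdCoder` are hypothetical names invented for this illustration; this is not the actual patch and not Beam's coder API.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical static registry with an idempotent, thread-safe registration hook. */
final class SketchRegistry {
    private static final Map<String, String> SCHEMES = new HashMap<>();
    private static boolean initialized = false;

    /** Safe to call from every decode: only the first call populates the registry. */
    static synchronized void ensureRegistered() {
        if (!initialized) {
            SCHEMES.put("file", "LocalFileSystem");
            SCHEMES.put("hdfs", "HadoopFileSystem");
            initialized = true;
        }
    }

    /** Simulates a fresh JVM in which nothing has been registered yet. */
    static synchronized void reset() {
        SCHEMES.clear();
        initialized = false;
    }

    /** Resolves a URI against the registry; fails if its scheme is unknown. */
    static synchronized String resolve(String uri) {
        String scheme = uri.substring(0, uri.indexOf(':'));
        String fileSystem = SCHEMES.get(scheme);
        if (fileSystem == null) {
            throw new IllegalStateException("No filesystem registered for scheme " + scheme);
        }
        return fileSystem + ":" + uri;
    }
}

/** Hypothetical coder: registration happens as part of decoding, not in an operator lifecycle. */
final class SketchResourceIdCoder {
    static byte[] encode(String uri) {
        return uri.getBytes(StandardCharsets.UTF_8);
    }

    static String decode(byte[] payload) {
        // Register first, so decoding no longer depends on which operator ran earlier.
        SketchRegistry.ensureRegistered();
        return SketchRegistry.resolve(new String(payload, StandardCharsets.UTF_8));
    }
}

final class WorkaroundDemo {
    public static void main(String[] args) {
        SketchRegistry.reset(); // fresh worker, nothing registered
        byte[] payload = SketchResourceIdCoder.encode("hdfs://namenode/data/part-0");
        System.out.println(SketchResourceIdCoder.decode(payload));
    }
}
```

Tying registration to the decode call means it runs wherever deserialization runs, at the cost of a cheap synchronized check on every element.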

            People

              Assignee: dmvk David Morávek
              Reporter: dmvk David Morávek
              Votes: 1
              Watchers: 3

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 2.5h