FLINK-20654

Unaligned checkpoint recovery may lead to corrupted data stream

    Description

      The fix for FLINK-20433 exposes potential data corruption after recovery in all variations of UnalignedCheckpointITCase.

      To reproduce, run UnalignedCheckpointITCase a couple hundred times. I observed the issue in the following cases, listed in order of decreasing frequency:

      • execute [Parallel union, p = 5]
      • execute [Parallel union, p = 10]
      • execute [Parallel cogroup, p = 5]
      • execute [parallel pipeline with remote channels, p = 5]

      The issue manifests as one of the following:

      • stream corrupted exception
      • EOF exception
      • assertion failure in NUM_LOST or NUM_OUT_OF_ORDER
      • (for union) ArithmeticException caused by overflow, because a number that should be in [0;100000] has been mis-deserialized
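The symptoms above are what one would expect when a reader resumes in the middle of a serialized record. The following is a minimal sketch of that effect, not Flink's actual serialization or recovery code: the class `MisalignedReadSketch`, its methods, and the fixed 8-byte record layout are all hypothetical and chosen only for illustration. It writes a few small `long` values and then reads them back starting at a misaligned byte offset.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration only; this is not how Flink serializes records.
public class MisalignedReadSketch {

    /** Serializes the values 0..count-1 as consecutive big-endian 8-byte longs. */
    static byte[] writeLongs(int count) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (long v = 0; v < count; v++) {
                out.writeLong(v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /**
     * Reads longs from data, starting skipBytes into the buffer, and reports
     * the first symptom: "OK", "EOF", or "OUT_OF_RANGE value=<n>".
     */
    static String firstSymptom(byte[] data, int skipBytes, long maxValid) {
        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(data, skipBytes, data.length - skipBytes));
        try {
            while (in.available() > 0) {
                long v = in.readLong(); // throws EOFException on a partial record
                if (v < 0 || v > maxValid) {
                    return "OUT_OF_RANGE value=" + v;
                }
            }
            return "OK";
        } catch (EOFException e) {
            return "EOF";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long max = 100_000L; // upper bound taken from the [0;100000] range above
        System.out.println(firstSymptom(writeLongs(3), 0, max)); // aligned read
        System.out.println(firstSymptom(writeLongs(3), 3, max)); // resumes mid-record
        System.out.println(firstSymptom(writeLongs(2), 3, max)); // resumes mid-record, short buffer
    }
}
```

In this sketch the aligned read succeeds, while the same bytes read from an offset of 3 produce either a value far outside the expected range or a premature end of stream, depending on how much data follows. Whether the failures in this ticket arise the same way is an assumption; the sketch only shows that a single misaligned read position is enough to produce both kinds of symptom.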

            People

              pnowojski Piotr Nowojski
              arvid Arvid Heise