Flink / FLINK-30189

HsSubpartitionFileReader may load data that has been consumed from memory


Details

    Description

      To fix the problem that data could not be read from disk correctly after failover, FLINK-29238 changed how a buffer's readable state is calculated: buffers whose index is greater than consumingOffset and that have been released can be preloaded from the file. However, consumingOffset is updated asynchronously. If it lags behind the actual consumption progress, a buffer that has already been consumed from memory may be loaded from disk again.
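      The following is a minimal sketch of the offset-based check described above, to make the failure mode concrete. It is not Flink's actual code: the class `OffsetBasedCheck` and its methods are hypothetical, and the asynchronous update is modeled as a plain method call that may simply arrive late.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of the offset-based readable check; names are illustrative, not Flink's API. */
final class OffsetBasedCheck {
    // Indexes of buffers that have been released from memory.
    private final Set<Integer> releasedBuffers = new HashSet<>();
    // Last consumed buffer index as seen by the file reader; updated asynchronously in the real system.
    private int consumingOffset = -1;

    void markReleased(int bufferIndex) {
        releasedBuffers.add(bufferIndex);
    }

    void updateConsumingOffset(int offset) {
        consumingOffset = offset;
    }

    /** A buffer is readable if it lies beyond consumingOffset and has been released. */
    boolean isReadable(int bufferIndex) {
        return bufferIndex > consumingOffset && releasedBuffers.contains(bufferIndex);
    }
}
```

      If the consumer has already consumed buffer 1 from memory but `updateConsumingOffset(1)` has not run yet, `isReadable(1)` still returns true, so the buffer would be loaded from disk a second time.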

      IMO, we can record in the InternalRegion which buffers each consumer has consumed. Only buffers that have been released and have not yet been consumed are then considered readable. On failover a new consumerId is generated, so all buffers count as unconsumed for that consumer and can still be read from disk correctly.
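      A rough sketch of the proposal, under stated assumptions: the class `RegionConsumptionTracker`, the `int` type of the consumer id, and all method names are hypothetical and only stand in for whatever the InternalRegion would actually store. Thread safety is left out for brevity.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical per-region tracker of consumed buffers, keyed by consumer id. */
final class RegionConsumptionTracker {
    // Indexes of buffers that have been released from memory.
    private final Set<Integer> releasedBuffers = new HashSet<>();
    // For each consumer id, the buffer indexes that consumer has already consumed.
    private final Map<Integer, Set<Integer>> consumedByConsumer = new HashMap<>();

    void markReleased(int bufferIndex) {
        releasedBuffers.add(bufferIndex);
    }

    void markConsumed(int consumerId, int bufferIndex) {
        consumedByConsumer.computeIfAbsent(consumerId, id -> new HashSet<>()).add(bufferIndex);
    }

    /** Readable means: released, and not yet consumed by this particular consumer. */
    boolean isReadable(int consumerId, int bufferIndex) {
        Set<Integer> consumed = consumedByConsumer.get(consumerId);
        boolean alreadyConsumed = consumed != null && consumed.contains(bufferIndex);
        return !alreadyConsumed && releasedBuffers.contains(bufferIndex);
    }
}
```

      Because the consumed status is recorded at the moment of consumption rather than derived from a lagging offset, a consumed buffer is never reported as readable for the same consumer. A consumer created after failover gets a fresh id with no recorded buffers, so every released buffer is readable for it.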

            People

              Weijie Guo