Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
1.16.0, 1.17.0
Description
In order to solve the problem that data cannot be read from the disk correctly after failover, we changed the calculation logical of the buffer's readable state in FLINK-29238. Buffers that are greater than consumingOffset and have been released can be pre-load from file. However, the update of consumingOffset is asynchronous, If it lags behind the actual consumption progress, the buffer will have a chance to be load from the disk again.
IMO, we can record the consumed status of buffer by each consumer in the InternalRegion. Only the buffers that have not been consumed and have been released will be considered as readable. In the case of failover, a new consumerId will be generated, so all buffers will be considered as unconsumed and can be correctly read from the disk too.
Attachments
Issue Links
- causes
-
FLINK-29419 HybridShuffleITCase.testHybridFullExchangesRestart hangs
- Closed
- links to