Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
1.12.0
-
None
Description
The current CheckpointUnaligner interacts with RemoteInputChannel to persist the input buffers. However, based on the current implementation, there seems to be a problem in the following case:
1. There are 3 input channels.
2. Input channel 0 receives barrier 1 and processes it, starting checkpoint 1.
3. Input channel 1 receives barrier 1 and processes it. The state of the input channel persister now becomes BARRIER_RECEIVED and numBuffersOvertaken(channel 1) = n_1.
4. However, input channel 2 receives nothing, checkpoint 1 expires, and a new checkpoint is triggered.
5. Input channel 0 receives barrier 2; checkpoint 1 is abandoned and checkpoint 2 is started.

In this case the state of the input channels is not cleared. Thus channel 1 is still BARRIER_RECEIVED with numBuffersOvertaken(channel 1) = n_1, so it would persist only n_1 buffers in the channel for the new checkpoint 2.
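The scenario above can be reproduced with a small standalone model. This is only a sketch and not Flink code: the class ChannelPersisterSketch, its methods, and the resetOnNewCheckpoint flag are hypothetical names introduced for illustration, and clearing the per-channel state when a newer checkpoint starts is an assumed remedy, not necessarily the fix that was applied.

```java
// Hypothetical model of one input channel's persister state (not the Flink class).
class ChannelPersisterSketch {
    enum State { IDLE, BARRIER_RECEIVED }

    private State state = State.IDLE;
    private long checkpointId = -1;
    private int numBuffersOvertaken = 0;
    // Assumption: a possible remedy is to clear stale state on a newer checkpoint.
    private final boolean resetOnNewCheckpoint;

    ChannelPersisterSketch(boolean resetOnNewCheckpoint) {
        this.resetOnNewCheckpoint = resetOnNewCheckpoint;
    }

    // A barrier for checkpoint `id` arrived on this channel, overtaking `overtaken` buffers.
    void barrierReceived(long id, int overtaken) {
        checkpointId = id;
        state = State.BARRIER_RECEIVED;
        numBuffersOvertaken = overtaken;
    }

    // Another channel started checkpoint `id`; returns how many of the
    // `queued` buffers this channel would persist for that checkpoint.
    int buffersToPersist(long id, int queued) {
        if (resetOnNewCheckpoint && id > checkpointId) {
            // Drop state left over from the abandoned checkpoint.
            state = State.IDLE;
            numBuffersOvertaken = 0;
            checkpointId = id;
        }
        return state == State.BARRIER_RECEIVED
                ? Math.min(numBuffersOvertaken, queued)
                : queued;
    }

    public static void main(String[] args) {
        // Channel 1: barrier 1 overtook n_1 = 3 buffers, then checkpoint 2 starts
        // while 10 buffers are queued in the channel.
        ChannelPersisterSketch stale = new ChannelPersisterSketch(false);
        stale.barrierReceived(1L, 3);
        System.out.println(stale.buffersToPersist(2L, 10)); // prints 3 (stale n_1)

        ChannelPersisterSketch cleared = new ChannelPersisterSketch(true);
        cleared.barrierReceived(1L, 3);
        System.out.println(cleared.buffersToPersist(2L, 10)); // prints 10
    }
}
```

Without the reset, channel 1 reuses n_1 from the abandoned checkpoint 1 and persists too few buffers for checkpoint 2; with the reset, all queued buffers are considered.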