FLINK-17477

resumeConsumption call should happen as quickly as possible to minimise latency

    Details

      Description

      We should call InputGate#resumeConsumption() as early as possible, to avoid unnecessary latency while the task is idling. Currently I think it’s mostly fine: the important bit is that on the happy path we always call resumeConsumption before trying to complete the checkpoint, so that the netty threads resume network traffic while the task thread is doing the synchronous part of the checkpoint and starting the asynchronous part. But I think that in two places in CheckpointBarrierAligner we first abort the checkpoint and only then resume consumption:

      // let the task know we are not completing this
      notifyAbort(currentCheckpointId,
      	new CheckpointException(
      		"Barrier id: " + barrierId,
      		CheckpointFailureReason.CHECKPOINT_DECLINED_SUBSUMED));
      // abort the current checkpoint
      releaseBlocksAndResetBarriers();
      
      // let the task know we skip a checkpoint
      notifyAbort(currentCheckpointId,
      		new CheckpointException(CheckpointFailureReason.CHECKPOINT_DECLINED_INPUT_END_OF_STREAM));
      // no chance to complete this checkpoint
      releaseBlocksAndResetBarriers();
      

      It’s not a big deal, as those are rare conditions, but it would be better to be consistent everywhere: first release the blocks and resume consumption, before anything else happens.
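
      To illustrate the proposed ordering, here is a minimal, self-contained sketch. It does not use Flink's real classes: `AbortOrderingSketch`, its nested `InputGate` and `AbortListener` interfaces, `Aligner`, `blockChannel`, and `abortCheckpoint` are hypothetical stand-ins invented for this example. Only the method names `resumeConsumption`, `notifyAbort`, and `releaseBlocksAndResetBarriers` are borrowed from the description above. The sketch assumes that releasing the blocks has no effect on the checkpoint id reported in the abort, which would need to be checked against the real CheckpointBarrierAligner.

```java
import java.util.ArrayList;
import java.util.List;

public class AbortOrderingSketch {

    // Hypothetical stand-in for the gate whose channels get unblocked.
    interface InputGate {
        void resumeConsumption(int channelIndex);
    }

    // Hypothetical stand-in for whoever is told about an aborted checkpoint.
    interface AbortListener {
        void notifyAbort(long checkpointId, String reason);
    }

    // Simplified aligner: it only tracks which channels are currently blocked.
    static final class Aligner {
        private final InputGate gate;
        private final AbortListener listener;
        private final boolean[] blocked;

        Aligner(InputGate gate, AbortListener listener, int numChannels) {
            this.gate = gate;
            this.listener = listener;
            this.blocked = new boolean[numChannels];
        }

        // Marks a channel as blocked, as if a barrier had arrived on it.
        void blockChannel(int channelIndex) {
            blocked[channelIndex] = true;
        }

        // Unblocks every blocked channel and resumes consumption on it.
        void releaseBlocksAndResetBarriers() {
            for (int i = 0; i < blocked.length; i++) {
                if (blocked[i]) {
                    blocked[i] = false;
                    gate.resumeConsumption(i);
                }
            }
        }

        // Proposed ordering: resume consumption first, notify the abort second.
        void abortCheckpoint(long checkpointId, String reason) {
            releaseBlocksAndResetBarriers();
            listener.notifyAbort(checkpointId, reason);
        }
    }

    // Records the order in which the two kinds of calls happen.
    static List<String> run() {
        List<String> events = new ArrayList<>();
        Aligner aligner = new Aligner(
                channel -> events.add("resume:" + channel),
                (id, reason) -> events.add("abort:" + id),
                2);
        aligner.blockChannel(0);
        aligner.blockChannel(1);
        aligner.abortCheckpoint(7L, "subsumed");
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [resume:0, resume:1, abort:7]
    }
}
```

      In this sketch the checkpoint id is passed to `abortCheckpoint` explicitly, so swapping the two calls cannot change which id is reported. Whether the same holds for `currentCheckpointId` in the real class is an open question for whoever picks up the change.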

              People

              • Assignee: Unassigned
              • Reporter: Piotr Nowojski (pnowojski)