Flink / FLINK-27530 FLIP-227: Support overdraft buffer / FLINK-26762

Add the overdraft buffer in BufferPool to reduce unaligned checkpoint being blocked

Details

      A new concept, overdraft network buffers, was introduced to mitigate the effects of a subtask thread blocking uninterruptibly during back pressure. Starting from 1.16.0, a Flink subtask can by default request up to 5 extra (overdraft) buffers on top of the regularly configured amount (see the documentation: https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/memory/network_mem_tuning/#overdraft-buffers). This change can slightly increase the memory consumption of a Flink job. To restore the old behaviour, set `taskmanager.network.memory.max-overdraft-buffers-per-gate` to zero.

    Description

      In earlier Unaligned Checkpoint JIRAs, the community added the recordWriter.isAvailable() check to reduce blocking when a single record is written. However, a large record, a flatMap, or a broadcast watermark may need more than one buffer.
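
To make the "may need more than one buffer" point concrete, here is a rough arithmetic sketch. It assumes a 32 KiB memory segment and ignores any per-record framing overhead; the class and method names are hypothetical and not part of Flink.

```java
// Hypothetical helper, not a Flink API: estimates how many network buffers
// one serialized record occupies, ignoring framing overhead.
final class BuffersNeededSketch {

    /** Ceiling division of the record size by the segment size. */
    static int buffersNeeded(long recordBytes, int segmentBytes) {
        if (recordBytes <= 0 || segmentBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (int) ((recordBytes + segmentBytes - 1) / segmentBytes);
    }

    public static void main(String[] args) {
        int segmentBytes = 32 * 1024; // assumed segment size
        // A small record fits in the single buffer that an availability check guarantees.
        System.out.println(buffersNeeded(100, segmentBytes));
        // A 100 KiB record spans several buffers, so the task can still block mid-record.
        System.out.println(buffersNeeded(100 * 1024, segmentBytes));
    }
}
```

An availability check made before the write only tells the task that some buffer is free, so any record that needs more than one buffer can still block part-way through.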

      Can we add an overdraft buffer to the BufferPool so that unaligned checkpoints are blocked less often?

      Overdraft Buffer mechanism

      Add a configuration option 'taskmanager.network.memory.overdraft-buffers-per-gate', defaulting to 5.

      When requestMemory is called and the bufferPool has no free buffer, the bufferPool lets the Task overdraw up to 5 MemorySegments. The bufferPool then reports itself as unavailable until all overdrawn buffers have been consumed by downstream tasks, and the Task waits for the bufferPool to become available again.
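
The proposed behaviour can be sketched as a small counting model. This is only an illustration under stated assumptions, not Flink's actual BufferPool: the class and method names are hypothetical, buffers are modelled as plain counters rather than MemorySegments, and recycled buffers are assumed to pay back the overdraft first.

```java
// Hypothetical counting model of the proposal, not Flink's BufferPool implementation.
final class OverdraftBufferPoolSketch {
    private final int poolSize;      // regular buffers configured for the pool
    private final int maxOverdraft;  // extra buffers the task may overdraw
    private int regularInUse;
    private int overdraftInUse;

    OverdraftBufferPoolSketch(int poolSize, int maxOverdraft) {
        this.poolSize = poolSize;
        this.maxOverdraft = maxOverdraft;
    }

    /** Returns true if a buffer was granted; false means the task has to wait. */
    boolean requestBuffer() {
        if (regularInUse < poolSize) {
            regularInUse++;
            return true;
        }
        if (overdraftInUse < maxOverdraft) {
            overdraftInUse++; // pool is exhausted, so overdraw instead of blocking
            return true;
        }
        return false;
    }

    /** A downstream task consumed one buffer (assumption: overdraft is repaid first). */
    void recycleBuffer() {
        if (overdraftInUse > 0) {
            overdraftInUse--;
        } else if (regularInUse > 0) {
            regularInUse--;
        }
    }

    /** Unavailable while any overdraft is outstanding or no regular buffer is free. */
    boolean isAvailable() {
        return overdraftInUse == 0 && regularInUse < poolSize;
    }

    public static void main(String[] args) {
        OverdraftBufferPoolSketch pool = new OverdraftBufferPoolSketch(2, 5);
        for (int i = 0; i < 8; i++) {
            System.out.println("request " + i + " granted: " + pool.requestBuffer());
        }
        System.out.println("available: " + pool.isAvailable());
    }
}
```

In this model a record that is already being written can finish by using overdraft buffers, while the task does not start the next record until the pool reports itself available again. Setting the overdraft limit to 0 reduces the model to a plain bounded pool.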

      From the above, we have the following benefits:

      • In scenarios that require multiple buffers, the Task releases the Checkpoint lock, so the Unaligned Checkpoint can complete quickly.
      • Memory usage stays bounded, which prevents a memory leak.
      • It needs only a little extra memory and can improve the stability of the Task under back pressure.
      • Users can increase the number of overdraft buffers to handle scenarios that require more buffers.

       

      Masters, please correct me if I'm wrong, thanks a lot.

      Attachments

        1. image-2022-04-18-11-45-14-700.png
          101 kB
          Rui Fan
        2. image-2022-04-18-11-46-03-895.png
          235 kB
          Rui Fan

            People

              fanrui Rui Fan