Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-32524 Improve the watermark aggregation performance when enabling the watermark alignment
  3. FLINK-32420

Watermark aggregation performance is poor when watermark alignment is enabled and parallelism is high

    XMLWordPrintableJSON

Details

    • Hide
      This performance improvement would be good to mention in the release blog post.

      As proven by the micro benchmarks (screenshots attached in the ticket), with 5000 subtasks, the time to calculate the watermark alignment on the JobManager by a factor of 76x (7664%). Previously such large jobs where actually at large risk of overloading JobManager, now that's far less likely to happen.
      Show
      This performance improvement would be good to mention in the release blog post. As proven by the micro benchmarks (screenshots attached in the ticket), with 5000 subtasks, the time to calculate the watermark alignment on the JobManager by a factor of 76x (7664%). Previously such large jobs where actually at large risk of overloading JobManager, now that's far less likely to happen.

    Description

      The SourceCoordinator.WatermarkAggregator#aggregate method will find the smallest watermark of all keys as the  aggregatedWatermark.

      However, the time complexity of the aggregate method in a WatermarkAlignment updateInterval cycle is O(n*n),because:

      • Every subtask report a latest watermark to SourceCoordinator in a WatermarkAlignment updateInterval cycle
      • SourceCoordinator updates the smallest watermark from all subtasks for each reporting

      In general, the key is subtaskIndex, so the number of key is parallelism. When the parallelism is high, the watermark aggregation performance  will be poor.

      Performance Test:

      The parallelism is 10000, each subtask reports 20 watermarks, and the aggregate method takes 18.921s. Almost every round takes 950 ms.

      • If the watermarkAlignment updateInterval is 1s, SourceCoordinator will be very busy.
      • If it's less than 1s, the Watermark aggregation will be delayed

      I have finished the POC for performance improvement, and reduced Watermark aggregation time per watermarkAlignment updateInterval cycle from 950 ms to 6 ms.

      Attachments

        1. Screenshot 2023-07-13 at 17.19.11.png
          356 kB
          Piotr Nowojski
        2. Screenshot 2023-07-13 at 17.19.24.png
          85 kB
          Piotr Nowojski

        Issue Links

          Activity

            People

              fanrui Rui Fan
              fanrui Rui Fan
              Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: