FLINK-24459

Performance improvement of file sink on Nexmark

Details

    Description

      Context

      PartitionPathUtils.escapePathName is a fairly simple method: it takes a String, allocates a StringBuilder, appends each char either as-is or escaped, and returns the resulting String.

      The filesystem sink calls this method several times per element to determine the bucket id. This makes it a hot spot for workloads that write intensively to the filesystem, such as Nexmark q10. On my local machine, escaping of chars accounts for 9.53% of CPU samples and 17.8% of memory allocations of the whole TaskManager process.
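
To make the cost concrete, here is a minimal sketch of the pattern described above. It is not the Flink source: the class name NaiveEscaper and the simplified escape set are assumptions made for illustration only.

```java
import java.util.BitSet;

/** Hypothetical stand-in for the pre-patch pattern; not the actual Flink code. */
final class NaiveEscaper {
    // Assumed, simplified escape set: control chars, DEL, and a few path-sensitive chars.
    private static final BitSet ESCAPE = new BitSet(128);

    static {
        for (char c = 0; c < ' '; c++) {
            ESCAPE.set(c);
        }
        for (char c : "\"#%'*/:=?\\{[]^".toCharArray()) {
            ESCAPE.set(c);
        }
        ESCAPE.set(0x7F);
    }

    static String escape(String path) {
        // Default capacity: the builder may have to grow for longer inputs.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            if (c < 128 && ESCAPE.get(c)) {
                sb.append('%');
                // String.format re-parses the format string on every call.
                sb.append(String.format("%1$02X", (int) c));
            } else {
                sb.append(c);
            }
        }
        // A new String is always created, even if nothing was escaped.
        return sb.toString();
    }
}
```

Every call allocates a builder and a result String, which is where the per-element overhead comes from when most partition values need no escaping at all.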

      Proposal

      PartitionPathUtils.escapePathName improvements

      1. Use the more efficient Integer.toHexString instead of String.format
      2. Do not allocate a new String when the original string contains no escapable chars
      3. Size the StringBuilder based on the original string length instead of using the default capacity
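
The three points above could be combined roughly as follows. This is a sketch under assumptions, not the attached patch: the class name FastEscaper, the simplified escape set, and the exact capacity heuristic are all hypothetical.

```java
import java.util.BitSet;
import java.util.Locale;

/** Hypothetical sketch of the proposed improvements; not the actual patch. */
final class FastEscaper {
    // Assumed, simplified escape set: control chars, DEL, and a few path-sensitive chars.
    private static final BitSet ESCAPE = new BitSet(128);

    static {
        for (char c = 0; c < ' '; c++) {
            ESCAPE.set(c);
        }
        for (char c : "\"#%'*/:=?\\{[]^".toCharArray()) {
            ESCAPE.set(c);
        }
        ESCAPE.set(0x7F);
    }

    static String escape(String path) {
        StringBuilder sb = null; // (2) allocated only once an escapable char is found
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            if (c < 128 && ESCAPE.get(c)) {
                if (sb == null) {
                    // (3) capacity derived from the input length; +2 covers one escape,
                    // and the builder still grows if more chars need escaping.
                    sb = new StringBuilder(path.length() + 2);
                    sb.append(path, 0, i); // copy the clean prefix seen so far
                }
                sb.append('%');
                if (c < 16) {
                    sb.append('0'); // toHexString does not zero-pad
                }
                // (1) Integer.toHexString instead of String.format; upper-cased to
                // keep the same output as the "%02X" format.
                sb.append(Integer.toHexString(c).toUpperCase(Locale.ROOT));
            } else if (sb != null) {
                sb.append(c);
            }
        }
        // (2) return the original instance when nothing had to be escaped
        return sb == null ? path : sb.toString();
    }
}
```

In this sketch an input with no escapable chars goes through a single scan and comes back as the same String instance, so the common case allocates nothing.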

      Benefit

      Experiment on local machine.
      1 TaskManager with 6 slots. Job parallelism 6. Nexmark default configuration + object reuse option.
      Before: flink-1.14.0
      After: flink-1.14.0 + patch with the improvements

      Nexmark q10                                        Before   After
      CPU samples of escapePathName() (% of all)           9.53    1.64
      Memory allocations by escapePathName() (% of all)    17.8    2.98
      Throughput/Cores (K/s)                             107.64  119.42

      Diff: CPU -7.89 percentage points, Memory -14.82 percentage points, Throughput +10.9%
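
For reference, the diff figures can be reproduced from the table values. The CPU and memory numbers are differences between two shares of the whole process, while the throughput number is a relative change. The helper class DiffCheck below is hypothetical and only illustrates the arithmetic.

```java
/** Hypothetical helper that reproduces the diff arithmetic from the table values. */
final class DiffCheck {
    /** Difference between two shares, in percentage points. */
    static double pointDiff(double before, double after) {
        return after - before;
    }

    /** Relative change of a quantity, in percent of the "before" value. */
    static double relativeChangePct(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        System.out.printf("CPU: %.2f pp%n", pointDiff(9.53, 1.64));
        System.out.printf("Memory: %.2f pp%n", pointDiff(17.8, 2.98));
        System.out.printf("Throughput: %+.1f%%%n", relativeChangePct(107.64, 119.42));
    }
}
```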

      Profiling reports are attached.

      Attachments

        1. after_cpu.png (21 kB, Alexander Trushev)
        2. after_mem.png (25 kB, Alexander Trushev)
        3. before_cpu.png (25 kB, Alexander Trushev)
        4. before_mem.png (37 kB, Alexander Trushev)
        5. before.jfr.zip (12.50 MB, Alexander Trushev)
        6. after.jfr.zip (13.69 MB, Alexander Trushev)

            People

              trushev Alexander Trushev