FLINK-24459

Performance improvement of file sink on Nexmark


Details

    Description

      Context

      PartitionPathUtils.escapePathName is a simple method: it takes a String, allocates a StringBuilder, appends each character either unchanged or in escaped form, and returns the resulting String.

      The filesystem sink calls this method several times per element to determine the bucket id. This makes it a hot spot for workloads that write intensively to the filesystem, such as Nexmark q10. On my local machine, character escaping accounts for 9.53% of CPU samples and 17.8% of memory allocations of the whole TaskManager process.
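      For illustration, the allocation pattern described above can be sketched as follows. This is a simplified reconstruction and not the Flink source: the class name BaselineEscapeSketch and the contents of CHAR_TO_ESCAPE are assumptions made for the example, and the real set of escaped characters may differ.

```java
import java.util.BitSet;

// Hypothetical sketch of the "before" behavior; not the actual Flink code.
final class BaselineEscapeSketch {

    // Assumed, simplified set of characters that need escaping.
    private static final BitSet CHAR_TO_ESCAPE = new BitSet(128);

    static {
        for (char c = 0; c < ' '; c++) {
            CHAR_TO_ESCAPE.set(c); // control characters
        }
        char[] special = {'"', '#', '%', '\'', '*', '/', ':', '=', '?', '\\', '{', '[', ']', '^', '\u007F'};
        for (char c : special) {
            CHAR_TO_ESCAPE.set(c);
        }
    }

    static boolean needsEscaping(char c) {
        return c < 128 && CHAR_TO_ESCAPE.get(c);
    }

    static String escapePathName(String path) {
        // A builder with default capacity is allocated on every call,
        // even when the input contains nothing to escape.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            if (needsEscaping(c)) {
                sb.append('%');
                // String.format parses the format string on each call.
                sb.append(String.format("%02X", (int) c));
            } else {
                sb.append(c);
            }
        }
        // toString() copies the content into a new String.
        return sb.toString();
    }
}
```

      In this sketch every call pays for a StringBuilder and a result String, and each escaped character pays for a String.format call, which is consistent with the method showing up in both CPU and allocation profiles.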

      Proposal

      PartitionPathUtils.escapePathName improvements

      1. Use the more efficient Integer.toHexString instead of String.format
      2. Do not allocate a new String when the original string contains no characters that need escaping
      3. Size the StringBuilder from the original string length instead of using the default capacity
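
      A minimal sketch of how the three points could fit together is shown below. It is not the attached patch: the class name ImprovedEscapeSketch, the helper escapeChar, the extra capacity of 2, and the contents of CHAR_TO_ESCAPE are assumptions made for the example.

```java
import java.util.BitSet;
import java.util.Locale;

// Hypothetical sketch of the proposed improvements; not the actual patch.
final class ImprovedEscapeSketch {

    // Assumed, simplified set of characters that need escaping.
    private static final BitSet CHAR_TO_ESCAPE = new BitSet(128);

    static {
        for (char c = 0; c < ' '; c++) {
            CHAR_TO_ESCAPE.set(c); // control characters
        }
        char[] special = {'"', '#', '%', '\'', '*', '/', ':', '=', '?', '\\', '{', '[', ']', '^', '\u007F'};
        for (char c : special) {
            CHAR_TO_ESCAPE.set(c);
        }
    }

    static boolean needsEscaping(char c) {
        return c < 128 && CHAR_TO_ESCAPE.get(c);
    }

    static String escapePathName(String path) {
        StringBuilder sb = null; // allocated lazily (point 2)
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            if (needsEscaping(c)) {
                if (sb == null) {
                    // Size from the input length plus room for one escape (point 3),
                    // then copy the unescaped prefix seen so far.
                    sb = new StringBuilder(path.length() + 2);
                    sb.append(path, 0, i);
                }
                escapeChar(c, sb);
            } else if (sb != null) {
                sb.append(c);
            }
        }
        // No escapable char found: return the original instance, no allocation.
        return sb == null ? path : sb.toString();
    }

    private static void escapeChar(char c, StringBuilder sb) {
        sb.append('%');
        if (c < 16) {
            sb.append('0'); // keep two hex digits, as "%02X" would
        }
        // Integer.toHexString avoids format-string parsing (point 1).
        sb.append(Integer.toHexString(c).toUpperCase(Locale.ROOT));
    }
}
```

      With this shape, a bucket id without special characters returns the input String itself, so the common case costs only a scan over the characters.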

      Benefit

      Experiment on local machine.
      1 TaskManager with 6 slots. Job parallelism 6. Nexmark default configuration + object reuse option.
      Before: flink-1.14.0
      After: flink-1.14.0 + patch with the improvements

      Nexmark q10                                          Before   After
      CPU samples of escapePathName() (% of all)             9.53    1.64
      Memory allocations by escapePathName() (% of all)      17.8    2.98
      Throughput/Cores (K/s)                               107.64  119.42

      Diff: CPU -7.89 percentage points, memory allocations -14.82 percentage points, throughput +10.9%
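
      The Diff figures combine two kinds of change: the CPU and memory values are differences between two shares of the whole process, while the throughput value is a relative change. A small sketch of the arithmetic, using only the numbers from the table (the class and method names are hypothetical):

```java
// Hypothetical helper that reproduces the Diff line from the table values.
final class DiffCheck {

    // Difference between two percentage shares, in percentage points.
    static double pointDelta(double before, double after) {
        return after - before;
    }

    // Relative change of a quantity, in percent of the "before" value.
    static double relativeChangePercent(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        System.out.printf("CPU: %.2f points%n", pointDelta(9.53, 1.64));
        System.out.printf("Memory: %.2f points%n", pointDelta(17.8, 2.98));
        System.out.printf("Throughput: %+.1f%%%n", relativeChangePercent(107.64, 119.42));
    }
}
```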

      Profiling reports are attached.

      Attachments

        1. after_cpu.png
          21 kB
          Alexander Trushev
        2. after_mem.png
          25 kB
          Alexander Trushev
        3. before_cpu.png
          25 kB
          Alexander Trushev
        4. before_mem.png
          37 kB
          Alexander Trushev
        5. before.jfr.zip
          12.50 MB
          Alexander Trushev
        6. after.jfr.zip
          13.69 MB
          Alexander Trushev

          People

            trushev Alexander Trushev