  Hadoop Common / HADOOP-15696

KMS performance regression due to too many open file descriptors after Jetty migration

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-alpha2
    • Fix Version/s: 3.2.0, 3.0.4, 3.1.2
    • Component/s: kms
    • Labels:
      None

      Description

      We recently found that KMS performance regressed in Hadoop 3.0, possibly due to the migration from Tomcat to Jetty in HADOOP-13597.

      Symptoms:

      1. The number of open file descriptors held by the Hadoop 3.x KMS quickly rises above 10 thousand under stress and sometimes even exceeds 32K, the system limit, at which point any access to encryption zones fails. Our internal testing shows that the open fd count was in the range of a few hundred in Hadoop 2.x, so it increased by almost 100x in Hadoop 3.
      2. The Hadoop 3.x KMS uses as much as twice the heap of Hadoop 2.x, so a heap size that was sufficient in Hadoop 2.x can go OOM in Hadoop 3.x. JXray analysis suggests most of the extra heap is temporary byte arrays associated with open SSL connections.
      3. Because of the higher heap usage, the Hadoop 3.x KMS has more frequent GC activity, and we observed up to a 20% performance reduction due to GC.
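
      To track the first symptom while reproducing the problem, the open fd count can be read from inside a JVM. The following is a minimal sketch, not part of any patch on this issue: the class name OpenFdProbe is hypothetical, and it samples only the JVM it runs in. Watching a KMS process would presumably mean reading the same OperatingSystem MXBean attributes over JMX, which is not shown here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper (not from the Hadoop code base): samples the open and
// maximum file descriptor counts of the JVM it runs in.
public class OpenFdProbe {

    /**
     * Returns {openFds, maxFds}, or null when the platform MXBean does not
     * expose file descriptor counts (for example on non-Unix JVMs).
     */
    public static long[] sample() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null;
    }

    public static void main(String[] args) {
        long[] counts = sample();
        if (counts == null) {
            System.out.println("open fd count is not available on this platform");
        } else {
            System.out.println("open fds: " + counts[0] + " of max " + counts[1]);
        }
    }
}
```

      Sampling this periodically under load would show whether the count stays in the hundreds or climbs toward the limit.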

      A possible solution is to reduce the idle timeout setting in HttpServer2, which is currently hard-coded to 10 seconds. With it set to 1 second, open fds dropped from 20 thousand to 3 thousand in my experiment.
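
      The effect of an idle timeout on fd lifetime can be illustrated with plain JDK sockets. This is a sketch of the general mechanism only, not Jetty or HttpServer2 code: the class IdleTimeoutDemo and its method are hypothetical, and the read timeout on the accepted socket stands in for the server's idle timeout. In Jetty itself the equivalent knob is, to my knowledge, the connector's setIdleTimeout(long).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo (not Jetty, not HttpServer2): a server-side connection
// that receives no data is closed once the idle timeout expires, which
// releases its file descriptor.
public class IdleTimeoutDemo {

    /**
     * Opens a loopback connection whose client never sends anything and
     * returns how many milliseconds the server side held the connection
     * before closing it as idle.
     */
    public static long millisUntilIdleClose(int idleTimeoutMs) {
        try (ServerSocket server =
                 new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client =
                 new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            // The read timeout plays the role of the idle timeout here.
            accepted.setSoTimeout(idleTimeoutMs);
            long start = System.nanoTime();
            try {
                accepted.getInputStream().read(); // blocks: client stays silent
            } catch (SocketTimeoutException idle) {
                // No bytes arrived within the timeout: treat as idle.
            }
            accepted.close(); // the fd is held until this point
            return (System.nanoTime() - start) / 1_000_000;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("held for ~" + millisUntilIdleClose(1000) + " ms");
    }
}
```

      Each idle connection keeps its fd for roughly the timeout, so with many short-lived client connections a shorter timeout means fewer fds are open at any moment.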

      Filing this JIRA to invite open discussion of a solution.

      Credit: Misha Dmitriev for the proposed Jetty idle timeout remedy; Xiao Chen for digging into this problem.

      Screenshots:

      CDH5 (Hadoop 2) KMS CPU utilization, resident memory and file descriptor chart.

      CDH6 (Hadoop 3) KMS CPU utilization, resident memory and file descriptor chart.

      CDH5 (Hadoop 2) GC activities on the KMS process

      CDH6 (Hadoop 3) GC activities on the KMS process

      JXray report

      Open fds drop from 20k to 3k after the proposed change.

        Attachments

        1. HADOOP-15696.001.patch
          5 kB
          Wei-Chiu Chuang
        2. HADOOP-15696.002.patch
          6 kB
          Wei-Chiu Chuang
        3. HADOOP-15696.003.patch
          6 kB
          Wei-Chiu Chuang
        4. HADOOP-15696.branch-3.1.001.patch
          6 kB
          Wei-Chiu Chuang
        5. Screen Shot 2018-08-22 at 11.36.16 AM.png
          270 kB
          Wei-Chiu Chuang
        6. Screen Shot 2018-08-22 at 4.26.51 PM.png
          283 kB
          Wei-Chiu Chuang
        7. Screen Shot 2018-08-22 at 4.26.51 PM.png
          283 kB
          Wei-Chiu Chuang
        8. Screen Shot 2018-08-22 at 4.27.02 PM.png
          352 kB
          Wei-Chiu Chuang
        9. Screen Shot 2018-08-22 at 4.30.32 PM.png
          162 kB
          Wei-Chiu Chuang
        10. Screen Shot 2018-08-22 at 4.30.39 PM.png
          166 kB
          Wei-Chiu Chuang
        11. Screen Shot 2018-08-24 at 7.08.16 PM.png
          159 kB
          Wei-Chiu Chuang

              People

              • Assignee:
                jojochuang Wei-Chiu Chuang
                Reporter:
                jojochuang Wei-Chiu Chuang
              • Votes:
                0
                Watchers:
                10