  Kafka / KAFKA-16052

OOM in Kafka test suite


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.7.0
    • Fix Version/s: 3.8.0
    • Component/s: None
    • Labels: None

    Description

      Problem
      Our test suite fails frequently with out-of-memory (OOM) errors. The discussion on the mailing list is here: https://lists.apache.org/thread/d5js0xpsrsvhgjb10mbzo9cwsy8087x4

      Setup
      To find the source of the leaks, I ran the :core:test build target with a single thread (see below for how to do this) and attached a profiler to it. This Jira tracks the action items identified from the analysis.
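      A profiler attached this way reports, among other things, used heap and live thread count over time. As a lighter-weight complement, the following Java sketch reads the same two metrics through the standard java.lang.management API and prints them periodically from a daemon thread. It is not part of the Kafka build: HeapSampler and its methods are hypothetical names, and the sketch assumes it runs inside the JVM being observed (for example, started from a test fixture).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: periodically logs used heap and live thread count.
public class HeapSampler {
    // Bytes of heap currently in use (includes garbage not yet collected).
    static long usedHeapBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    // Number of live threads (daemon and non-daemon) in this JVM.
    static int liveThreadCount() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    // Logs one line per period on a daemon thread, so it never keeps the JVM alive.
    static ScheduledExecutorService start(long periodMillis) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "heap-sampler");
            t.setDaemon(true);
            return t;
        });
        exec.scheduleAtFixedRate(
            () -> System.out.printf("heapUsedMB=%d liveThreads=%d%n",
                    usedHeapBytes() / (1024 * 1024), liveThreadCount()),
            0, periodMillis, TimeUnit.MILLISECONDS);
        return exec;
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService sampler = start(500);
        // Stand-in workload: retain about 40 MB so the heap figure visibly grows.
        List<byte[]> retained = new ArrayList<>();
        for (int i = 0; i < 40; i++) {
            retained.add(new byte[1024 * 1024]);
            Thread.sleep(50);
        }
        sampler.shutdownNow();
        System.out.println("retained chunks: " + retained.size());
    }
}
```

      Sampling the used heap over time only shows that memory grows; a heap dump is still needed to see which objects are being retained.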

      How to run tests using a single thread:

      diff --git a/build.gradle b/build.gradle
      index f7abbf4f0b..81df03f1ee 100644
      --- a/build.gradle
      +++ b/build.gradle
      @@ -74,9 +74,8 @@ ext {
             "--add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED"
            )
       
      -  maxTestForks = project.hasProperty('maxParallelForks') ? maxParallelForks.toInteger() : Runtime.runtime.availableProcessors()
      -  maxScalacThreads = project.hasProperty('maxScalacThreads') ? maxScalacThreads.toInteger() :
      -      Math.min(Runtime.runtime.availableProcessors(), 8)
      +  maxTestForks = 1
      +  maxScalacThreads = 1
         userIgnoreFailures = project.hasProperty('ignoreFailures') ? ignoreFailures : false
       
         userMaxTestRetries = project.hasProperty('maxTestRetries') ? maxTestRetries.toInteger() : 0
      diff --git a/gradle.properties b/gradle.properties
      index 4880248cac..ee4b6e3bc1 100644
      --- a/gradle.properties
      +++ b/gradle.properties
      @@ -30,4 +30,4 @@ scalaVersion=2.13.12
       swaggerVersion=2.2.8
       task=build
       org.gradle.jvmargs=-Xmx2g -Xss4m -XX:+UseParallelGC
      -org.gradle.parallel=true
      +org.gradle.parallel=false 
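      The removed build.gradle lines resolve the test fork count and the scalac thread count from optional Gradle project properties, falling back to the CPU count (capped at 8 for scalac). The Java sketch below restates that resolution logic as plain functions, for illustration only. ForkSettings and its methods are hypothetical names, and a Map stands in for Gradle's project properties. With a single fork, the test classes of a module should run sequentially in one worker JVM (assuming no forkEvery setting restarts it), so a single attached profiler can observe the whole run.

```java
import java.util.Map;

// Hypothetical restatement of the build.gradle expressions removed by the diff.
public class ForkSettings {
    // Mirrors:
    //   maxTestForks = project.hasProperty('maxParallelForks')
    //       ? maxParallelForks.toInteger() : Runtime.runtime.availableProcessors()
    static int maxTestForks(Map<String, String> projectProps, int cpus) {
        String v = projectProps.get("maxParallelForks");
        return v != null ? Integer.parseInt(v) : cpus;
    }

    // Mirrors:
    //   maxScalacThreads = project.hasProperty('maxScalacThreads')
    //       ? maxScalacThreads.toInteger() : Math.min(Runtime.runtime.availableProcessors(), 8)
    static int maxScalacThreads(Map<String, String> projectProps, int cpus) {
        String v = projectProps.get("maxScalacThreads");
        return v != null ? Integer.parseInt(v) : Math.min(cpus, 8);
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        // No -P overrides: the fork count follows the CPU count.
        System.out.println("default forks=" + maxTestForks(Map.of(), cpus)
                + " scalacThreads=" + maxScalacThreads(Map.of(), cpus));
        // Equivalent of passing -PmaxParallelForks=1 -PmaxScalacThreads=1.
        Map<String, String> single = Map.of("maxParallelForks", "1", "maxScalacThreads", "1");
        System.out.println("single forks=" + maxTestForks(single, cpus)
                + " scalacThreads=" + maxScalacThreads(single, cpus));
    }
}
```

      Because the original expressions check project.hasProperty(...) first, passing -PmaxParallelForks=1 -PmaxScalacThreads=1 on the Gradle command line, together with Gradle's --no-parallel flag in place of the org.gradle.parallel=false edit, should give a similar single-threaded setup without patching either file. This shortcut is inferred from the diff and has not been verified against the Kafka build.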

      Result of the experiment
      Heap memory usage grows from tens of MB at the start to 1.5 GB (with spikes of 2 GB) as the tests execute. The total number of threads also increases, but that increase does not correlate with the sharp increases in heap usage. The heap dump is available at https://www.dropbox.com/scl/fi/nwtgc6ir6830xlfy9z9cu/GradleWorkerMain_10311_27_12_2023_13_37_08.hprof.zip?rlkey=ozbdgh5vih4rcynnxbatzk7ln&dl=0
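      A heap dump like the one linked above can be captured with a profiler, but it can also be written from inside the JVM. The following Java sketch assumes a HotSpot-based JDK (such as OpenJDK) where com.sun.management.HotSpotDiagnosticMXBean is available. HeapDumper and its methods are hypothetical names. The resulting .hprof file can be opened in tools such as Eclipse MAT or VisualVM.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: writes an hprof heap dump of the current JVM.
public class HeapDumper {
    // The file name must end in ".hprof" and must not already exist.
    // live=true dumps only objects reachable from GC roots, which is
    // usually what matters when looking for leaked (retained) objects.
    static void dumpHeap(Path file, boolean live) {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            bean.dumpHeap(file.toAbsolutePath().toString(), live);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Creates a fresh temp directory and dumps the heap into <name>.hprof inside it.
    static Path dumpToTempDir(String name, boolean live) {
        try {
            Path dir = Files.createTempDirectory("heapdump");
            Path out = dir.resolve(name + ".hprof");
            dumpHeap(out, live);
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path out = dumpToTempDir("test-worker", true);
        System.out.println("Heap dump written to " + out
                + " (" + out.toFile().length() + " bytes)");
    }
}
```

      To catch the OOM itself rather than a point-in-time snapshot, HotSpot's -XX:+HeapDumpOnOutOfMemoryError option (optionally with -XX:HeapDumpPath) writes a dump automatically when an OutOfMemoryError is thrown, which may be simpler than calling the MXBean from test code.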

      Attachments

        1. newRM.patch
          3 kB
          Luke Chen
        2. Screenshot 2023-12-27 at 14.04.52.png
          360 kB
          Divij Vaidya
        3. Screenshot 2023-12-27 at 14.22.21.png
          551 kB
          Divij Vaidya
        4. Screenshot 2023-12-27 at 14.45.20.png
          169 kB
          Divij Vaidya
        5. Screenshot 2023-12-27 at 15.31.09.png
          397 kB
          Divij Vaidya
        6. Screenshot 2023-12-27 at 17.44.09.png
          320 kB
          Divij Vaidya
        7. Screenshot 2023-12-28 at 00.13.06.png
          59 kB
          Divij Vaidya
        8. Screenshot 2023-12-28 at 00.18.56.png
          145 kB
          Divij Vaidya
        9. Screenshot 2023-12-28 at 11.26.03.png
          57 kB
          Divij Vaidya
        10. Screenshot 2023-12-28 at 11.26.09.png
          55 kB
          Divij Vaidya
        11. Screenshot 2023-12-28 at 18.44.19.png
          135 kB
          Divij Vaidya
        12. Screenshot 2024-01-10 at 14.59.47.png
          459 kB
          Divij Vaidya


            People

              Assignee: Divij Vaidya
              Reporter: Divij Vaidya
              Votes: 0
              Watchers: 5
