Kudu / KUDU-3056

kudu-spark HdrHistogramAccumulator is too big and makes Spark jobs fail


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.9.0
    • Fix Version/s: 1.12.0
    • Component/s: spark
    • Labels: None

    Description

      In a production environment, we use kudu-spark to read a Kudu table. Even though we don't
      use the HdrHistogramAccumulator, the HdrHistogramAccumulator instances stored in an array
      are still very large, almost 2 MB in total. As a result, when the number of kudu-spark
      tasks (for reading Kudu data and shuffling) exceeds 900, the Spark job fails with the
      following error:

      Job aborted due to stage failure: Total size of serialized results of 1413 tasks (3.0 GB) is bigger than spark.driver.maxResultSize (3.0 GB)
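      The failure is a simple accumulation effect: each task ships its serialized accumulator
      state back to the driver, so the total grows linearly with the task count. A minimal
      back-of-envelope sketch, using the task count from the error message and an assumed
      per-task payload of roughly 2 MB (the report says the accumulators total almost 2 MB):

      ```python
      # Illustrative arithmetic only; the per-task payload size is an assumption
      # based on the ~2 MB accumulator footprint described in the report.
      TASKS = 1413                 # task count from the error message
      PAYLOAD_MB = 2.2             # assumed serialized result size per task, in MB
      MAX_RESULT_SIZE_GB = 3.0     # spark.driver.maxResultSize from the error

      total_gb = TASKS * PAYLOAD_MB / 1024
      print(f"total serialized results ~ {total_gb:.2f} GB "
            f"(limit: {MAX_RESULT_SIZE_GB} GB)")
      ```

      With these numbers the total lands just above the 3.0 GB limit, which matches the
      reported stage failure once the job crossed roughly 900 tasks of shuffle-heavy reads.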

      Attachments

        1. heap1.png
          66 kB
          caiconghui
        2. heap2.png
          81 kB
          caiconghui
        3. heap3.png
          198 kB
          caiconghui


            People

              Assignee: granthenke Grant Henke
              Reporter: cch13 caiconghui
              Votes: 0
              Watchers: 5

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Original Estimate: 12h
                  Remaining Estimate: 12h
                  Time Spent: Not Specified