Kudu / KUDU-3056

kudu-spark HdrHistogramAccumulator is too big and makes Spark jobs fail


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.9.0
    • Fix Version/s: 1.12.0
    • Component/s: spark
    • Labels:
      None
    • Target Version/s:

      Description

      In a production environment, we use kudu-spark to read Kudu tables. Even though we never
      use the HdrHistogramAccumulator, the HdrHistogramAccumulator instances stored in an array
      are still very large, about 2 MB in total per task. As a result, when the number of
      kudu-spark tasks (for reading Kudu data and shuffling) exceeds 900, the Spark job fails
      with the following error:

      Job aborted due to stage failure: Total size of serialized results of 1413 tasks (3.0 GB) is bigger than spark.driver.maxResultSize (3.0 GB)
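A back-of-the-envelope sketch shows why the reported ~2 MB of accumulator state per task is enough to exhaust the driver's result budget on its own. The sizes below are assumptions taken from this report, not fresh measurements:

```python
# Hypothetical arithmetic: per-task HdrHistogramAccumulator state vs.
# spark.driver.maxResultSize. Figures come from the report above.
ACCUMULATOR_BYTES_PER_TASK = 2 * 1024 * 1024  # ~2 MB of histogram state per task (reported)
NUM_TASKS = 1413                              # tasks in the failing stage (from the error)
MAX_RESULT_SIZE = 3 * 1024 ** 3               # spark.driver.maxResultSize = 3 GB

# Accumulator values are serialized back to the driver with each task result,
# so the histogram overhead scales linearly with the task count.
accumulator_total = ACCUMULATOR_BYTES_PER_TASK * NUM_TASKS
print(f"accumulator overhead alone: {accumulator_total / 1024 ** 3:.2f} GB")
# ~2.76 GB of the 3 GB budget is histogram state, leaving almost nothing
# for the actual task results, so the 3.0 GB limit is breached.
```

This also explains the ~900-task threshold: well before the accumulators alone hit the cap, the real task payloads push the total over it.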

        Attachments

        1. heap1.png
          66 kB
          caiconghui
        2. heap2.png
          81 kB
          caiconghui
        3. heap3.png
          198 kB
          caiconghui


              People

              • Assignee: granthenke Grant Henke
              • Reporter: cch13 caiconghui
              • Votes: 0
              • Watchers: 4


                  Time Tracking

                  • Original Estimate: 12h
                  • Remaining Estimate: 12h
                  • Time Spent: Not Specified