Spark / SPARK-2538

External aggregation in Python

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.0.0, 1.0.1
    • Fix Version/s: 1.1.0
    • Component/s: PySpark

    Description

      For huge reduce tasks, users will get an out-of-memory exception when the data cannot all fit in memory.

      PySpark should spill some of the data to disk and then merge the spilled parts together, just as we do in Scala.
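
      To make the request concrete, here is a minimal sketch of spill-and-merge aggregation in plain Python. It is not the PySpark implementation; the function name external_aggregate, the max_keys threshold (a stand-in for a real memory limit), and the partitions count are all hypothetical and chosen only for illustration.

```python
import os
import pickle
import shutil
import tempfile
from collections import defaultdict


def external_aggregate(pairs, combine, max_keys=1000, partitions=8):
    """Combine values by key, spilling to disk when too many keys are held.

    Hypothetical sketch: max_keys stands in for a real memory limit.
    """
    spill_dir = tempfile.mkdtemp()
    data = {}
    spills = 0

    def spill():
        nonlocal data, spills
        # Split the in-memory items by hash of the key, one file per partition,
        # so that a given key always lands in the same partition number.
        buckets = defaultdict(list)
        for k, v in data.items():
            buckets[hash(k) % partitions].append((k, v))
        for p, items in buckets.items():
            with open(os.path.join(spill_dir, "%d-%d" % (spills, p)), "wb") as f:
                pickle.dump(items, f)
        data = {}
        spills += 1

    try:
        for k, v in pairs:
            data[k] = combine(data[k], v) if k in data else v
            if len(data) > max_keys:
                spill()

        if spills == 0:
            # Everything fit in memory; no merge step is needed.
            yield from data.items()
            return

        spill()  # flush the remainder so all partial results are on disk
        # Merge one partition at a time; only that partition's keys are in memory.
        for p in range(partitions):
            merged = {}
            for s in range(spills):
                path = os.path.join(spill_dir, "%d-%d" % (s, p))
                if not os.path.exists(path):
                    continue
                with open(path, "rb") as f:
                    for k, v in pickle.load(f):
                        merged[k] = combine(merged[k], v) if k in merged else v
            yield from merged.items()
    finally:
        shutil.rmtree(spill_dir, ignore_errors=True)
```

      For example, dict(external_aggregate(pairs, lambda a, b: a + b, max_keys=10)) sums values per key while holding only a bounded number of keys in memory during the combine phase. Partitioning spills by key hash is one possible design; merging sorted runs would be another.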

            People

              Assignee: Davies Liu (davies)
              Reporter: Davies Liu (davies)
              Votes: 0
              Watchers: 2

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: 72h
                  Remaining: 72h
                  Logged: Not Specified