Spark / SPARK-1823

ExternalAppendOnlyMap can still OOM if one key is very large

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 1.0.2, 1.1.0
    • Fix Version/s: None
    • Component/s: Spark Core

    Description

      If the values for one key do not collectively fit into memory, then the map will still OOM when you merge the spilled contents back in.
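The failure mode can be illustrated with a minimal sketch. It is a simplified stand-in and not Spark's actual `ExternalAppendOnlyMap` code: the helper names `spill_runs` and `merge_runs` and the `max_in_memory` parameter are hypothetical, and in-memory lists play the role of spill files. Spilling keeps each run small, but merging still collects every value of a key into one in-memory group.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def spill_runs(records, max_in_memory):
    """Split (key, value) records into sorted runs of bounded size.

    Each run stands in for one spill file (hypothetical simplification).
    """
    runs, buffer = [], []
    for record in records:
        buffer.append(record)
        if len(buffer) >= max_in_memory:
            runs.append(sorted(buffer, key=itemgetter(0)))
            buffer = []
    if buffer:
        runs.append(sorted(buffer, key=itemgetter(0)))
    return runs


def merge_runs(runs):
    """Merge the sorted runs and yield (key, all_values_for_key).

    The list built for each key is fully materialized in memory,
    which is where a single very large key still causes trouble.
    """
    merged = heapq.merge(*runs, key=itemgetter(0))
    for key, group in groupby(merged, key=itemgetter(0)):
        yield key, [value for _, value in group]


# One "hot" key holds almost all of the values.
records = [("hot", i) for i in range(1000)] + [("cold", 0), ("warm", 1)]
runs = spill_runs(records, max_in_memory=100)

largest_run = max(len(run) for run in runs)
largest_group = max(len(values) for _, values in merge_runs(runs))
print(largest_run, largest_group)
```

In this sketch no run exceeds the 100-record limit, yet the merged group for the hot key holds all 1000 of its values at once, so the memory bound enforced while spilling does not carry over to the merge.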

      This is especially a problem for PySpark: we hash the keys (Python objects) before a shuffle, and because there are only finitely many integer hash values, many distinct keys could potentially collide.
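The collision argument is the pigeonhole principle, which a short sketch can make concrete. The function `bucket_hash` and its 16-bit width are assumptions chosen only to keep the example small; this is not the hash function PySpark uses, and real hash spaces are far larger, so collisions are correspondingly rarer but still unavoidable once there are more keys than hash values.

```python
import zlib


def bucket_hash(key, bits=16):
    """Hypothetical stand-in hash: CRC32 truncated to `bits` bits."""
    return zlib.crc32(key.encode("utf-8")) & ((1 << bits) - 1)


# More distinct keys than possible hash values (100,000 > 2**16 = 65,536).
keys = [f"key-{i}" for i in range(100_000)]
distinct_hashes = {bucket_hash(key) for key in keys}

# Keys beyond the first one to land on each hash value.
colliding_keys = len(keys) - len(distinct_hashes)
print(len(distinct_hashes), colliding_keys)
```

With 100,000 distinct keys and at most 65,536 hash values, at least 34,464 keys must share a hash with some other key, whatever the hash function is.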

            People

              Assignee: Unassigned
              Reporter: Andrew Or (andrewor14)
              Votes: 9
              Watchers: 14