Spark / SPARK-11271

MapStatus too large for driver


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Shuffle, Spark Core
    • Labels: None

    Description

      When I run a Spark job containing a very large number of tasks (in my case, 200k map tasks × 200k reduce tasks), the driver runs out of memory, mainly because of MapStatus objects: the RoaringBitmap used to mark which blocks are empty appears to use too much memory.
      I tried org.apache.spark.util.collection.BitSet instead of RoaringBitmap, and it saves about 20% of that memory.
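      The idea can be sketched with `java.util.BitSet` standing in for `org.apache.spark.util.collection.BitSet` (the class below is a hypothetical illustration, not Spark's actual MapStatus code): a dense bitset records, per reduce partition, whether the map output block for that partition is empty, and its footprint depends only on the partition count, not on how the empty blocks are distributed.

      ```java
      import java.util.BitSet;

      // Hypothetical sketch of tracking empty shuffle blocks with a dense bitset.
      public class EmptyBlockTracker {
          private final BitSet emptyBlocks;
          private final int numPartitions;

          public EmptyBlockTracker(int numPartitions) {
              this.numPartitions = numPartitions;
              this.emptyBlocks = new BitSet(numPartitions);
          }

          // Mark the block produced for the given reduce partition as empty.
          public void markEmpty(int reduceId) {
              emptyBlocks.set(reduceId);
          }

          public boolean isEmpty(int reduceId) {
              return emptyBlocks.get(reduceId);
          }

          // A dense bitset always needs ceil(numPartitions / 64) longs,
          // i.e. its size is fixed by the partition count alone.
          public long sizeInBits() {
              return ((numPartitions + 63L) / 64L) * 64L;
          }

          public static void main(String[] args) {
              EmptyBlockTracker t = new EmptyBlockTracker(200_000);
              t.markEmpty(12345);
              System.out.println(t.isEmpty(12345)); // true
              System.out.println(t.isEmpty(0));     // false
              System.out.println(t.sizeInBits());   // 200000
          }
      }
      ```

      A RoaringBitmap, by contrast, chooses per-64K-range containers (bitmap or sorted-array) based on density, which can win for sparse data but adds container overhead here.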

      For the 200K-task job:
      RoaringBitmap uses 3 Long[1024] and 1 Short[3392] = 3*64*1024 + 16*3392 = 250880 bits
      BitSet uses 1 Long[3125] = 3125*64 = 200000 bits

      Memory saved = (250880 - 200000) / 250880 ≈ 20%
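      The arithmetic above can be checked directly (the array sizes are the ones reported in this issue; the ~20% figure rounds 20.28%):

      ```java
      public class MapStatusMemory {
          public static void main(String[] args) {
              // RoaringBitmap backing arrays reported above:
              // 3 long[1024] bitmap containers plus 1 short[3392] array container.
              long roaringBits = 3L * 64 * 1024 + 16L * 3392;
              // Dense BitSet: one long[3125] covers all 200000 reduce partitions.
              long bitsetBits = 3125L * 64;
              double savedFraction = (double) (roaringBits - bitsetBits) / roaringBits;

              System.out.println(roaringBits);                      // 250880
              System.out.println(bitsetBits);                       // 200000
              System.out.println(Math.round(savedFraction * 100));  // 20
          }
      }
      ```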


People

    Assignee: Unassigned
    Reporter: Qin Yao (Kent Yao 2)
    Votes: 0
    Watchers: 4

Dates

    Created:
    Updated:
    Resolved: