  1. Spark
  2. SPARK-24143

Filter empty blocks when converting MapStatus to (blockId, size) pairs

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.4.0
    • Component/s: Spark Core
    • Labels: None

    Description

      In the current code (MapOutputTracker.convertMapStatuses), map statuses are converted to (blockId, size) pairs for all blocks, whether or not a block is empty. This can result in OOM when there are many consecutive empty blocks, especially when adaptive execution is enabled.

      The (blockId, size) pairs are used only in ShuffleBlockFetcherIterator to control shuffle reads, and requests are sent only for non-empty blocks. Can we filter out the empty blocks in MapOutputTracker.convertMapStatuses to save memory?
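
      A minimal sketch of the proposed filtering, for illustration only. BlockId, MapStatus, and ConvertSketch.convert below are simplified, hypothetical stand-ins and do not reproduce Spark's actual classes or the real signature of convertMapStatuses; the only point is that zero-sized blocks are dropped before any (blockId, size) pair is allocated.

```scala
// Simplified stand-ins for illustration; these are NOT Spark's real classes.
final case class BlockId(shuffleId: Int, mapId: Int, reduceId: Int)

final case class MapStatus(location: String, sizes: Array[Long]) {
  // Size in bytes of the block this map task wrote for the given reducer.
  def getSizeForBlock(reduceId: Int): Long = sizes(reduceId)
}

object ConvertSketch {
  /**
   * Builds (blockId, size) pairs grouped by location for reduce partitions
   * in [startPartition, endPartition), skipping blocks whose size is zero.
   */
  def convert(
      shuffleId: Int,
      startPartition: Int,
      endPartition: Int,
      statuses: Seq[MapStatus]): Map[String, Seq[(BlockId, Long)]] = {
    val pairs = for {
      (status, mapId) <- statuses.zipWithIndex
      reduceId <- startPartition until endPartition
      size = status.getSizeForBlock(reduceId)
      if size != 0L // the proposed change: never materialize empty blocks
    } yield (status.location, (BlockId(shuffleId, mapId, reduceId), size))

    // Group by location; locations with only empty blocks never appear.
    pairs.groupBy(_._1).map { case (loc, xs) => loc -> xs.map(_._2) }
  }
}
```

      With this shape, a map output whose blocks are all empty contributes no entries at all, so the memory used by the result scales with the number of non-empty blocks rather than with the number of map tasks times the number of reduce partitions.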

          People

            Assignee: Jin Xing (jinxing6042@126.com)
            Reporter: Jin Xing (jinxing6042@126.com)
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved: