  1. Spark
  2. SPARK-24143

Filter empty blocks when converting MapStatus to (blockId, size) pairs

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0
    • Fix Version/s: 2.4.0
    • Component/s: Spark Core
    • Labels:
      None

      Description

      In the current code (MapOutputTracker.convertMapStatuses), map statuses are converted to (blockId, size) pairs for all blocks, regardless of whether a block is empty. This results in OOM when there are many consecutive empty blocks, especially when adaptive execution is enabled.

      The (blockId, size) pairs are only used in ShuffleBlockFetcherIterator to control shuffle reads, and requests are sent only for non-empty blocks. Can we just filter out the empty blocks in MapOutputTracker.convertMapStatuses to save memory?
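
      The proposal can be illustrated with a minimal sketch. This is a simplified Python model, not Spark's Scala code: the function name convert_map_statuses, the representation of a map status as a plain list of per-reducer sizes, and the sample data are all assumptions made for illustration. Grouping by block manager address is left out. The only point is that zero-size blocks are dropped before any (blockId, size) pair is allocated.

```python
from typing import Iterator, List, Tuple


def convert_map_statuses(
    shuffle_id: int,
    start_partition: int,
    end_partition: int,
    statuses: List[List[int]],
) -> Iterator[Tuple[str, int]]:
    """Yield (block_id, size) pairs for non-empty blocks only.

    `statuses[map_id][reduce_id]` is a hypothetical stand-in for the
    size a map status reports for one reduce partition.
    """
    for map_id, sizes in enumerate(statuses):
        for reduce_id in range(start_partition, end_partition):
            size = sizes[reduce_id]
            # Skip empty blocks: no fetch request would be sent for them,
            # so keeping a pair around only costs memory.
            if size != 0:
                yield (f"shuffle_{shuffle_id}_{map_id}_{reduce_id}", size)


# Illustrative data: three map tasks, three reduce partitions, mostly empty.
example_statuses = [[0, 10, 0], [0, 0, 0], [5, 0, 7]]
non_empty = list(convert_map_statuses(3, 0, 3, example_statuses))
print(len(non_empty), "pairs kept out of", 3 * 3)
```

      With sparse shuffle output, the number of retained pairs scales with the number of non-empty blocks rather than with the number of map tasks times the number of reduce partitions.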

            People

            • Assignee:
              jinxing6042@126.com Jin Xing
              Reporter:
              jinxing6042@126.com Jin Xing