Spark / SPARK-32091

Ignore timeout error when remove blocks on the lost executor

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.0, 3.0.0
    • Fix Version/s: 3.1.0
    • Component/s: Spark Core
    • Labels:
      None

      Description

      When removing blocks (e.g. RDD, broadcast, shuffle), BlockManagerMasterEndpoint makes RPC calls to each known BlockManagerSlaveEndpoint to remove the specified blocks. Such an RPC call can end in a timeout when the executor has been lost but the BlockManagerMasterEndpoint is notified of the loss only after the removal call has already been issued. The timeout can therefore fail the whole query.

      In this case, we can simply ignore the error, since the blocks on the lost executor can be considered already removed.
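
      The proposed behaviour can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions, not Spark's code: `ToyMaster`, `register`, `mark_lost`, and `remove_blocks` are hypothetical names, a thread pool stands in for the RPC layer, and the return value (number of blocks removed per executor) is an illustrative choice. A timeout is swallowed only when the executor is no longer known to be alive; a timeout from a live executor is still raised.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
import time


class ToyMaster:
    """Hypothetical stand-in for the master endpoint; not Spark's API."""

    def __init__(self, rpc_timeout_s):
        self.rpc_timeout_s = rpc_timeout_s
        self.alive = set()    # executor ids currently believed alive
        self.handlers = {}    # executor id -> callable simulating the remote removal
        self.pool = ThreadPoolExecutor(max_workers=4)

    def register(self, executor_id, handler):
        self.alive.add(executor_id)
        self.handlers[executor_id] = handler

    def mark_lost(self, executor_id):
        # Simulates the (possibly late) notification that an executor is gone.
        self.alive.discard(executor_id)

    def remove_blocks(self, block_id):
        # Send the removal request to every known executor.
        futures = {eid: self.pool.submit(handler, block_id)
                   for eid, handler in list(self.handlers.items())}
        removed = {}
        for eid, future in futures.items():
            try:
                removed[eid] = future.result(timeout=self.rpc_timeout_s)
            except FutureTimeout:
                if eid in self.alive:
                    raise  # a live executor timed out: a genuine failure
                # Executor was lost in the meantime: treat its blocks as removed.
                removed[eid] = 0
        return removed


master = ToyMaster(rpc_timeout_s=0.05)
master.register("exec-1", lambda block_id: 3)  # healthy executor removes 3 blocks


def hung_executor(block_id):
    # The loss is reported only after the removal call is already in flight.
    master.mark_lost("exec-2")
    time.sleep(0.3)  # never answers within the RPC timeout
    return 0


master.register("exec-2", hung_executor)
result = master.remove_blocks("rdd_0")
print(result)
```

      Checking liveness only after the timeout fires keeps the common path unchanged: the extra lookup happens solely when a call has already failed.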

            People

            • Assignee:
              Ngone51 wuyi
            • Reporter:
              Ngone51 wuyi