Spark / SPARK-36772

FinalizeShuffleMerge fails with an exception due to attempt id not matching


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.2.0
    • Fix Version/s: 3.2.0
    • Component/s: Shuffle
    • Labels:
      None
    • Target Version/s:

      Description

      When the driver asks the external shuffle services (ESS) to finalize a shuffle merge, it also sends its application attempt id so that the ESS can verify that the request comes from the correct attempt.
      The driver reads this attempt id from the TransportConf passed in when the ExternalBlockStoreClient is created, and that TransportConf is built from a cloned copy of the SparkConf handed to it.

      The application attempt id is set during SparkContext initialization,
      but only after the driver's SparkEnv (and with it the cloned conf) has already been created.
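
      A minimal Java sketch of this ordering problem follows. It assumes only what is described above: the transport conf copies the configuration when it is constructed, and the attempt id is written to the original conf afterwards. SnapshotTransportConf and the map-based conf are hypothetical stand-ins, not Spark's actual TransportConf or SparkConf APIs; only the key spark.app.attempt.id comes from this issue.

```java
import java.util.HashMap;
import java.util.Map;

public class AttemptIdSnapshotDemo {

    // Hypothetical stand-in for TransportConf: it keeps a private copy of the
    // configuration it was constructed with, so later changes are not visible.
    static final class SnapshotTransportConf {
        private final Map<String, String> conf;

        SnapshotTransportConf(Map<String, String> sparkConf) {
            this.conf = new HashMap<>(sparkConf); // cloned at construction time
        }

        // Returns the attempt id recorded in the snapshot, or "-1" if absent.
        String appAttemptId() {
            return conf.getOrDefault("spark.app.attempt.id", "-1");
        }
    }

    public static void main(String[] args) {
        Map<String, String> sparkConf = new HashMap<>();

        // 1. Driver SparkEnv is created: the client's transport conf is cloned now.
        SnapshotTransportConf transportConf = new SnapshotTransportConf(sparkConf);

        // 2. Later, during SparkContext initialization, the attempt id becomes known.
        sparkConf.put("spark.app.attempt.id", "1");

        // The live conf sees "1"; the snapshot still falls back to "-1".
        System.out.println("live conf attempt id:      " + sparkConf.get("spark.app.attempt.id"));
        System.out.println("transport conf attempt id: " + transportConf.appAttemptId());
    }
}
```

      Because the copy is taken before the attempt id exists, every read through the snapshot returns the default -1, no matter what the live conf holds later.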

      As a result, the attempt id used by ExternalBlockStoreClient is always -1. This does not match the attempt id held by the ESS (which is based on spark.app.attempt.id), so merge finalization always fails with:
      "java.lang.IllegalArgumentException: The attempt id -1 in this FinalizeShuffleMerge message does not match with the current attempt id 1 stored in shuffle service for application ..."
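
      The ESS-side check that produces this error can be sketched roughly as follows. This is a hypothetical reconstruction modeled on the quoted error message, not the shuffle service's actual code; validateAttemptId and the sample application id are made up for illustration.

```java
public class FinalizeMergeValidationDemo {

    // Hypothetical ESS-side check, modeled on the error message quoted in this issue:
    // reject a finalize request whose attempt id differs from the registered one.
    static void validateAttemptId(String appId, int msgAttemptId, int registeredAttemptId) {
        if (msgAttemptId != registeredAttemptId) {
            throw new IllegalArgumentException(String.format(
                "The attempt id %d in this FinalizeShuffleMerge message does not match "
                    + "with the current attempt id %d stored in shuffle service for application %s",
                msgAttemptId, registeredAttemptId, appId));
        }
    }

    public static void main(String[] args) {
        String appId = "application_0000000000000_0001"; // hypothetical application id
        try {
            // The driver sent -1 (stale snapshot) while the ESS registered attempt 1.
            validateAttemptId(appId, -1, 1);
        } catch (IllegalArgumentException e) {
            System.out.println("finalize rejected: " + e.getMessage());
        }
        // With the correct attempt id the check passes without throwing.
        validateAttemptId(appId, 1, 1);
        System.out.println("finalize accepted for attempt 1");
    }
}
```

      Running main prints the rejection for attempt id -1 and then accepts attempt id 1, mirroring why a stale -1 from the driver can never pass this kind of check.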


            People

            • Assignee:
              zhouyejoe Ye Zhou
            • Reporter:
              mridulm80 Mridul Muralidharan
            • Votes:
              0
            • Watchers:
              5

              Dates

              • Created:
                Updated:
                Resolved: