Spark / SPARK-42753

ReusedExchange refers to non-existent node


Details

    • Type: Bug
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.4.0
    • Fix Version/s: None
    • Component/s: Spark Core, Web UI
    • Labels: None

    Description

      There is an AQE issue where, during AQE planning, an Exchange that is being reused can be replaced in the plan tree. As a result, when the query plan is printed, the ReusedExchange refers to an "unknown" Exchange. An example is shown below:

       

      (2775) ReusedExchange [Reuses operator id: unknown]
       Output [3]: [sr_customer_sk#271, sr_store_sk#275, sum#377L]
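
      One way to check whether a given plan shows this symptom is to scan the formatted explain text for the marker above. The snippet below is a minimal sketch, assuming the node-detail header always has the shape shown in the example; the helper name find_unknown_reuses is hypothetical and not part of Spark.

```python
import re

# Matches node-detail headers of the form shown in the example above, e.g.
# "(2775) ReusedExchange [Reuses operator id: unknown]".
UNKNOWN_REUSE = re.compile(
    r"\((\d+)\)\s+ReusedExchange\s+\[Reuses operator id: unknown\]"
)


def find_unknown_reuses(explain_text: str) -> list:
    """Return the operator ids of ReusedExchange nodes whose target is unknown."""
    return [int(m.group(1)) for m in UNKNOWN_REUSE.finditer(explain_text)]


sample = """(2775) ReusedExchange [Reuses operator id: unknown]
Output [3]: [sr_customer_sk#271, sr_store_sk#275, sum#377L]
"""
print(find_unknown_reuses(sample))  # [2775]
```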

       

       

      Below is an example to demonstrate the root cause:

       

      AdaptiveSparkPlan
        |-- SomeNode X (subquery xxx)
            |-- Exchange A
                |-- SomeNode Y
                    |-- Exchange B
      Subquery:Hosting operator = SomeNode Hosting Expression = xxx dynamicpruning#388
      AdaptiveSparkPlan
        |-- SomeNode M
            |-- Exchange C
                |-- SomeNode N
                    |-- Exchange D
      

       

       

      Step 1: Exchange B is materialized and the QueryStage is added to stage cache

      Step 2: Exchange D reuses Exchange B

      Step 3: Exchange C is materialized and the QueryStage is added to stage cache

      Step 4: Exchange A reuses Exchange C

       

      Then the final plan looks like:

       

      AdaptiveSparkPlan
        |-- SomeNode X (subquery xxx)
            |-- Exchange A -> ReusedExchange (reuses Exchange C)
      
      Subquery:Hosting operator = SomeNode Hosting Expression = xxx dynamicpruning#388
      AdaptiveSparkPlan
        |-- SomeNode M
            |-- Exchange C -> PhotonShuffleMapStage ....
                |-- SomeNode N
                    |-- Exchange D -> ReusedExchange (reuses Exchange B)
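
      The four steps and the resulting plan above can be reproduced with a toy model. This is only a sketch of the ordering problem, not Spark's AQE code: the Node class and the reuse and reachable helpers are hypothetical, and the model assumes that turning an Exchange into a ReusedExchange drops its subtree from the printed plan.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A toy plan node; `reuses` is set once an Exchange becomes a ReusedExchange."""
    name: str
    children: List["Node"] = field(default_factory=list)
    reuses: Optional["Node"] = None


def reuse(exchange: Node, target: Node) -> None:
    """Turn `exchange` into a ReusedExchange leaf, dropping its subtree from the plan."""
    exchange.reuses = target
    exchange.children = []


def reachable(roots: List[Node]) -> set:
    """Names of all nodes still present in the given plan trees."""
    seen, stack = set(), list(roots)
    while stack:
        node = stack.pop()
        seen.add(node.name)
        stack.extend(node.children)
    return seen


# Initial plans, mirroring the trees in the description.
b = Node("Exchange B")
a = Node("Exchange A", [Node("SomeNode Y", [b])])
main_plan = Node("SomeNode X", [a])

d = Node("Exchange D")
c = Node("Exchange C", [Node("SomeNode N", [d])])
subquery_plan = Node("SomeNode M", [c])

reuse(d, b)  # Steps 1-2: B is materialized and cached, then D reuses B.
reuse(a, c)  # Steps 3-4: C is materialized and cached, then A reuses C.

present = reachable([main_plan, subquery_plan])
dangling = [
    n.name for n in (a, d)
    if n.reuses is not None and n.reuses.name not in present
]
print(dangling)  # ['Exchange D']
```

      In this model, Exchange D still points at Exchange B, but Exchange B disappeared from the tree when Exchange A (its ancestor) was replaced by a ReusedExchange.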
      

       

       

      As a result, ReusedExchange (reuses Exchange B) refers to a non-existent node, because Exchange B was removed from the plan when Exchange A was replaced by a ReusedExchange. This does NOT affect query execution, but it causes the query visualization to malfunction in the following ways:

      1. The ReusedExchange child subtree still appears in the Spark UI graph but contains no node IDs.
      2. The ReusedExchange node details in the Explain plan refer to an unknown node. Example:
         (2775) ReusedExchange [Reuses operator id: unknown]
      3. The child exchange and its subtree may be missing from the Explain text entirely, with no node details or tree string shown.
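
      The "unknown" label in the Explain text can be illustrated with a toy id-assignment pass. The sketch below is an assumption-laden model, not Spark's explain implementation: it assumes operator ids are handed out only to nodes reached by traversing the final plan trees, so a reuse target that is no longer in any tree has no id to print. The names assign_ids and describe_reuses are hypothetical.

```python
from typing import Dict, List

# A toy plan node is a tuple: (name, children, reused_exchange_name_or_None).
final_plans = [
    ("SomeNode X", [("Exchange A", [], "Exchange C")], None),
    ("SomeNode M", [
        ("Exchange C", [
            ("SomeNode N", [("Exchange D", [], "Exchange B")], None),
        ], None),
    ], None),
]


def assign_ids(roots: list) -> Dict[str, int]:
    """Assign ids (children first) to nodes that are actually in the trees."""
    ids: Dict[str, int] = {}

    def visit(node) -> None:
        name, children, _ = node
        for child in children:
            visit(child)
        ids[name] = len(ids) + 1

    for root in roots:
        visit(root)
    return ids


def describe_reuses(roots: list) -> List[str]:
    """Render one detail line per ReusedExchange, falling back to 'unknown'."""
    ids = assign_ids(roots)
    lines: List[str] = []

    def visit(node) -> None:
        name, children, reused = node
        for child in children:
            visit(child)
        if reused is not None:
            # The reuse target only has an id if it is still part of some tree.
            target = ids.get(reused, "unknown")
            lines.append(
                f"({ids[name]}) ReusedExchange [Reuses operator id: {target}]"
            )

    for root in roots:
        visit(root)
    return lines


for line in describe_reuses(final_plans):
    print(line)
# (1) ReusedExchange [Reuses operator id: 5]
# (3) ReusedExchange [Reuses operator id: unknown]
```

      Exchange A resolves its target because Exchange C is still in the subquery plan, while Exchange D cannot resolve Exchange B.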

          People

            Assignee: Unassigned
            Reporter: Steven Chen (steven.chen)
