Spark / SPARK-49905

Use different ShuffleOrigin for the shuffle required from stateful operators

Details

    Description

      ShuffleOrigin was introduced to indicate whether a shuffle can be adjusted and, if so, in what way.
       
      ENSURE_REQUIREMENTS, the origin used by the rule `EnsureRequirements`, denotes that the shuffle can be adjusted as long as the requirements are still ensured. These requirements do not include the number of partitions, so AQE may optimize that number on the assumption that it is not a strong requirement. For stateful operators, however, the number of partitions is a strong requirement and must not be adjusted.
       
      To prevent any future change to AQE from breaking stateful operators (the same reason we introduced StatefulOpClusteredDistribution), we want to explicitly mark the shuffle that backs a stateful operator with a different ShuffleOrigin.
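
      The intent can be illustrated with a minimal standalone sketch. It is a simplified model, not Spark's actual classes: `EnsureRequirementsOrigin`, `StatefulOperatorOrigin`, `Shuffle`, and `CoalesceSketch` are hypothetical names invented for illustration, and the real AQE rules are considerably more involved. The point shown is only that a rule which allow-lists adjustable origins leaves a shuffle with a distinct stateful origin untouched.

```scala
// Simplified, standalone model of the idea; these are NOT Spark's real classes.
sealed trait ShuffleOrigin

// Models a shuffle added by EnsureRequirements: its partition count may be tuned.
case object EnsureRequirementsOrigin extends ShuffleOrigin

// Hypothetical name for the separate origin proposed in this issue.
case object StatefulOperatorOrigin extends ShuffleOrigin

final case class Shuffle(numPartitions: Int, origin: ShuffleOrigin)

object CoalesceSketch {
  // Hypothetical stand-in for an AQE rule's allow-list of adjustable origins.
  val adjustableOrigins: Set[ShuffleOrigin] = Set(EnsureRequirementsOrigin)

  // Reduces the partition count only when the shuffle's origin permits it.
  def coalesce(shuffle: Shuffle, targetPartitions: Int): Shuffle =
    if (adjustableOrigins.contains(shuffle.origin) &&
        targetPartitions < shuffle.numPartitions) {
      shuffle.copy(numPartitions = targetPartitions)
    } else {
      shuffle
    }
}
```

      In this model, `CoalesceSketch.coalesce(Shuffle(200, EnsureRequirementsOrigin), 10)` yields a shuffle with 10 partitions, while the same call with `StatefulOperatorOrigin` returns the shuffle unchanged. Because the rule only acts on origins it explicitly lists, a later change to how ENSURE_REQUIREMENTS shuffles are optimized would not affect the stateful shuffle.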

            People

              kabhwan Jungtaek Lim