Spark / SPARK-40999

Hints on subqueries are not properly propagated

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.4.0
    • Fix Version/s: 3.4.0
    • Component/s: Optimizer, Spark Core
    • Labels: None

    Description

      Currently, if a user tries to specify a query like the following, the hints on the subquery will be lost. 

      SELECT * FROM target t WHERE EXISTS
      (SELECT /*+ BROADCAST */ * FROM source s WHERE s.key = t.key)

      This happens because hints are removed from the plan and pulled into joins at the beginning of the optimization stage (in EliminateResolvedHint), whereas subqueries are only rewritten into joins later during optimization. Since any hint that is not below a join is removed, hints that sit below a subquery are removed as well.
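
      A minimal sketch of why the hint disappears, written as a toy Python model rather than Spark's Scala code. The node classes (Scan, Hint, Exists, Filter, Join) and the two passes (eliminate_hints, rewrite_exists) are hypothetical simplifications that loosely stand in for Spark's resolved hint nodes, joins, subquery expressions, EliminateResolvedHint and the later subquery-to-join rewrite; only the order of the two passes is taken from the description.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Toy plan nodes: hypothetical stand-ins, NOT Spark's classes.
@dataclass
class Scan:
    table: str

@dataclass
class Hint:  # a hint node sitting in the plan, e.g. /*+ BROADCAST */
    strategy: str
    child: "Plan"

@dataclass
class Exists:  # correlated EXISTS subquery expression (no hint field)
    plan: "Plan"

@dataclass
class Filter:  # WHERE EXISTS (...) applied to the outer relation
    cond: Exists
    child: "Plan"

@dataclass
class Join:  # join that carries its hint as a field
    left: "Plan"
    right: "Plan"
    right_hint: Optional[str] = None

Plan = Union[Scan, Hint, Filter, Join]

def eliminate_hints(plan: Plan) -> Plan:
    """Early pass: pull a hint directly below a join into the join, drop all others."""
    if isinstance(plan, Hint):
        # Not directly below a join, so the hint is discarded.
        return eliminate_hints(plan.child)
    if isinstance(plan, Join):
        right, hint = plan.right, plan.right_hint
        if isinstance(right, Hint):
            hint, right = right.strategy, right.child
        return Join(eliminate_hints(plan.left), eliminate_hints(right), hint)
    if isinstance(plan, Filter):
        # The subquery is not a join yet, so its hint is dropped here.
        return Filter(Exists(eliminate_hints(plan.cond.plan)), eliminate_hints(plan.child))
    return plan

def rewrite_exists(plan: Plan) -> Plan:
    """Later pass: turn Filter(EXISTS ...) into a join (a semi join in Spark)."""
    if isinstance(plan, Filter):
        return Join(rewrite_exists(plan.child), rewrite_exists(plan.cond.plan))
    if isinstance(plan, Join):
        return Join(rewrite_exists(plan.left), rewrite_exists(plan.right), plan.right_hint)
    if isinstance(plan, Hint):
        return Hint(plan.strategy, rewrite_exists(plan.child))
    return plan

# SELECT * FROM target t WHERE EXISTS (SELECT /*+ BROADCAST */ * FROM source s ...)
query = Filter(Exists(Hint("BROADCAST", Scan("source"))), Scan("target"))
optimized = rewrite_exists(eliminate_hints(query))
print(optimized.right_hint)  # None: the BROADCAST hint was lost
```

      By the time the subquery becomes a join, the early pass has already thrown the hint away, so there is nothing left to attach to the join.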

      This worked before the refactoring in SPARK-26065, which added hints as a field on joins, so the current behavior is a regression for anyone who relied on hints in subqueries.

      To resolve this, we add a hint field to SubqueryExpression. During EliminateResolvedHint, any hints inside a subquery's plan are pulled into this field, and the hint is then passed on to the resulting join when the subquery is rewritten into one.
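
      A toy Python model of the proposed fix (hypothetical classes and passes, not Spark's implementation). Exists gains a hint field standing in for the new field on SubqueryExpression, the early pass moves the subquery's hint into that field instead of dropping it, and the subquery-to-join rewrite hands it to the join it creates. Attaching the hint to the subquery side of the join is an assumption of this sketch, not something the description specifies.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Toy plan nodes: hypothetical stand-ins, NOT Spark's classes.
@dataclass
class Scan:
    table: str

@dataclass
class Hint:  # a hint node sitting in the plan, e.g. /*+ BROADCAST */
    strategy: str
    child: "Plan"

@dataclass
class Exists:  # EXISTS subquery expression, now with a hint field
    plan: "Plan"
    hint: Optional[str] = None

@dataclass
class Filter:  # WHERE EXISTS (...) applied to the outer relation
    cond: Exists
    child: "Plan"

@dataclass
class Join:  # join that carries its hint as a field
    left: "Plan"
    right: "Plan"
    right_hint: Optional[str] = None

Plan = Union[Scan, Hint, Filter, Join]

def eliminate_hints(plan: Plan) -> Plan:
    """Early pass: hints move into joins or subquery expressions; others are dropped."""
    if isinstance(plan, Hint):
        return eliminate_hints(plan.child)
    if isinstance(plan, Join):
        right, hint = plan.right, plan.right_hint
        if isinstance(right, Hint):
            hint, right = right.strategy, right.child
        return Join(eliminate_hints(plan.left), eliminate_hints(right), hint)
    if isinstance(plan, Filter):
        sub, hint = plan.cond.plan, plan.cond.hint
        if isinstance(sub, Hint):
            # Keep the subquery's hint on the subquery expression itself.
            hint, sub = sub.strategy, sub.child
        return Filter(Exists(eliminate_hints(sub), hint), eliminate_hints(plan.child))
    return plan

def rewrite_exists(plan: Plan) -> Plan:
    """Later pass: turn Filter(EXISTS ...) into a join, passing the stored hint on."""
    if isinstance(plan, Filter):
        return Join(rewrite_exists(plan.child), rewrite_exists(plan.cond.plan), plan.cond.hint)
    if isinstance(plan, Join):
        return Join(rewrite_exists(plan.left), rewrite_exists(plan.right), plan.right_hint)
    if isinstance(plan, Hint):
        return Hint(plan.strategy, rewrite_exists(plan.child))
    return plan

query = Filter(Exists(Hint("BROADCAST", Scan("source"))), Scan("target"))
optimized = rewrite_exists(eliminate_hints(query))
print(optimized.right_hint)  # BROADCAST
```

      Hint removal still happens in a single early pass; the subquery expression simply carries the hint until a join exists to receive it.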

          People

            Assignee: Fredrik Klauß (fred-db)
            Reporter: Fredrik Klauß (fred-db)
            Votes: 0
            Watchers: 3
