Spark / SPARK-36809

Remove broadcast for InSubqueryExec used in DPP

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.0
    • Fix Version/s: 3.3.0
    • Component/s: SQL
    • Labels: None

    Description

      Currently, InSubqueryExec includes a broadcast variable, which we use to hold the result of the filtering-side query for DPP (dynamic partition pruning). This is odd: the result is never used on executors; it is needed only on the driver during query planning. Since we already hold the original result, we currently keep two copies of the query result.

      A related issue: in pruningHasBenefit, we estimate whether DPP pruning is beneficial when the join type does not support broadcast. Because of the broadcast variable above, we also check the size of the filtering side against the config autoBroadcastJoinThreshold. That config is not meant for this purpose, and the join in question is not a broadcast join. Since the broadcast variable is unnecessary, we can remove this check and base the benefit estimate only on the overhead and the pruning size.
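
      To illustrate the proposed change, here is a minimal, self-contained sketch of the benefit check with and without the threshold comparison. It is a simplified model, not Spark's actual code: PlanStats, filterRatio, overheadBytes, and both method names are hypothetical, and the exact shape of the existing check is an assumption based on the description above.

```scala
// Simplified, hypothetical model of the DPP benefit estimate.
// None of these names are taken from Spark's source.
final case class PlanStats(sizeInBytes: BigInt)

object PruningBenefitSketch {

  // Assumed shape of the check before the change: the filtering side must
  // also fit under autoBroadcastJoinThreshold, even though the join is not
  // a broadcast join.
  def hasBenefitWithThresholdCheck(
      filterRatio: Double,
      pruningSide: PlanStats,
      filteringSide: PlanStats,
      overheadBytes: BigInt,
      autoBroadcastJoinThreshold: BigInt): Boolean = {
    filteringSide.sizeInBytes <= autoBroadcastJoinThreshold &&
      hasBenefit(filterRatio, pruningSide, overheadBytes)
  }

  // Proposed check: compare only the estimated pruned bytes on the pruning
  // side with the overhead of evaluating the filtering side.
  def hasBenefit(
      filterRatio: Double,
      pruningSide: PlanStats,
      overheadBytes: BigInt): Boolean = {
    val estimatedPrunedBytes = filterRatio * pruningSide.sizeInBytes.toDouble
    estimatedPrunedBytes > overheadBytes.toDouble
  }
}
```

      In this model, a filtering side larger than the threshold is rejected by the first method even when the estimated pruned bytes exceed the overhead; the second method accepts it.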

          People

            apachespark Apache Spark
            viirya L. C. Hsieh
