[SPARK-25340] Pushes down Sample beneath deterministic Project


    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Incomplete
    • Affects Version/s: 2.3.1
    • Fix Version/s: None
    • Component/s: SQL
    • Labels:

      Description

      If the computations in a Project are heavy (e.g., UDFs), it is useful to push down Sample nodes beneath deterministic Projects:

      scala> spark.range(10).selectExpr("id + 3").sample(0.5).explain(true)
      // without this proposal
      == Analyzed Logical Plan ==
      (id + 3): bigint
      Sample 0.0, 0.5, false, 3370873312340343855
      +- Project [(id#0L + cast(3 as bigint)) AS (id + 3)#2L]
         +- Range (0, 10, step=1, splits=Some(4))
      
      == Optimized Logical Plan ==
      Sample 0.0, 0.5, false, 3370873312340343855
      +- Project [(id#0L + 3) AS (id + 3)#2L]
         +- Range (0, 10, step=1, splits=Some(4))
      
      // with this proposal
      == Optimized Logical Plan ==
      Project [(id#0L + 3) AS (id + 3)#2L]
      +- Sample 0.0, 0.5, false, -6519017078291024113
         +- Range (0, 10, step=1, splits=Some(4))
      

      POC: https://github.com/apache/spark/compare/master...maropu:SamplePushdown
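The benefit can be sketched outside Spark with plain collections: when the projection is deterministic, sampling first and projecting only the surviving rows yields the same result while invoking the expensive function far fewer times. The names below (`heavyUdf`, `udfCalls`) are hypothetical, purely for illustration; the sampling decisions are fixed up front so both orderings pick the same rows:

```scala
import scala.util.Random

// Hypothetical "heavy" deterministic projection; invocations are counted
// to show the work saved by sampling first.
var udfCalls = 0
def heavyUdf(x: Long): Long = { udfCalls += 1; x + 3 }

val rng = new Random(42)
val data = (0L until 1000L).toSeq
// Fix the per-row sampling decisions so both plans keep the same rows.
val keep = data.map(_ => rng.nextDouble() < 0.5)

// Without the proposal: project every row, then sample.
udfCalls = 0
val projectedThenSampled = data.map(heavyUdf).zip(keep).collect { case (v, true) => v }
val callsWithoutPushdown = udfCalls

// With the proposal: sample first, then project only surviving rows.
udfCalls = 0
val sampledThenProjected = data.zip(keep).collect { case (v, true) => v }.map(heavyUdf)
val callsWithPushdown = udfCalls

println(s"UDF calls without pushdown: $callsWithoutPushdown") // 1000
println(s"UDF calls with pushdown:    $callsWithPushdown")    // roughly half
```

Because the projection is deterministic, `projectedThenSampled == sampledThenProjected`; this is exactly why the rewrite is restricted to deterministic Projects, and also why the optimized plan shows a different sample seed while remaining semantically a 0.5-fraction sample.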

            People

            • Assignee: Unassigned
            • Reporter: maropu Takeshi Yamamuro
            • Votes: 0
            • Watchers: 2
