Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-32939

Avoid re-compute expensive expression

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Implemented
    • 3.1.0
    • None
    • SQL
    • None

    Description

        test("Pushdown demo") {
          withTable("t") {
            withTempDir { loc =>
              sql(
                s"""CREATE TABLE t(c1 INT, s STRING) PARTITIONED BY(P1 STRING)
                   | LOCATION '${loc.getAbsolutePath}'
                   |""".stripMargin)
              sql(
                """
                  |SELECT c1,
                  |case
                  |  when get_json_object(s,'$.a')=1 then "a"
                  |  when get_json_object(s,'$.a')=2 then "b"
                  |end as s_type
                  |FROM t
                  |WHERE get_json_object(s,'$.a') in (1, 2)
                """.stripMargin).explain(true)
               }
          }
      }
      will got plan as 
      
      
      == Physical Plan ==
      *(1) Project [c1#1, CASE WHEN (cast(get_json_object(s#2, $.a) as int) = 1) THEN a WHEN (cast(get_json_object(s#2, $.a) as int) = 2) THEN b END AS s_type#0]
      +- *(1) Filter get_json_object(s#2, $.a) IN (1,2)
         +- Scan hive default.t [c1#1, s#2], HiveTableRelation `default`.`t`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#1, s#2], [P1#3], Statistics(sizeInBytes=8.0 EiB)
      
      we can see that  get_json_object(s#2, $.a) will be computed tree times
      Always there are expensive expressions are re-computed many times in such grammar。
      
      

      Attachments

        Activity

          People

            Unassigned Unassigned
            angerszhuuu angerszhu
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: