Spark / SPARK-42525

Collapse two adjacent windows with semantically-same partition/order


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.2.3
    • Fix Version/s: 3.5.0
    • Component/s: Spark Core
    • Labels: None

    Description

      Extend the CollapseWindow rule to collapse adjacent Window nodes whose partition/order specs are semantically equal but whose attribute qualifiers differ.
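
      A minimal sketch of the comparison the extended rule needs (not the actual Spark implementation; the helper name samePartitionAndOrder is made up for illustration): instead of requiring the two adjacent Window nodes' partition/order specs to be exactly equal, compare them with Catalyst's semanticEquals, which ignores cosmetic differences such as attribute qualifiers.

      import org.apache.spark.sql.catalyst.expressions.{Expression, SortOrder}

      // Hypothetical helper: returns true when two adjacent Window nodes share the
      // same partitioning and ordering up to semantic equality (qualifiers ignored),
      // so their window expression lists can be merged into a single Window node.
      def samePartitionAndOrder(
          partitionSpec1: Seq[Expression],
          orderSpec1: Seq[SortOrder],
          partitionSpec2: Seq[Expression],
          orderSpec2: Seq[SortOrder]): Boolean = {
        partitionSpec1.length == partitionSpec2.length &&
          partitionSpec1.zip(partitionSpec2).forall { case (e1, e2) => e1.semanticEquals(e2) } &&
          orderSpec1.length == orderSpec2.length &&
          orderSpec1.zip(orderSpec2).forall { case (o1, o2) => o1.semanticEquals(o2) }
      }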

       

      select a, b, c, row_number() over (partition by a order by b) as d from
      ( select a, b, rank() over (partition by a order by b) as c from t1) t2
      == Optimized Logical Plan ==
      before
      Window [row_number() windowspecdefinition(a#11, b#12 ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS d#26], [a#11], [b#12 ASC NULLS FIRST]
      +- Window [rank(b#12) windowspecdefinition(a#11, b#12 ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS c#25], [a#11], [b#12 ASC NULLS FIRST]
         +- InMemoryRelation [a#11, b#12], StorageLevel(disk, memory, deserialized, 1 replicas)
               +- *(1) Project [_1#6 AS a#11, _2#7 AS b#12]
                  +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, scala.Tuple2, true]))._1 AS _1#6, knownnotnull(assertnotnull(input[0, scala.Tuple2, true]))._2 AS _2#7]
                     +- *(1) MapElements org.apache.spark.sql.DataFrameSuite$$Lambda$1517/1628848368@3a479fda, obj#5: scala.Tuple2
                        +- *(1) DeserializeToObject staticinvoke(class java.lang.Long, ObjectType(class java.lang.Long), valueOf, id#0L, true, false, true), obj#4: java.lang.Long
                           +- *(1) Range (0, 10, step=1, splits=2)
      
      after
      Window [rank(b#12) windowspecdefinition(a#11, b#12 ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS c#25, row_number() windowspecdefinition(a#11, b#12 ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS d#26], [a#11], [b#12 ASC NULLS FIRST]
      +- InMemoryRelation [a#11, b#12], StorageLevel(disk, memory, deserialized, 1 replicas)
            +- *(1) Project [_1#6 AS a#11, _2#7 AS b#12]
               +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, scala.Tuple2, true]))._1 AS _1#6, knownnotnull(assertnotnull(input[0, scala.Tuple2, true]))._2 AS _2#7]
                  +- *(1) MapElements org.apache.spark.sql.DataFrameSuite$$Lambda$1518/1928028672@4d7a64ca, obj#5: scala.Tuple2
                     +- *(1) DeserializeToObject staticinvoke(class java.lang.Long, ObjectType(class java.lang.Long), valueOf, id#0L, true, false, true), obj#4: java.lang.Long
                        +- *(1) Range (0, 10, step=1, splits=2)
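
      A reproduction sketch for spark-shell (the table and column names come from the query above; the input data here is assumed, standing in for the Dataset used in the original DataFrameSuite test):

      import spark.implicits._

      // Build a small two-column table and register it as t1 (assumed input).
      val t1 = (0L until 10L).map(i => (i % 3, i)).toDF("a", "b")
      t1.createOrReplaceTempView("t1")

      // The outer and inner window functions use the same partition/order, so the
      // optimized plan should now contain a single Window node computing both c and d.
      spark.sql("""
        select a, b, c, row_number() over (partition by a order by b) as d from
        ( select a, b, rank() over (partition by a order by b) as c from t1) t2
      """).explain(true)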


          People

            Assignee: zhuml (Mingliang Zhu)
            Reporter: zhuml (Mingliang Zhu)
            Votes: 0
            Watchers: 2
