Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-12989

Bad interaction between StarExpansion and ExtractWindowExpressions

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 1.6.0
    • 1.6.1, 2.0.0
    • SQL
    • None

    Description

      Reported initially here: http://stackoverflow.com/questions/34995376/apache-spark-window-function-with-nested-column

      import sqlContext.implicits._
      import org.apache.spark.sql.functions._
      import org.apache.spark.sql.expressions.Window
      
      sql("SET spark.sql.eagerAnalysis=false") // Let us see the error even though we are constructing an invalid tree
      
      val data = Seq(("a", "b", "c", 3), ("c", "b", "a", 3)).toDF("A", "B", "C", "num")
        .withColumn("Data", struct("A", "B", "C"))
        .drop("A")
        .drop("B")
        .drop("C")
      
      val winSpec = Window.partitionBy("Data.A", "Data.B").orderBy($"num".desc)
      data.select($"*", max("num").over(winSpec) as "max").explain(true)
      

      When you run this, the analyzer inserts invalid columns into a projection, as seen below:

      == Parsed Logical Plan ==
      'Project [*,'max('num) windowspecdefinition('Data.A,'Data.B,'num DESC,UnspecifiedFrame) AS max#64928]
      +- Project [num#64926,Data#64927]
         +- Project [C#64925,num#64926,Data#64927]
            +- Project [B#64924,C#64925,num#64926,Data#64927]
               +- Project [A#64923,B#64924,C#64925,num#64926,struct(A#64923,B#64924,C#64925) AS Data#64927]
                  +- Project [_1#64919 AS A#64923,_2#64920 AS B#64924,_3#64921 AS C#64925,_4#64922 AS num#64926]
                     +- LocalRelation [_1#64919,_2#64920,_3#64921,_4#64922], [[a,b,c,3],[c,b,a,3]]
      
      == Analyzed Logical Plan ==
      num: int, Data: struct<A:string,B:string,C:string>, max: int
      Project [num#64926,Data#64927,max#64928]
      +- Project [num#64926,Data#64927,A#64932,B#64933,max#64928,max#64928]
         +- Window [num#64926,Data#64927,A#64932,B#64933], [HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMax(num#64926) windowspecdefinition(A#64932,B#64933,num#64926 DESC,RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS max#64928], [A#64932,B#64933], [num#64926 DESC]
            +- !Project [num#64926,Data#64927,A#64932,B#64933]
               +- Project [num#64926,Data#64927]
                  +- Project [C#64925,num#64926,Data#64927]
                     +- Project [B#64924,C#64925,num#64926,Data#64927]
                        +- Project [A#64923,B#64924,C#64925,num#64926,struct(A#64923,B#64924,C#64925) AS Data#64927]
                           +- Project [_1#64919 AS A#64923,_2#64920 AS B#64924,_3#64921 AS C#64925,_4#64922 AS num#64926]
                              +- LocalRelation [_1#64919,_2#64920,_3#64921,_4#64922], [[a,b,c,3],[c,b,a,3]]
      

      Attachments

        Activity

          People

            smilegator Xiao Li
            marmbrus Michael Armbrust
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: