Spark / SPARK-10960

SQL with windowing function cannot reference column in inner select block


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.0, 1.5.0
    • Fix Version/s: 1.4.2, 1.5.2, 1.6.0
    • Component/s: SQL
    • Labels:
      None

      Description

      There seems to be a bug in Spark SQL's handling of windowing functions. Specifically, when the SELECT list refers to a column from an inner select block, the analyzer throws an error.

      Here is an example:
      --------------------------
      When I use a windowing function and add a '1' constant to the result,

         select Rank() OVER ( ORDER BY D1.c3 ) + 1 as c1
      

      Spark SQL executes the query successfully. The whole SQL is:

      select Rank() OVER ( ORDER BY D1.c3 ) + 1 as c1,
                               D1.c3 as c3,
                               D1.c4 as c4,
                               D1.c5 as c5
                          from 
                               (select T3671.ROW_WID as c3,
                                         T3671.CAL_MONTH as c4,
                                         T3671.CAL_YEAR as c5,
                                         1 as c6
                                    from 
                                         W_DAY_D T3671
                               ) D1
      

      However, if I change the projection to refer to a column from the inner select block, D1.C6 (whose value is itself the literal 1, making this query functionally equivalent to the one above), Spark SQL throws an error:

      select Rank() OVER ( ORDER BY D1.c3 ) + D1.C6 as c1,
                               D1.c3 as c3,
                               D1.c4 as c4,
                               D1.c5 as c5
                          from 
                               (select T3671.ROW_WID as c3,
                                         T3671.CAL_MONTH as c4,
                                         T3671.CAL_YEAR as c5,
                                         1 as c6
                                    from 
                                         W_DAY_D T3671
                               ) D1
      

      The error message is:

      java.lang.NullPointerException
      Error: org.apache.spark.sql.AnalysisException: resolved attribute(s) c6#3386 missing from c5#3390,c3#3383,c4#3389,_we0#3461,c3#3388 in operator !Project [c3#3388,c4#3389,c5#3390,c3#3383,_we0#3461,(_we0#3461 + c6#3386) AS c1#3387]; (state=,code=0)
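
      To back up the claim that the two queries are functionally equivalent, here is a minimal check against SQLite (bundled with Python; window functions are supported since SQLite 3.25) as a stand-in engine. The three-row W_DAY_D table is hypothetical sample data, not from the report:

```python
import sqlite3

# Stand-in engine: SQLite (>= 3.25 for window function support).
conn = sqlite3.connect(":memory:")

# Hypothetical sample data for the W_DAY_D table used in the report.
conn.executescript("""
    CREATE TABLE W_DAY_D (ROW_WID INTEGER, CAL_MONTH INTEGER, CAL_YEAR INTEGER);
    INSERT INTO W_DAY_D VALUES (1, 1, 2015), (2, 2, 2015), (3, 3, 2015);
""")

# Variant that works in Spark: the constant 1 is added directly.
q_literal = """
    SELECT Rank() OVER (ORDER BY D1.c3) + 1 AS c1, D1.c3 AS c3
      FROM (SELECT T3671.ROW_WID AS c3, 1 AS c6 FROM W_DAY_D T3671) D1
"""

# Variant that fails in Spark 1.4/1.5: the same 1 comes from inner column c6.
q_column = """
    SELECT Rank() OVER (ORDER BY D1.c3) + D1.c6 AS c1, D1.c3 AS c3
      FROM (SELECT T3671.ROW_WID AS c3, 1 AS c6 FROM W_DAY_D T3671) D1
"""

print(conn.execute(q_literal).fetchall())  # [(2, 1), (3, 2), (4, 3)]
print(conn.execute(q_column).fetchall())   # identical: [(2, 1), (3, 2), (4, 3)]
```

      Both variants return the same rows here, which supports reading the AnalysisException as an analyzer bug rather than a problem with the query itself.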
      

      The above example is a simplified version of the SQL I was testing. The full SQL I was using, which fails with a similar error, is as follows:

      select Case when case D1.c6 when 1 then D1.c3 else NULL end  is not null then Rank() OVER ( ORDER BY case when ( case D1.c6 when 1 then D1.c3 else NULL end  ) is null then 1 else 0 end, case D1.c6 when 1 then D1.c3 else NULL end ) end as c1,
                               Case when case D1.c7 when 1 then D1.c3 else NULL end  is not null then Rank() OVER ( PARTITION BY D1.c4, D1.c5 ORDER BY case when ( case D1.c7 when 1 then D1.c3 else NULL end  ) is null then 1 else 0 end, case D1.c7 when 1 then D1.c3 else NULL end ) end as c2,
                               D1.c3 as c3,
                               D1.c4 as c4,
                               D1.c5 as c5
                          from 
                               (select T3671.ROW_WID as c3,
                                         T3671.CAL_MONTH as c4,
                                         T3671.CAL_YEAR as c5,
                                         ROW_NUMBER() OVER (PARTITION BY T3671.CAL_MONTH, T3671.CAL_YEAR ORDER BY T3671.CAL_MONTH DESC, T3671.CAL_YEAR DESC) as c6,
                                         ROW_NUMBER() OVER (PARTITION BY T3671.CAL_MONTH, T3671.CAL_YEAR, T3671.ROW_WID ORDER BY T3671.CAL_MONTH DESC, T3671.CAL_YEAR DESC, T3671.ROW_WID DESC) as c7
                                    from 
                                         W_DAY_D T3671
                               ) D1
      

      Once this is fixed, both of these sample queries should work.
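
      Until a fixed version is available, one possible workaround (my own suggestion, not from the report) is to hoist the window function into its own select block, so that the outermost projection only combines already-computed columns. A sketch, again checked against SQLite with hypothetical sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical sample data standing in for W_DAY_D.
conn.executescript("""
    CREATE TABLE W_DAY_D (ROW_WID INTEGER, CAL_MONTH INTEGER, CAL_YEAR INTEGER);
    INSERT INTO W_DAY_D VALUES (1, 1, 2015), (2, 2, 2015), (3, 3, 2015);
""")

# Workaround: compute Rank() in an intermediate select block (D2), then add
# the inner column c6 in the outer projection, so the window expression and
# the subquery column are never mixed in the same SELECT list.
q_workaround = """
    SELECT D2.rnk + D2.c6 AS c1, D2.c3 AS c3
      FROM (SELECT Rank() OVER (ORDER BY D1.c3) AS rnk, D1.c3, D1.c6
              FROM (SELECT T3671.ROW_WID AS c3, 1 AS c6
                      FROM W_DAY_D T3671) D1) D2
"""

print(conn.execute(q_workaround).fetchall())  # [(2, 1), (3, 2), (4, 3)]
```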

            People

            • Assignee: viirya (L. C. Hsieh)
            • Reporter: jeanlucw (David Wong)
            • Votes: 0
            • Watchers: 4
