  1. Flink
  2. FLINK-26360 [Umbrella] Improvement for Hive Query Syntax Compatibility
  3. FLINK-28212

IndexOutOfBoundsException is thrown when a project contains a window that doesn't reference all fields of its input when using Hive dialect


Details

    Description

      Can be reproduced with the following SQL when using Hive dialect:

      CREATE TABLE alltypesorc(
                                  ctinyint TINYINT,
                                  csmallint SMALLINT,
                                  cint INT,
                                  cbigint BIGINT,
                                  cfloat FLOAT,
                                  cdouble DOUBLE,
                                  cstring1 STRING,
                                  cstring2 STRING,
                                  ctimestamp1 TIMESTAMP,
                                  ctimestamp2 TIMESTAMP,
                                  cboolean1 BOOLEAN,
                                  cboolean2 BOOLEAN);
      
      select a.ctinyint, a.cint, count(a.cdouble)
        over(partition by a.ctinyint order by a.cint desc
          rows between 1 preceding and 1 following)
      from alltypesorc a;

      It then throws the exception "caused by: java.lang.IndexOutOfBoundsException: index (7) must be less than size (1)".

       

      The reason is that, for such SQL, Hive dialect generates the following RelNode tree:

      LogicalSink(table=[*anonymous_collect$1*], fields=[ctinyint, cint, _o__c2])
        LogicalProject(ctinyint=[$0], cint=[$2], _o__c2=[$12])
          LogicalProject(ctinyint=[$0], csmallint=[$1], cint=[$2], cbigint=[$3], cfloat=[$4], cdouble=[$5], cstring1=[$6], cstring2=[$7], ctimestamp1=[$8], ctimestamp2=[$9], cboolean1=[$10], cboolean2=[$11], _o__col13=[COUNT($5) OVER (PARTITION BY $0 ORDER BY $2 DESC NULLS LAST ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)])
            LogicalTableScan(table=[[test-catalog, default, alltypesorc]]) 

      Note: the first Project node, counting from the bottom up, contains all fields.

      Because the window's input also contains all fields of that project node, Calcite converts the offsets in "1 PRECEDING AND 1 FOLLOWING" to RexInputRef. So the window will look like:

      COUNT($5) OVER (PARTITION BY $0 ORDER BY $2 DESC NULLS LAST ROWS BETWEEN $11 PRECEDING AND $11 FOLLOWING)

      Note: `$11` is a special field for windows; it does not point at an input column but at an entry recorded in the window's constants.
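
      The convention behind this note can be sketched in plain Scala, without any Calcite types: a reference whose index is at least the input field count addresses the window's constants list rather than an input column. The `WindowRefs` object, the `resolve` helper, and the three-column input used in the usage note are hypothetical names and values chosen for illustration; they are not taken from the plan above.

```scala
object WindowRefs {
  // A reference resolves either to an input column or to a window constant.
  sealed trait Resolved
  final case class InputField(name: String) extends Resolved
  final case class Constant(value: Long) extends Resolved

  // Indices below inputFields.size address input columns;
  // indices at or above it address the constants list.
  def resolve(index: Int, inputFields: Seq[String], constants: Seq[Long]): Resolved =
    if (index < inputFields.size) InputField(inputFields(index))
    else Constant(constants(index - inputFields.size))
}
```

      With a three-column input and a single constant, `WindowRefs.resolve(3, Seq("a", "b", "c"), Seq(1L))` resolves to the constant, while any larger index falls outside the constants list and fails.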

       

      But in the rule "ProjectWindowTransposeRule", the unnecessary fields (those not referenced by the top project or the window) are removed, so the input of the window contains only 4 fields (ctinyint, cint, cdouble, count(cdouble)).
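
      One way to keep references consistent when fields are pruned is to remap every index, including the ones that point into the constants. The following is a minimal sketch of that idea in plain Scala, not the rule's actual implementation and not necessarily the fix adopted for this issue; `PruneRemap`, `remap`, and the `kept` parameter are hypothetical names.

```scala
object PruneRemap {
  // ref:           reference index before pruning
  // oldFieldCount: number of input fields before pruning
  // kept:          old input indices that survive pruning, in output order
  def remap(ref: Int, oldFieldCount: Int, kept: Seq[Int]): Int =
    if (ref < oldFieldCount) {
      // Ordinary column reference: its new index is its position among the kept fields.
      val pos = kept.indexOf(ref)
      require(pos >= 0, s"field index $ref was pruned but is still referenced")
      pos
    } else {
      // Constant reference: keep its position in the constants list,
      // but rebase it onto the new, smaller field count.
      kept.size + (ref - oldFieldCount)
    }
}
```

      The design point is that a constant reference must shift by exactly the number of pruned fields; leaving it at its old value makes it point far past the end of the constants list.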

      Finally, when RelExplainUtil builds the bound string for explain, it cannot find $11, so the exception "Caused by: java.lang.IndexOutOfBoundsException: index (8) must be less than size (1)" is thrown.

      val ref = bound.getOffset.asInstanceOf[RexInputRef]
      // ref.getIndex is 11, but the original input size of the window is 3
      val boundIndex = ref.getIndex - calcOriginInputRows(window)
      // boundIndex = 8, but the window's constants contain only a single element, "1"
      val offset = window.constants.get(boundIndex).getValue2
      val offsetKind = if (bound.isPreceding) "PRECEDING" else "FOLLOWING"
      s"$offset $offsetKind"
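
      The failing arithmetic can be reproduced without Flink or Calcite. The sketch below mirrors the subtraction and lookup from the snippet above using plain Scala collections; `BoundLookup` and `boundOffset` are hypothetical names, and the values 11, 3, and a single constant are the ones given in the comments above.

```scala
import scala.util.Try

object BoundLookup {
  // Subtract the window's input field count from the reference index,
  // then look the result up in the constants.
  def boundOffset(refIndex: Int, originInputFieldCount: Int, constants: Vector[Long]): Long = {
    val boundIndex = refIndex - originInputFieldCount
    constants(boundIndex) // throws IndexOutOfBoundsException when refIndex is stale
  }

  def main(args: Array[String]): Unit = {
    // Stale index 11 against an input of size 3 and one constant: the lookup fails.
    println(Try(boundOffset(11, 3, Vector(1L))))
    // Hypothetical remapped index 3 against the same input: the lookup succeeds.
    println(boundOffset(3, 3, Vector(1L)))
  }
}
```

      The lookup only succeeds when the reference index has been rebased onto the pruned input, which is the invariant that is broken in this issue.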

            People

              luoyuxia luoyuxia