Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-7499

sqltypeof() function with an array returns "ARRAY", not type

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.16.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      The sqltypeof() function was introduced in Drill 1.14 to work around limitations of the original typeof() function. The function is mentioned in Learning Apache Drill, Chapter 8, page 152:

      ELECT sqlTypeOf(columns) AS cols_type,
             modeOf(columns) AS cols_mode
      FROM `csv/cust.csv` LIMIT 1;
      
      +--------------------+------------+
      |     cols_type      | cols_mode  |
      +--------------------+------------+
      | CHARACTER VARYING  | ARRAY      |
      +--------------------+------------+
      

      When the same query is run against the just-released Drill 1.17, we get the wrong results:

      +-----------+-----------+
      | cols_type | cols_mode |
      +-----------+-----------+
      | ARRAY     | ARRAY     |
      +-----------+-----------+
      

      The definition of sqlTypeOf() is that it should return the type portion of the columns (type, mode) major type. Clearly, it is no longer doing so for arrays. As a result, there is no function to obtain the data type for arrays.

      The problem also shows up in the query from page 158:

      SELECT a, b,
             sqlTypeOf(b) AS b_type, modeof(b) AS b_mode
      FROM `gen/70kmissing.json`
      WHERE mod(a, 70000) = 1;
      

      Expected (table from the book with Drill 1.14 results):

      +--------+-------+----------+-----------+
      |   a    |   b   |  b_type  |  b_mode   |
      +--------+-------+----------+-----------+
      | 1      | null  | INTEGER  | NULLABLE  |
      +--------+-------+----------+-----------+
      

      Actual Drill 1.17 results:

      +-------+-----------+-------------------+----------+
      |   a   |     b     |      b_type       |  b_mode  |
      +-------+-----------+-------------------+----------+
      | 1     | null      | NULL              | NULLABLE |
      +-------+-----------+-------------------+----------+
      

      (Second line of table is omitted because something else changed, not relevant to this ticket.)

      The above might not actually be a bug, however if someone has changed the type of missing columns from the old INT to a newer (untyped) NULL. But, an indirect test suggests that the column is still `INT` and the function is wrong:

      SELECT a, b
      FROM `gen/70kdouble.json`
      WHERE b IS NOT NULL ORDER BY a;
      

      Data:

      {a: 1}
      ...
      {a: 69999}
      {a: 70001, b: 10.5}
      

      Error:

      Error: UNSUPPORTED_OPERATION ERROR: Schema changes not supported in External Sort. Please enable Union type.
      
      Previous schema BatchSchema [fields=[[`a` (BIGINT:OPTIONAL)], [`b` (INT:OPTIONAL)]], selectionVector=NONE]
      Incoming schema BatchSchema [fields=[[`a` (BIGINT:OPTIONAL)], [`b` (FLOAT8:OPTIONAL)]], selectionVector=NONE]
      
      

      Oddly, however, the query on page 160 works as expected:

      SELECT sqlTypeOf(a) AS a_type, modeOf(a) AS a_mode 
      FROM `json/all-null.json` LIMIT 1;
      
      +---------+----------+
      | a_type  |  a_mode  |
      +---------+----------+
      | INTEGER | NULLABLE |
      +---------+----------+
      

       Someone will have to do some investigating to understand the current behaviour.

        Attachments

          Activity

            People

            • Assignee:
              Paul.Rogers Paul Rogers
              Reporter:
              Paul.Rogers Paul Rogers
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated: