Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-7515

ORDER BY clause produce error on GROUP BY with array field manager with any_value

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 1.17.0
    • None
    • Execution - Data Types
    • None

    Description

      With a parquet containing an array field, for example:

      apache drill 1.17> CREATE TABLE dfs.TEST.`example_any_pqt` AS (SELECT 'foo' AS a, 'bar' b, split('foo,bar',',') as c);
      
      apache drill 1.17> SELECT *, typeof(c) AS type, sqltypeof(c) AS sql_type FROM dfs.TEST.`example_any_pqt`;
      +-----+-----+---------------+---------+----------+
      |  a  |  b  |       c       |  type   | sql_type |
      +-----+-----+---------------+---------+----------+
      | foo | bar | ["foo","bar"] | VARCHAR | ARRAY    |
      +-----+-----+---------------+---------+----------+
      

      The next request work well

      apache drill 1.17> SELECT * FROM 
      (SELECT a, any_value(c) FROM dfs.TEST.`example_any_pqt` GROUP BY a)
      ORDER BY a;
      +-----+---------------+
      |  a  |    EXPR$1     |
      +-----+---------------+
      | foo | ["foo","bar"] |
      +-----+---------------+
      

      But the next request (with the same struct as the previous request) failed

      apache drill 1.17> SELECT * FROM 
      (SELECT a, b, any_value(c) FROM dfs.TEST.`example_any_pqt` GROUP BY a, b)
      ORDER BY a;
      Error: UNSUPPORTED_OPERATION ERROR: Schema changes not supported in External Sort. Please enable Union type.
      
      Previous schema BatchSchema [fields=[[`a` (VARCHAR:OPTIONAL)], [`b` (VARCHAR:OPTIONAL)], [`EXPR$2` (NULL:OPTIONAL)]], selectionVector=NONE]
      Incoming schema BatchSchema [fields=[[`a` (VARCHAR:OPTIONAL)], [`b` (VARCHAR:OPTIONAL)], [`EXPR$2` (VARCHAR:REPEATED), children=([`$data$` (VARCHAR:REQUIRED)])]], selectionVector=NONE]
      Fragment 0:0
      

      Note that the same request without the order by works well. It's also possible to use intermediate table and apply the ORDER BY in a second time.

      apache drill 1.17> SELECT * FROM 
      (SELECT a, b, any_value(c) FROM dfs.TEST.`example_any_pqt` GROUP BY a, b);
      +-----+-----+---------------+
      |  a  |  b  |    EXPR$2     |
      +-----+-----+---------------+
      | foo | bar | ["foo","bar"] |
      +-----+-----+---------------+
      
      apache drill 1.17> CREATE TABLE dfs.TEST.`ok_pqt` AS (SELECT * FROM (SELECT a, b, any_value(c) FROM dfs.TEST.`example_any_pqt` GROUP BY a, b));
      +----------+---------------------------+
      | Fragment | Number of records written |
      +----------+---------------------------+
      | 0_0      | 1                         |
      +----------+---------------------------+
      apache drill 1.17> SELECT * FROM dfs.TEST.`ok_pqt` ORDER BY a;
      +-----+-----+---------------+
      |  a  |  b  |    EXPR$2     |
      +-----+-----+---------------+
      | foo | bar | ["foo","bar"] |
      +-----+-----+---------------+
      

      Attachments

        Activity

          People

            Unassigned Unassigned
            benj641 benj
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated: