XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 1.16.0
    • None
    • Functions - Drill
    • None

    Description

      sample file: example.parquet

      With data as:

      SELECT id, tags FROM ....`example_parquet`;
      +--------+------------------------------------+
      |   id   |                tags                |
      +--------+------------------------------------+
      | 7b8808 | ["peexe","signed","overlay"]       |
      | 55a4ae | ["peexe","signed","upx","overlay"] |
      +--------+------------------------------------+
      

      The next request is OK

      SELECT id, flatten(tags) tag 
      FROM ( 
        SELECT id, any_value(tags) tags 
        FROM ....`example_parquet` 
        GROUP BY id 
      ) LIMIT 2;
      +--------+--------+
      |   id   |  tag   |
      +--------+--------+
      | 55a4ae | peexe  |
      | 55a4ae | signed |
      +--------+--------+
      

      But unexpectedly, the next query failed:

      SELECT tag, count(*) 
      FROM (
        SELECT flatten(tags) tag 
        FROM (
          SELECT id, any_value(tags) tags 
          FROM ....`example_parquet`
          GROUP BY id 
        )
      ) GROUP BY tag;
      Error: SYSTEM ERROR: UnsupportedOperationException: Map, Array, Union or repeated scalar type should not be used in group by, order by or in a comparison operator. Drill does not support compare between MAP:REPEATED and MAP:REPEATED.
      
      /* Or other error with another set of data :
      Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize incoming schema.  Errors:
       
      Error in expression at index 0.  Error: Missing function implementation: [hash32asdouble(MAP-REPEATED, INT-REQUIRED)].  Full expression: null..
      */
      

      These errors are incomprehensible because, the aggregate is on VARCHAR.

      More, the request works if decomposed in 2 request with with the creation of an intermediate table like below:

      CREATE TABLE ....`tmp.parquet` AS (
        SELECT id, flatten(tags) tag 
        FROM ( 
          SELECT id, any_value(tags) tags 
          FROM ....`example_parquet` 
          GROUP BY id 
      ));
      
      SELECT tag, count(*) c FROM ....`tmp_parquet` GROUP BY tag;
      +---------+---+
      |   tag   | c |
      +---------+---+
      | overlay | 2 |
      | peexe   | 2 |
      | signed  | 2 |
      | upx     | 1 |
      +---------+---+
      

      Attachments

        1. example.parquet
          0.5 kB
          benj

        Activity

          People

            Unassigned Unassigned
            benj641 benj
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated: