XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.16.0
    • Fix Version/s: None
    • Component/s: Functions - Drill
    • Labels:
      None

      Description

      With data as:

      SELECT id, tags FROM ....`example_parquet`;
      +--------+------------------------------------+
      |   id   |                tags                |
      +--------+------------------------------------+
      | 7b8808 | ["peexe","signed","overlay"]       |
      | 55a4ae | ["peexe","signed","upx","overlay"] |
      +--------+------------------------------------+
      

      The next request is OK

      SELECT id, flatten(tags) tag 
      FROM ( 
        SELECT id, any_value(tags) tags 
        FROM ....`example_parquet` 
        GROUP BY id 
      ) LIMIT 2;
      +--------+--------+
      |   id   |  tag   |
      +--------+--------+
      | 55a4ae | peexe  |
      | 55a4ae | signed |
      +--------+--------+
      

      But unexpectedly, the next query failed:

      SELECT tag, count(*) 
      FROM (
        SELECT flatten(tags) tag 
        FROM (
          SELECT id, any_value(tags) tags 
          FROM ....`example_parquet`
          GROUP BY id 
        )
      ) GROUP BY tag;
      Error: SYSTEM ERROR: UnsupportedOperationException: Map, Array, Union or repeated scalar type should not be used in group by, order by or in a comparison operator. Drill does not support compare between MAP:REPEATED and MAP:REPEATED.
      
      /* Or other error with another set of data :
      Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize incoming schema.  Errors:
       
      Error in expression at index 0.  Error: Missing function implementation: [hash32asdouble(MAP-REPEATED, INT-REQUIRED)].  Full expression: null..
      */
      

      These errors are incomprehensible because, the aggregate is on VARCHAR.

      More, the request works if decomposed in 2 request with with the creation of an intermediate table like below:

      CREATE TABLE ....`tmp.parquet` AS (
        SELECT id, flatten(tags) tag 
        FROM ( 
          SELECT id, any_value(tags) tags 
          FROM ....`example_parquet` 
          GROUP BY id 
      ));
      
      SELECT tag, count(*) c FROM ....`tmp_parquet` GROUP BY tag;
      +---------+---+
      |   tag   | c |
      +---------+---+
      | overlay | 2 |
      | peexe   | 2 |
      | signed  | 2 |
      | upx     | 1 |
      +---------+---+
      

        Attachments

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              benj641 benj
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated: