Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
1.16.0
-
None
-
None
Description
sample file: example.parquet
With data as:
SELECT id, tags FROM ....`example_parquet`; +--------+------------------------------------+ | id | tags | +--------+------------------------------------+ | 7b8808 | ["peexe","signed","overlay"] | | 55a4ae | ["peexe","signed","upx","overlay"] | +--------+------------------------------------+
The next request is OK
SELECT id, flatten(tags) tag FROM ( SELECT id, any_value(tags) tags FROM ....`example_parquet` GROUP BY id ) LIMIT 2; +--------+--------+ | id | tag | +--------+--------+ | 55a4ae | peexe | | 55a4ae | signed | +--------+--------+
But unexpectedly, the next query failed:
SELECT tag, count(*) FROM ( SELECT flatten(tags) tag FROM ( SELECT id, any_value(tags) tags FROM ....`example_parquet` GROUP BY id ) ) GROUP BY tag; Error: SYSTEM ERROR: UnsupportedOperationException: Map, Array, Union or repeated scalar type should not be used in group by, order by or in a comparison operator. Drill does not support compare between MAP:REPEATED and MAP:REPEATED. /* Or other error with another set of data : Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize incoming schema. Errors: Error in expression at index 0. Error: Missing function implementation: [hash32asdouble(MAP-REPEATED, INT-REQUIRED)]. Full expression: null.. */
These errors are incomprehensible because, the aggregate is on VARCHAR.
More, the request works if decomposed in 2 request with with the creation of an intermediate table like below:
CREATE TABLE ....`tmp.parquet` AS ( SELECT id, flatten(tags) tag FROM ( SELECT id, any_value(tags) tags FROM ....`example_parquet` GROUP BY id )); SELECT tag, count(*) c FROM ....`tmp_parquet` GROUP BY tag; +---------+---+ | tag | c | +---------+---+ | overlay | 2 | | peexe | 2 | | signed | 2 | | upx | 1 | +---------+---+