Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
1.17.0
-
None
-
None
Description
AS drill doesn't allow to GROUP BY nor DISTINCT nor ORDER BY complex type, it may appears as a solution to use any_value aggregate function to do some works.
But some problems appears:
With a dataset of 223664 rows like:
SELECT Url, Tags FROM dfs.tmp.`data.json` LIMIT 1; +-----------------------------------------+--------+ | Url | Tags | +-----------------------------------------+--------+ | http://000.dijiushipindian.com/feed.rss | ["us"] | +-----------------------------------------+--------+
With the own UDF function to_string that only do
@Param FieldReader input;
...
String rowString = input.readObject().toString();
...
SELECT any_value(T.Tags)Tags FROM dfs.tmp.`data.json` GROUP BY NULLIF(UPPER(to_string(T.Tags)),'') /* WORK WELL */; +--------+ | Tags | +--------+ | ["us"] | | ["cn"] | ... SELECT Url, any_value(T.Tags)Tags FROM dfs.tmp.`data.json` GROUP BY Url, NULLIF(UPPER(to_string(T.Tags)),'') /* NOK */; java.lang.NegativeArraySizeException
Sometimes the error can be different (details in attachment): java.lang.IndexOutOfBoundsException: index: 1634787136, length: 7629168 (expected: range(0, 8388608))
And before producing the error, the output show some results like below
+----------------------------------------------------------------------------------+------+ | Url | Tags | +----------------------------------------------------------------------------------+------+ | http://everythiing4u.blogspot.com.es/2013/04/omg-proposal-fail.html | [] | | http://everythiing4u.blogspot.com.es/2013/04/omg-this-dude-just-owned-his-friend.html | [] |
And this result is not correct because field Tags is empty although this is never the case in the source file.
So maybe there is a problem with the aggregate function any_value.