Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Description
git.commit.id.abbrev=60aa446
I ran the test below against Jason's private branch, which contains patches for flatten-related bugs that have not yet been merged into master.
The data is structured so that each array within a record contains exactly 2 elements. Each flatten added to the query should therefore double the number of rows.
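The expected counts follow from how multiple FLATTENs in one select list combine: each input record produces one output row per combination of elements drawn from the flattened arrays (a Cartesian product). With n arrays of 2 elements each, that is 2^n rows per record. The Python sketch below models this. The record layout (a single record with fields `evnts1`..`evnts50`, each holding 2 elements) is a hypothetical stand-in for the attached file, chosen because it is consistent with the 1024 rows reported for 10 flattens; the helper names are illustrative only.

```python
from itertools import product

def flatten_all(record, array_fields):
    """Model several FLATTENs in one SELECT: yield one row per combination
    of elements taken from each listed array (a Cartesian product)."""
    arrays = [record[f] for f in array_fields]
    for combo in product(*arrays):
        row = {"id": record["id"]}
        row.update(zip(array_fields, combo))
        yield row

def expected_count(records, n):
    """Count rows produced by flattening evnts1..evntsN over all records."""
    fields = [f"evnts{i}" for i in range(1, n + 1)]
    return sum(1 for r in records for _ in flatten_all(r, fields))

# Hypothetical input: one record whose 50 arrays each hold exactly 2 elements.
record = {"id": 1, **{f"evnts{i}": [{"a": 0}, {"a": 1}] for i in range(1, 51)}}

for n in (10, 11):
    print(n, expected_count([record], n))
```

Under these assumptions the model yields 1024 rows for 10 flattens and 2048 for 11, matching the expected values in this report.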
The following query, with 10 flattens, works as expected:
0: jdbc:drill:schema=dfs.drillTestDir> select count(*) from (select id, flatten(evnts1), flatten(evnts2), flatten(evnts3), flatten(evnts4), flatten(evnts5), flatten(evnts6), flatten(evnts7), flatten(evnts8), flatten(evnts9), flatten(evnts10) from `json_kvgenflatten/many-arrays-50.json`) ;
+------------+
|   EXPR$0   |
+------------+
| 1024       |
+------------+
However, the query below, which adds an 11th flatten, returns an incorrect result. The correct output is 2048.
0: jdbc:drill:schema=dfs.drillTestDir> select count(*) from (select id, flatten(evnts1), flatten(evnts2), flatten(evnts3), flatten(evnts4), flatten(evnts5), flatten(evnts6), flatten(evnts7), flatten(evnts8), flatten(evnts9), flatten(evnts10), flatten(evnts11) from `json_kvgenflatten/many-arrays-50.json`) ;
+------------+
|   EXPR$0   |
+------------+
| 2047       |
+------------+
From this point on, the output stays the same no matter how many flattens are added to the query. However, the query duration keeps increasing with each additional flatten.
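To check the plateau for larger n, the repro query can be generated for any number of flattens and each result compared against 2^n. The Python sketch below only builds the query text and compares counts; `build_query` and `check` are hypothetical helpers, the 2^n expectation assumes a single record with 2-element arrays, and it assumes the file has at least n arrays named `evnts1`..`evntsN`. Running the generated SQL (for example in sqlline) is left to the reader.

```python
def build_query(n, path="json_kvgenflatten/many-arrays-50.json"):
    """Build the repro query with n flattens over evnts1..evntsN."""
    cols = ", ".join(f"flatten(evnts{i})" for i in range(1, n + 1))
    return f"select count(*) from (select id, {cols} from `{path}`)"

def check(n, observed):
    """Compare an observed count with 2**n (assumes one record, 2 elements per array)."""
    expected = 2 ** n
    status = "ok" if observed == expected else "MISMATCH"
    return f"n={n}: expected {expected}, observed {observed} -> {status}"

# Counts observed in this report for the 10- and 11-flatten queries.
for n, observed in [(10, 1024), (11, 2047)]:
    print(build_query(n))
    print(check(n, observed))
```

Each additional n only appends one more `flatten(evntsN)` to the select list, so the same helper can produce the 12-, 13-, ... flatten variants that keep returning the same count.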
I have attached the data file. Let me know if you have any questions.