Details
Sub-task
Status: Closed
Major
Resolution: Fixed
Description
The integration tests for Parquet predicate push down (PPD) use the following query to validate the filtered values:
select sum(hash(*)) from ...
It would be better to use select * from ... instead, so that we can see whether the returned values are correct. It is difficult to tell whether a value was filtered by looking only at the hash.
Also, we can limit the number of rows produced by the INSERT ... SELECT statement to avoid displaying many rows when validating the data. I think a LIMIT 2 on each of the SELECT statements would be enough.
For example, the parquet_ppd_boolean.ppd has this:
insert overwrite table newtypestbl select * from (select cast("apple" as char(10)), cast("bee" as varchar(10)), 0.22, true from src src1 union all select cast("hello" as char(10)), cast("world" as varchar(10)), 11.22, false from src src2) uniontbl;
If we use LIMIT 2, then we will reduce the number of rows:
insert overwrite table newtypestbl select * from (select cast("apple" as char(10)), cast("bee" as varchar(10)), 0.22, true from src src1 LIMIT 2 union all select cast("hello" as char(10)), cast("world" as varchar(10)), 11.22, false from src src2 LIMIT 2) uniontbl;
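As a rough sketch of how a validation query might change, the pair below contrasts the current hash-based check with the proposed row-level check. The column name b for the boolean column and the predicate are assumptions made for illustration; the ticket does not show the table definition or the actual predicates used in the test.

```sql
-- Current style: the result is a single number, so it is hard to tell
-- which rows passed the filter.
-- NOTE: the column name `b` is hypothetical; substitute the real boolean column.
select sum(hash(*)) from newtypestbl where b = true;

-- Proposed style: the result lists the surviving rows, so a wrongly
-- filtered (or wrongly kept) value is visible in the expected output.
select * from newtypestbl where b = true;
```

With LIMIT 2 on each branch of the INSERT ... SELECT, the table should hold at most four rows, so the select * output stays short enough to review by eye.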