Details
Improvement
Status: Resolved
Major
Resolution: Fixed
Description
A Parquet group scan provides the exact row count and the exact value count for each individual column. This information could be leveraged in the following two ways:
1. Use the counts in cost estimation when a query refers to Parquet files.
2. Use the row count or column value count to optimize the count() aggregate function.
For instance:
select count(*) from parquet_file;
select count(column_a) from parquet_file;
The first query could be transformed to return the row count directly, and the second could return the column value count for 'column_a'. Both cases avoid scanning the whole Parquet files, which improves query performance.
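The idea can be illustrated with a minimal sketch. It is not the planner's actual implementation: `FileStats` and `count_from_stats` are hypothetical names, and the sketch assumes the per-column value count is the number of non-null values, so that it matches the result of count(column).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FileStats:
    """Hypothetical per-file statistics that a group scan could expose."""
    row_count: int
    # Assumption: number of non-null values per column.
    column_value_counts: Dict[str, int] = field(default_factory=dict)


def count_from_stats(files: List[FileStats],
                     column: Optional[str] = None) -> Optional[int]:
    """Answer a count query from statistics alone.

    column=None stands for the row-count query; otherwise the count of
    values in that column. Returns None when some file lacks the needed
    statistic, meaning the caller has to fall back to a regular scan.
    """
    total = 0
    for stats in files:
        if column is None:
            total += stats.row_count
        elif column in stats.column_value_counts:
            total += stats.column_value_counts[column]
        else:
            return None  # statistic missing: cannot skip the scan
    return total


# Two illustrative files; the numbers are made up for the example.
files = [
    FileStats(row_count=100, column_value_counts={"column_a": 90}),
    FileStats(row_count=50, column_value_counts={"column_a": 50}),
]
row_total = count_from_stats(files)
column_a_total = count_from_stats(files, "column_a")
missing_total = count_from_stats(files, "column_b")
```

Here the row-count query sums the per-file row counts, the column query sums the per-file value counts, and a column with no recorded statistic returns None so that the query is still answered by scanning.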