Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
- None
- None
Description
Parquet already provides per-column statistics, including the number of null values and the min and max values. These statistics look easy to integrate with Tajo's statistics system. If query execution exploits them well, for example by skipping data whose min/max range cannot satisfy a filter, query performance should improve.
- Added statistics to Parquet pages and rowGroups (https://github.com/Parquet/parquet-mr/commit/621cf4e92be3dd3f2dd1a92a8dd12f244a7d7be3)
- https://git-wip-us.apache.org/repos/asf?p=incubator-parquet-mr.git;a=blob;f=parquet-column/src/main/java/parquet/column/statistics/Statistics.java;h=2c5ac14a262c69a9a6545243330bbc3a77812c9b;hb=HEAD
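
As a rough illustration of how min/max statistics could help query execution, the sketch below skips row groups whose value range cannot match an equality filter. The `ColumnStats` and `RowGroupPruning` classes are hypothetical and exist only for this example; they are not Parquet's `Statistics` class or any Tajo API, and the sketch assumes a single non-null `long` column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical per-column statistics for one row group (not Parquet's or Tajo's API). */
final class ColumnStats {
    final long min;
    final long max;
    final long nullCount;

    ColumnStats(long min, long max, long nullCount) {
        this.min = min;
        this.max = max;
        this.nullCount = nullCount;
    }
}

final class RowGroupPruning {
    /** True if the row group may contain a row where column == value. */
    static boolean mightContain(ColumnStats stats, long value) {
        return stats.min <= value && value <= stats.max;
    }

    /** Returns the indexes of the row groups that still have to be scanned. */
    static List<Integer> rowGroupsToScan(List<ColumnStats> groups, long value) {
        List<Integer> keep = new ArrayList<>();
        for (int i = 0; i < groups.size(); i++) {
            if (mightContain(groups.get(i), value)) {
                keep.add(i);
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        List<ColumnStats> groups = Arrays.asList(
            new ColumnStats(1, 100, 0),
            new ColumnStats(101, 200, 3),
            new ColumnStats(150, 300, 0));
        // Only the second and third row groups can contain the value 160.
        System.out.println(rowGroupsToScan(groups, 160)); // prints [1, 2]
    }
}
```

The check is conservative: a row group is skipped only when its statistics prove that no row can match, so overlapping ranges are still scanned. A real integration would also need to handle row groups without statistics and columns that contain only nulls.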