Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
With the introduction of the Parquet page indexes it became desirable to have more control over how Impala writes Parquet files.
These configuration options (probably implemented as query options) would be:
- enable/disable Parquet page index writing (currently we can do it with a command-line argument)
- set page-size limits based on row count
- set the truncation length for statistics about string values (the current truncation length is 64; user data is unlikely to need a longer truncation than that)
These options would enable writing more complete tests for page filtering. They would also be useful for fine-tuning the page index for better performance.
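To illustrate why small limits make page filtering easier to test, here is a minimal Python sketch of the two tunables discussed above: a row-count limit per page and a truncation length for string min/max statistics. It is not Impala's writer code. All function names are hypothetical, and the truncation rule shown (keep a prefix for the minimum, increment the last byte of the prefix for the maximum) is one common way to keep truncated bounds valid, assumed here for illustration.

```python
from typing import List, Optional, Tuple


def truncate_min(value: bytes, limit: int) -> bytes:
    """A prefix never sorts after the full value, so it is a valid lower bound."""
    return value[:limit]


def truncate_max(value: bytes, limit: int) -> Optional[bytes]:
    """Return an upper bound of at most `limit` bytes, or None if none exists."""
    if len(value) <= limit:
        return value
    prefix = bytearray(value[:limit])
    # Trailing 0xFF bytes cannot be incremented, so drop them first.
    while prefix and prefix[-1] == 0xFF:
        prefix.pop()
    if not prefix:
        return None  # nothing left to increment: no shorter upper bound
    prefix[-1] += 1
    return bytes(prefix)


def split_into_pages(values: List[bytes], row_limit: int) -> List[List[bytes]]:
    """Cut a column into pages holding at most `row_limit` rows (must be > 0)."""
    return [values[i:i + row_limit] for i in range(0, len(values), row_limit)]


def page_stats(page: List[bytes], limit: int) -> Tuple[bytes, Optional[bytes]]:
    """Truncated (min, max) statistics for one page."""
    return truncate_min(min(page), limit), truncate_max(max(page), limit)


if __name__ == "__main__":
    column = [b"apple", b"banana", b"cherry", b"date", b"elderberry"]
    # Tiny limits (2 rows per page, 4-byte statistics) produce many pages and
    # visibly truncated bounds from just a handful of rows.
    for page in split_into_pages(column, row_limit=2):
        print(page, page_stats(page, limit=4))
```

With limits this small, a test can exercise multi-page files and truncated statistics using only a few rows, instead of needing enough data to hit the default thresholds.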
Issue Links
- is related to
  - IMPALA-8449 Avoid Parquet pages with too many rows + try to make them aligned (Open)
  - IMPALA-10405 Consider setting parquet_page_row_count_limit to 20000 by default (Open)