To improve compression and/or the effectiveness of min/max pruning, it is desirable to control the order in which rows are inserted into a table (mostly for Parquet).
To that end, we should introduce a "sortby" plan hint for insert statements.
For example, an insert with a "sortby" hint on the columns day and hour would produce the following plan:
SCAN -> SORT(day,hour) -> TABLE SINK
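To illustrate why the SORT step helps min/max pruning, here is a minimal Python sketch. It is not Impala code: the chunk size, the synthetic "hour" values, and the helper names chunk_stats and chunks_to_read are all assumptions made for illustration. Each fixed-size chunk stands in for a unit of storage that keeps min/max statistics.

```python
def chunk_stats(values, chunk_size):
    """Return a (min, max) pair for each fixed-size chunk of values."""
    stats = []
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        stats.append((min(chunk), max(chunk)))
    return stats


def chunks_to_read(stats, wanted):
    """Count chunks whose [min, max] range could contain the wanted value."""
    return sum(1 for low, high in stats if low <= wanted <= high)


# Hypothetical input: 96 rows whose hour values arrive interleaved.
# Because 7 and 24 are coprime, every run of 24 rows covers all 24 hours.
unsorted_hours = [(i * 7) % 24 for i in range(96)]
sorted_hours = sorted(unsorted_hours)

CHUNK_SIZE = 24  # assumed number of rows per chunk

# Without sorting, every chunk spans hours 0..23, so none can be skipped.
unsorted_reads = chunks_to_read(chunk_stats(unsorted_hours, CHUNK_SIZE), 5)

# After sorting, each chunk covers a narrow range and most can be skipped.
sorted_reads = chunks_to_read(chunk_stats(sorted_hours, CHUNK_SIZE), 5)
```

In this sketch a lookup for hour 5 has to read all four chunks of the unsorted data but only one chunk of the sorted data. Sorted runs of similar values also tend to compress better.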
- We will not support the legacy hint style with brackets.
- To keep the "clustered" hint strictly separate from the "sortby" hint, it is only legal to use non-partition columns in "sortby" for HDFS tables.
- Similarly, for Kudu tables it is only legal to mention non-primary-key columns in "sortby".
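The column restrictions above could be checked roughly as in the following Python sketch. It only models the rule and is not the planner's actual implementation: the function validate_sortby, its parameters, and the "hdfs"/"kudu" labels are hypothetical names chosen for illustration.

```python
def validate_sortby(sort_columns, table_kind,
                    partition_columns=(), primary_key_columns=()):
    """Return the sort columns if legal, otherwise raise ValueError.

    Models the proposed rule: no partition columns for HDFS tables,
    no primary-key columns for Kudu tables.
    """
    if table_kind == "hdfs":
        forbidden, label = set(partition_columns), "partition"
    elif table_kind == "kudu":
        forbidden, label = set(primary_key_columns), "primary-key"
    else:
        raise ValueError(f"unknown table kind: {table_kind!r}")

    illegal = [col for col in sort_columns if col in forbidden]
    if illegal:
        raise ValueError(
            f"sortby must not mention {label} columns: {', '.join(illegal)}")
    return list(sort_columns)


# Hypothetical HDFS table partitioned by (year, month): legal sort columns.
accepted = validate_sortby(["day", "hour"], "hdfs",
                           partition_columns=["year", "month"])

# Hypothetical Kudu table with primary key (id): sorting by id is rejected.
try:
    validate_sortby(["id", "day"], "kudu", primary_key_columns=["id"])
    rejected = False
except ValueError:
    rejected = True
```

Rejecting the hint outright, rather than silently dropping the offending columns, keeps the behavior explicit for the user. That choice is an assumption of this sketch.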