This issue is intended as a usability improvement for IMPALA-4163: the SORT BY columns can be specified directly in the table definition like this:
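A minimal sketch of such a table definition follows. The table name, the non-sort columns, the partition columns, and the file format are hypothetical; only the sort columns day and hour come from this description, and the placement of the SORT BY clause after PARTITIONED BY is an assumption about the proposed syntax.

```sql
-- Hypothetical table: only the sort columns (day, hour) are taken from this issue.
CREATE TABLE events (
  day INT,
  hour INT,
  payload STRING
)
PARTITIONED BY (year INT, month INT)  -- partition columns, not allowed in SORT BY
SORT BY (day, hour)                   -- proposed clause: sort columns stored with the table
STORED AS PARQUET;
```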
A table created this way applies an implicit "sortby(day,hour)" plan hint to every insert into it. See IMPALA-4163 for details on the hint.
Just like with the "sortby" hint, the SORT BY clause can only contain non-partition columns for HDFS tables and non-primary key columns for Kudu tables.
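To illustrate the restriction for HDFS tables, here is a sketch of a definition that would be expected to be rejected under this rule. The table and column names are hypothetical, and the exact error behavior is not specified in this issue.

```sql
-- Hypothetical example: year is a partition column, so listing it in
-- SORT BY would violate the rule described above and should fail analysis.
CREATE TABLE events_bad (
  day INT,
  hour INT
)
PARTITIONED BY (year INT)
SORT BY (year)
STORED AS PARQUET;
```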
This has the following benefits:
- This is a convenience feature. It has the same effect as the sortby() hint for INSERT statements, but users do not have to remember to include the hint in every INSERT statement.
- The SORT BY columns are a physical design choice, so it makes sense to store them as part of the table metadata.
This has the following drawbacks:
- The Hive Metastore has no SORT BY concept, so we'll need to store the information in the generic TBLPROPERTIES map.
- No other engines (Hive, Spark) will understand this table property. That means that data written by those engines will not be sorted unless the writer specifies the sort order explicitly (as far as that engine supports it).