Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: Impala 2.2, Impala 2.3.0, Impala 2.5.0, Impala 2.4.0, Impala 2.6.0, Impala 2.7.0
Fix Version/s: Impala 2.9.0
Component/s: Catalog
Target Version:
This issue is a usability improvement on IMPALA-4163: the SORT BY columns can be specified directly in the table definition, like this:
CREATE TABLE t (day INT, hour INT) PARTITIONED BY (year INT, month INT) SORT BY (day, hour);
With this definition, every insert into the table behaves as if a "sortby(day,hour)" plan hint had been applied. See IMPALA-4163 for details on the hint.
Just like the "sortby" hint, the SORT BY clause can only contain non-partition columns for HDFS tables and non-primary key columns for Kudu tables.
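To make the equivalence concrete, here is a rough sketch of an insert with and without the clause. The source table src is hypothetical, and the hint placement follows Impala's comment-style hints as I understand them; IMPALA-4163 remains the authoritative reference for the hint syntax.

```sql
-- Without SORT BY in the table definition, the hint has to be repeated
-- on every INSERT (src is a hypothetical source table):
INSERT INTO t PARTITION (year, month) /*+ sortby(day, hour) */
SELECT day, hour, year, month FROM src;

-- With the table defined as above (SORT BY (day, hour)), a plain INSERT
-- is expected to be planned as if the same hint were present:
INSERT INTO t PARTITION (year, month)
SELECT day, hour, year, month FROM src;

-- Expected to be rejected for an HDFS table, because year is a
-- partition column (bad_t is a hypothetical table name):
CREATE TABLE bad_t (day INT) PARTITIONED BY (year INT) SORT BY (year);
```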
Storing the sort columns in the table definition has the following benefits:
- Users do not have to remember to add the sortby hint to every INSERT statement.
- The SORT BY columns are a physical design choice, so it makes sense to store them as part of the table metadata.
- The clause is purely a convenience: it has the same effect on INSERT statements as the sortby() hint.
Challenges:
- The Hive Metastore has no SORT BY concept, so we'll need to store the information in the generic TBLPROPERTIES map.
- Other engines (Hive, Spark) will not understand this table property, so data written by those engines will need an explicit sorting hint, where one is available.
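Since the sort columns would live in TBLPROPERTIES rather than in a dedicated metastore field, they should be visible through the usual metadata statements. A minimal sketch, assuming the table t from the description; the exact property key is not stated in this issue, so none is shown here.

```sql
-- Lists the table metadata, including the table properties where the
-- sort columns are expected to be stored.
DESCRIBE FORMATTED t;

-- Would be expected to reproduce the SORT BY clause in the generated DDL.
SHOW CREATE TABLE t;
```

Engines that read only the standard metastore fields would see the property as an opaque key/value pair, which is why their writes would not be sorted automatically.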
Issue Links:
- is required by: IMPALA-5144 Remove sortby() query hint (Resolved)
- relates to: IMPALA-4167 Support insert plan hints for CREATE TABLE AS SELECT (Resolved)
- requires: IMPALA-5339 IMPALA-4166 breaks queries on tables with sort.column that do a expr rewrite (Resolved)