[IMPALA-4166] Introduce SORT BY clause in CREATE TABLE statement

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: Impala 2.2, Impala 2.3.0, Impala 2.5.0, Impala 2.4.0, Impala 2.6.0, Impala 2.7.0
Fix Version/s: Impala 2.9.0
Component/s: Catalog
Labels:
- ramp-up
- usability

Epic Link: Improve the reliability and effectiveness of ETL
Target Version: Product Backlog

Description

This issue is a usability improvement over IMPALA-4163: the SORT BY columns can be specified directly in the table definition, like this:

CREATE TABLE t (day INT, hour INT)
PARTITIONED BY (year INT, month INT)
SORT BY (day, hour);

Creating the table this way means that every insert into it has an implicit "sortby(day,hour)" plan hint applied. See IMPALA-4163 for details on the hint.
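
To make the equivalence concrete, here is a sketch of the two forms side by side. The hint spelling and placement are assumed from IMPALA-4163 and are not spelled out in this issue; the source table src is hypothetical.

```sql
-- Without a SORT BY clause on the table, each INSERT carries the hint itself
-- (hint syntax assumed from IMPALA-4163; src is a hypothetical source table).
INSERT INTO t PARTITION (year, month) /* +sortby(day,hour) */
SELECT day, hour, year, month FROM src;

-- With SORT BY (day, hour) in the table definition, a plain INSERT
-- is expected to sort the rows the same way.
INSERT INTO t PARTITION (year, month)
SELECT day, hour, year, month FROM src;
```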

Just like with the "sortby" hint, the SORT BY clause can only contain non-partition columns for HDFS tables and non-primary-key columns for Kudu tables.
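
As an illustration of the HDFS restriction, the following definition should be rejected, since it names a partition column in SORT BY. The table name t_invalid is hypothetical, and the exact error text is not given in this issue.

```sql
-- Expected to fail analysis: year is a partition column of an HDFS table,
-- so it may not appear in the SORT BY clause.
CREATE TABLE t_invalid (day INT, hour INT)
PARTITIONED BY (year INT, month INT)
SORT BY (year, day);
```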

This has the following benefits:

- Users will not have to remember to put the sortby hint in every INSERT statement.
- The SORT BY columns are a physical design choice, so it makes sense to store them as part of the table metadata.

This is a convenience feature: it has the same effect as the sortby() hint for INSERT statements.

Challenges:

- The Hive Metastore has no SORT BY concept, so we'll need to store the information in the generic TBLPROPERTIES map.
- No other engines (Hive, Spark) will understand this table property. Data written by those engines will therefore require an explicit sorting hint, to the extent those engines offer one.
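
A rough sketch of how this might look in practice. The property key sort.columns and its comma-separated value are assumptions (the title of IMPALA-5339 refers to a "sort.column" property); this issue does not specify either. The Hive statement is one possible way to sort explicitly, and src is a hypothetical source table.

```sql
-- In Impala: the sort columns are stored as an ordinary table property,
-- so they should show up in the table parameters section.
-- Assumed entry: 'sort.columns' = 'day,hour'.
DESCRIBE FORMATTED t;

-- In Hive: the property is ignored, so the writer has to sort explicitly.
-- Hive's SORT BY orders the rows within each reducer's output.
INSERT INTO TABLE t PARTITION (year = 2017, month = 5)
SELECT day, hour FROM src
SORT BY day, hour;
```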

Issue Links

is required by: IMPALA-5144 Remove sortby() query hint (Resolved)
relates to: IMPALA-4167 Support insert plan hints for CREATE TABLE AS SELECT (Resolved)
requires: IMPALA-5339 IMPALA-4166 breaks queries on tables with sort.column that do a expr rewrite (Resolved)

Sub-Tasks

Document SORT BY syntax for CREATE TABLE and ALTER TABLE (Resolved, John Russell)

People

Assignee: Lars Volker
Reporter: Alexander Behm
Votes: 0
Watchers: 8

Dates

Created: 19/Sep/16 21:36
Updated: 01/Dec/21 07:25
Resolved: 12/May/17 16:45