IMPALA-4166

Introduce SORT BY clause in CREATE TABLE statement



    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.2, Impala 2.3.0, Impala 2.5.0, Impala 2.4.0, Impala 2.6.0, Impala 2.7.0
    • Fix Version/s: Impala 2.9.0
    • Component/s: Catalog
    • Labels:


      This issue is intended as a usability improvement for IMPALA-4163: the SORT BY columns can be specified directly in the table definition, like this:

      CREATE TABLE t (day INT, hour INT)
      PARTITIONED BY (year INT, month INT)
      SORT BY (day, hour);

      The above table creation has the effect that all inserts into the table have an implicit "sortby(day,hour)" plan hint applied. See IMPALA-4163 for details on the hint.
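As a rough sketch of that equivalence, the two inserts below are expected to produce the same sorted output. The names t_unsorted and staging_events are hypothetical, and placing the hint before the SELECT is an assumption based on Impala's usual insert-hint syntax; the authoritative spelling of the hint is in IMPALA-4163.

```sql
-- Hypothetical table with the same schema as t but no SORT BY clause:
-- the hint has to be repeated on every insert.
INSERT INTO t_unsorted PARTITION (year, month) /* +sortby(day,hour) */
SELECT day, hour, year, month FROM staging_events;

-- Table t as defined above, created with SORT BY (day, hour):
-- the same sort is applied implicitly, so no hint is needed.
INSERT INTO t PARTITION (year, month)
SELECT day, hour, year, month FROM staging_events;
```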

      Just like with the "sortby" hint, the SORT BY clause can only contain non-partition columns for HDFS tables and non-primary-key columns for Kudu tables.
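For illustration, a definition like the following would be expected to fail analysis under that rule, because year is a partition column. The table name t_bad is hypothetical, and the exact error text is not specified in this issue.

```sql
-- Expected to be rejected: SORT BY names a partition column of an HDFS table.
CREATE TABLE t_bad (day INT, hour INT)
PARTITIONED BY (year INT, month INT)
SORT BY (year);
```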

      This has the following benefits:

      • Users do not have to remember to add the sortby() hint to every INSERT statement. The clause is a convenience: it has the same effect as the hint.
      • The SORT BY columns are a physical design choice, so it makes sense to store them as part of the table metadata.

      It also has the following drawbacks:

      • The Hive Metastore has no SORT BY concept, so the information needs to be stored in the generic TBLPROPERTIES map.
      • No other engines (Hive, Spark) will understand this table property. Data written by those engines will therefore require an explicit sorting hint (as far as that's available).
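A minimal sketch of what these two points mean in practice, under some assumptions. This issue does not name the TBLPROPERTIES key, so the first two statements only show where to look for it. The Hive statement is one way to request a sort explicitly when writing from Hive; staging_events is a hypothetical source table, and the resulting per-file ordering depends on Hive's own execution settings.

```sql
-- Impala: inspect the stored metadata. The sort columns should show up
-- among the table parameters / TBLPROPERTIES (key name not given here).
DESCRIBE FORMATTED t;
SHOW CREATE TABLE t;

-- Hive: the table property is ignored, so the sort must be requested
-- explicitly in the writing query. SORT BY orders rows within each reducer.
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE t PARTITION (year, month)
SELECT day, hour, year, month FROM staging_events
SORT BY day, hour;
```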


              • Assignee:
                lv Lars Volker
                alex.behm Alexander Behm


                • Created: