Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-3246

Query planning support for partition by clause in Drill's CTAS statement

    XMLWordPrintableJSON

Details

    • New Feature
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 1.0.0
    • 1.1.0
    • None

    Description

      We are going to add "PARTITION BY" clause in Drill's CTAS statement. The "PARTITION BY" clause will specify the list of columns out of the result table's column list that will be used to partition the data.

      CREATE TABLE table_name [ (col_name, .... ) ]
      [PARTITION BY (col_name, ...)]
      AS SELECT_STATEMENT;

      Semantics restriction for the PARTITION BY clause:

      • All the columns in the PARTITION BY clause have to be in the table's column list, or the SELECT_STATEMENT has a * column, when the base table in the SELECT_STATEMENT is schema-less. Otherwise, an query validation error would be raised.
      • When the partition column is resolved to * column in a schema-less query, this * column could not be a result of join operation. This restriction is added, since for * out of join operation, query planner would not know which table might produce this partition column.

      Example :

      create table mytable1  partition by (r_regionkey) as 
        select r_regionkey, r_name from cp.`tpch/region.parquet`
      
      create table mytable2  partition by (r_regionkey) as 
        select * from cp.`tpch/region.parquet`
      
      create table mytable3  partition by (r_regionkey) as
        select r.r_regionkey, r.r_name, n.n_nationkey, n.n_name 
        from cp.`tpch/nation.parquet` n, cp.`tpch/region.parquet` r
        where n.n_regionkey = r.r_regionkey
      

      Invalid case 1: Partition column is not in table's column list.

      create table mytable4  partition by (r_regionkey2) as 
        select r_regionkey, r_name from cp.`tpch/region.parquet`
      

      Invalid case 2: Partition column is resolved to * out of a join operator.

      create table mytable5  partition by (r_regionkey) as
        select * 
        from cp.`tpch/nation.parquet` n, cp.`tpch/region.parquet` r
        where n.n_regionkey = r.r_regionkey
      

      Attachments

        Activity

          People

            jni Jinfeng Ni
            jni Jinfeng Ni
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: