HIVE-31

Hive: support CREATE TABLE xxx SELECT yyy.* FROM yyy

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.5.0
    • Component/s: Query Processor
    • Labels:
      None

      Description

      We should allow users to create a table using query result, without specifying the column names and column types.
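
      For illustration, a minimal sketch of the requested behavior (the table and column names here are hypothetical):

      CREATE TABLE new_table AS
      SELECT key, value FROM existing_table;

      The column names and types of new_table would be derived from the SELECT rather than declared by the user.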

      1. HIVE-31.patch
        170 kB
        Ning Zhang
      2. HIVE-31_4.patch
        171 kB
        Ning Zhang
      3. HIVE-31_3.patch
        183 kB
        Ning Zhang
      4. HIVE-31_2.patch
        182 kB
        Ning Zhang

        Issue Links

          Activity

          Zheng Shao created issue -
          Owen O'Malley made changes -
          Field Original Value New Value
          Key HADOOP-4535 HIVE-31
          Component/s contrib/hive [ 12312455 ]
          Issue Type New Feature [ 2 ] Bug [ 1 ]
          Project Hadoop Core [ 12310240 ] Hadoop Hive [ 12310843 ]
          David Phillips added a comment -

          This is commonly known as "CREATE TABLE AS SELECT" or "CTAS". Do we want to require the "AS" keyword?

          MySQL: optional
          Oracle: required
          PostgreSQL: required
          SQLite: required
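
          For example, with a hypothetical source table src, the two forms in question would be:

          CREATE TABLE t1 AS SELECT * FROM src;   -- with AS
          CREATE TABLE t1 SELECT * FROM src;      -- without AS (the MySQL-style shorthand)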

          Ashish Thusoo made changes -
          Component/s Query Processor [ 12312586 ]
          Adam Kramer added a comment -

          A similar request would be a straightforward LIKE command:

          CREATE TABLE new_table LIKE preexisting_table;

          ...I've been meaning to request this functionality for a while.

          Comment 2: I think it would be very useful to ensure that we can create temporary tables this way, too: CTTAS.

          Zheng Shao made changes -
          Link This issue is blocked by HIVE-153 [ HIVE-153 ]
          Zheng Shao added a comment -

          We can create tables with good column names only when we have meaningful names from the plan.

          Adam Kramer added a comment -

          Well, functionally, renaming columns is easy. It would be very useful for me to have this functionality even if the names were just key, value1, value2, ..., valueN.

          And now I'll open a JIRA requesting the ability to select columns by number...

          Zheng Shao made changes -
          Link This issue is blocked by HIVE-509 [ HIVE-509 ]
          Ning Zhang made changes -
          Assignee Ning Zhang [ nzhang ]
          Edward Capriolo added a comment -

          This will be helpful. I find this is mostly a problem when dealing with columns that are partitioned. For example, I may have a table:

          month day   id value
          

          Where month and day are partitions, I either have to remove those columns or do a multi-insert.

          +1

          Ning Zhang added a comment -

          Edward, do you mean the input table is partitioned or the newly created table should also support partitions?

          Edward Capriolo added a comment -

          I don't know if anyone can solve this. I will try to replicate the situation in which it came up.

          create table tablea (url string, time string, cookies string)
          partitioned by (year string, month string);

          create table tableb (url string, time string, cookies string)
          partitioned by (year string, month string);
          -- (tableb possibly stored with a different/compressed format)

          INSERT OVERWRITE TABLE tableb PARTITION (year=x, month=y)
          SELECT url, time, cookies FROM tablea WHERE year=x AND month=y;

          See? I needed to list every column, because SELECT * FROM tablea includes the partition columns as regular columns in its output. In my case both tables are partitioned in the same way. I am not sure if this JIRA can deal with this directly, but I found this fact annoying.

          Zheng Shao added a comment -

          Edward,

          This should solve your problem:

          INSERT OVERWRITE tableB PARTITION(year=xxx, month=yyy)
          SELECT `(year|month)?+.+` FROM tableA
          WHERE year = xxx and month = yyy;
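
          (The backquoted `(year|month)?+.+` relies on Hive's regex column specification: the pattern matches every column name except year and month, so the partition columns are left out of the SELECT list. With the hypothetical schema above, it is equivalent to listing url, time, cookies explicitly.)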
          
          Edward Capriolo added a comment -

          Interesting. I figured the regex select would help; however, I have some weak regex skills. Thanks, Zheng.

          Ning Zhang added a comment -

          Attaching HIVE-31.patch.

          This patch implements the CTAS feature:

          1) The SELECT statement can be any query we currently support.
          2) The created table and the source tables can have different SerDes and/or storage formats (see the sketch below).
          3) The schema of the created table is derived from the SELECT statement; users don't need to (and should not) specify it.

          Limitations:
          1) The created table does not support partitions yet. There is separate work on automatic partitioning for INSERT statements that will address this.
          2) The created table cannot be an external table.
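
          As an illustration of points 2) and 3) above, a minimal sketch (the table names, columns, and storage clause are hypothetical):

          CREATE TABLE target_seq
          STORED AS SEQUENCEFILE
          AS
          SELECT key, value FROM src_text;

          target_seq picks up the (key, value) schema from the SELECT and can use a different storage format than src_text.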

          Ning Zhang made changes -
          Attachment HIVE-31.patch [ 12420813 ]
          Ning Zhang added a comment -

          Namit's comments:
          ====

          Due to JIRA connection problems (and losing them twice), I am sending these by mail.
          You can copy them into the JIRA when it is stable:

          Looks good - some minor comments.

          Tests:

          1. Negative: tests exit at the first error, so have a separate file for each case.
          2. Positive: the output is not deterministic.
          Add a describe also.

          ErrorMsg.java:

          CTAS_COLLST_COEXISTENCE("Create table as select command cannot specify the list of columns for the target table."),
          CTLT_COLLST_COEXISTENCE("Create table like command cannot specify the list of columns for the target table."),

          Why not? If it is not supported currently but you want to add it in a follow-up, can you file a JIRA for that?

          SemanticAnalyzerFactory.java:

          This affects the ReplicationHook, which is used for logging statements.
          For CTAS, the output should be the final table which is created - currently, it is the directory; this needs
          to be investigated.

          SemanticAnalyzer.java:

          4370   if ( ! oneLoadFile ) {
          4371     // should not have more than one files to load
          4372     throw new SemanticException(ErrorMsg.CTAS_MULTI_LOADFILE.getMsg());
          4373   }

          The above error is not really an exception - it can be an assert.

          Currently, for non-CTAS create tables, the root tasks are created by analyzeCreateTable, whereas for CTAS
          they are created later on - can you file a new JIRA for this cleanup?

          Prasad Chakka added a comment -

          I don't understand why analyzeCreateTable() is moved to SemanticAnalyzer.java. That file is already very big. Can you not call analyzeInternal() from DDLSemanticAnalyzer?

          Ning Zhang added a comment -

          Chatted with Prasad offline. The decision to move the task of DDLSemanticAnalyzer into SemanticAnalyzer is to have a single SemanticAnalyzer instance for the whole CTAS statement (otherwise, a SemanticAnalyzer would be needed for the SELECT part to get the resulting schema, and a DDLSemanticAnalyzer for the CREATE TABLE part). With one instance of SA, all the info is shared and doesn't need to be passed around.

          Another issue Prasad raised is that the DDLWork is chained as the root task, and that should be moved to the end of the task chain. I will change that in the next revision.

          Ning Zhang added a comment -

          Fixes the issues raised by Namit and Prasad:
          1) the output is the table only in the post-hook.
          2) DDLWork is moved to the end of the task chain.

          Ning Zhang made changes -
          Attachment HIVE-31_2.patch [ 12420931 ]
          Ning Zhang added a comment -

          Additional changes in HIVE-31_2.patch:

          3) added sort by and explain in the clientpositive tests.
          4) removed the if-not-exists in clientnegative since it is already covered in clientpositive.
          5) SemanticAnalyzer.java:4370 changed to an assert rather than throwing an exception.

          Ning Zhang added a comment -

          This patch incorporates Prasad's comment about throwing an early exception when the target table already exists. Prasad, please take a look at this patch, as the previous one just returns and won't throw an exception.

          Ning Zhang made changes -
          Attachment HIVE-31_3.patch [ 12420973 ]
          Namit Jain added a comment -

          Ning, can you generate the patch again? There were many, many conflicts when I tried to apply the patch.

          Ning Zhang added a comment -

          Namit, HIVE-31_4.patch is compatible with the latest revision (820742)

          Ning Zhang made changes -
          Attachment HIVE-31_4.patch [ 12421041 ]
          Namit Jain added a comment -

          +1

          looks good

          Namit Jain added a comment -

          Committed. Thanks Ning

          Namit Jain made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Hadoop Flags [Reviewed]
          Fix Version/s 0.5.0 [ 12314156 ]
          Resolution Fixed [ 1 ]
          Carl Steinbach added a comment -

          This was accidentally committed as HIVE-861:

          % git log --stat c795b31e70 -n 1
          commit c795b31e7009d6f95018c9a2fcbdff0b0ce26591
          Author: Namit Jain <namit@apache.org>
          Date:   Thu Oct 1 20:06:57 2009 +0000
          
              HIVE-861. NumberFormatException in sum and average
              (Zheng Shao via namit)
              
              
              
              git-svn-id: https://svn.apache.org/repos/asf/hadoop/hive/trunk@820771 13f79535-47bb-0310-9956-ffa450edef68
          
           CHANGES.txt                                        |    3 +
           .../org/apache/hadoop/hive/ql/exec/ColumnInfo.java |   11 +
           .../hadoop/hive/ql/parse/BaseSemanticAnalyzer.java |   90 ++++
           .../hadoop/hive/ql/parse/DDLSemanticAnalyzer.java  |  339 +-------------
           .../org/apache/hadoop/hive/ql/parse/ErrorMsg.java  |   12 +-
           ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g |   29 +-
           .../java/org/apache/hadoop/hive/ql/parse/QB.java   |   19 +-
           .../hadoop/hive/ql/parse/SemanticAnalyzer.java     |  523 ++++++++++++++++++--
           .../hive/ql/parse/SemanticAnalyzerFactory.java     |    3 +-
           .../org/apache/hadoop/hive/ql/plan/PlanUtils.java  |   45 ++-
           ql/src/test/queries/clientnegative/ctas.q          |    5 +
           ql/src/test/queries/clientpositive/ctas.q          |   34 ++
           ql/src/test/results/clientnegative/ctas.q.out      |    5 +
           ql/src/test/results/clientpositive/create_1.q.out  |    7 +-
           .../test/results/clientpositive/create_like.q.out  |   16 +-
           ql/src/test/results/clientpositive/ctas.q.out      |  421 ++++++++++++++++
           ql/src/test/results/compiler/plan/cast1.q.xml      |   29 +-
           ql/src/test/results/compiler/plan/groupby2.q.xml   |   17 +-
           ql/src/test/results/compiler/plan/groupby3.q.xml   |   23 +-
           ql/src/test/results/compiler/plan/groupby4.q.xml   |   11 +-
           ql/src/test/results/compiler/plan/groupby5.q.xml   |   14 +-
           ql/src/test/results/compiler/plan/groupby6.q.xml   |   11 +-
           ql/src/test/results/compiler/plan/input20.q.xml    |   14 +-
           ql/src/test/results/compiler/plan/input3.q.xml     |    3 +
           ql/src/test/results/compiler/plan/input8.q.xml     |   17 +-
           .../test/results/compiler/plan/input_part1.q.xml   |   20 +-
           .../results/compiler/plan/input_testxpath.q.xml    |   17 +-
           .../results/compiler/plan/input_testxpath2.q.xml   |   17 +-
           ql/src/test/results/compiler/plan/join4.q.xml      |   12 +
           ql/src/test/results/compiler/plan/join5.q.xml      |   12 +
           ql/src/test/results/compiler/plan/join6.q.xml      |   12 +
           ql/src/test/results/compiler/plan/join7.q.xml      |   18 +
           ql/src/test/results/compiler/plan/join8.q.xml      |   12 +
           ql/src/test/results/compiler/plan/sample1.q.xml    |   20 +-
           ql/src/test/results/compiler/plan/subq.q.xml       |    6 +
           ql/src/test/results/compiler/plan/udf1.q.xml       |   59 ++-
           ql/src/test/results/compiler/plan/udf4.q.xml       |   65 +++-
           ql/src/test/results/compiler/plan/udf6.q.xml       |   14 +-
           ql/src/test/results/compiler/plan/udf_case.q.xml   |   14 +-
           ql/src/test/results/compiler/plan/udf_when.q.xml   |   14 +-
           ql/src/test/results/compiler/plan/union.q.xml      |    6 +
           41 files changed, 1564 insertions(+), 455 deletions(-)
          
          Carl Steinbach made changes -
          Tags svn_id:trunk@820771
          Carl Steinbach made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee:
              Ning Zhang
            • Reporter:
              Zheng Shao
            • Votes:
              1
            • Watchers:
              4
