Hive
  1. Hive
  2. HIVE-936

dynamic partitions creation based on values

    Details

    • Type: Task Task
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.7.0
    • Component/s: Query Processor
    • Labels:
      None

      Description

      If a Hive table is created as partitioned, DML could only inserted into one partitioin per query. Ideally partitions should be created on the fly based on the value of the partition columns. As an example:
      {{

      { create table T (a int, b string) partitioned by (ds string); insert overwrite table T select a, b, ds from S where ds >= '2009-11-01' and ds <= '2009-11-16'; }

      }}
      should be able to execute in one DML rather than possibley 16 DML for each distinct ds values. CTAS and alter table should be able to do the same thing:
      {{

      { create table T partitioned by (ds string) as select * from S where ds >= '2009-11-01' and ds <= '2009-11-16'; }

      }}
      and
      {{

      { create table T(a int, b string, ds string); insert overwrite table T select * from S where ds >= '2009-11-1' and ds <= '2009-11-16'; alter table T partitioned by (ds); }

      }}
      should all return the same results.

        Issue Links

          Activity

          Hide
          Ning Zhang added a comment -

          Closing the task as the dependent tasks that worth pursue have been committed.

          Show
          Ning Zhang added a comment - Closing the task as the dependent tasks that worth pursue have been committed.
          Hide
          Ning Zhang added a comment -

          Carl, yeah, I think we can close this ticket. Some of of the dependent tasks (support CTAS) is not high-pri, and some may not have the need to pursue (support alter table). So I'll close this one now.

          Show
          Ning Zhang added a comment - Carl, yeah, I think we can close this ticket. Some of of the dependent tasks (support CTAS) is not high-pri, and some may not have the need to pursue (support alter table). So I'll close this one now.
          Hide
          Carl Steinbach added a comment -

          Can we close this ticket?

          Show
          Carl Steinbach added a comment - Can we close this ticket?
          Hide
          Ning Zhang added a comment -

          A patch for HIVE-1002 has been upload. That patch deals with insert only. Please take a look and review. I will upload a patch for CTAS on this JIRA.

          Show
          Ning Zhang added a comment - A patch for HIVE-1002 has been upload. That patch deals with insert only. Please take a look and review. I will upload a patch for CTAS on this JIRA.
          Hide
          Ning Zhang added a comment -

          Updated design notes after a group discussion.

          Show
          Ning Zhang added a comment - Updated design notes after a group discussion.
          Hide
          Ning Zhang added a comment -

          Uploading a design notes. Comments and suggestions are welcome.

          Show
          Ning Zhang added a comment - Uploading a design notes. Comments and suggestions are welcome.
          Hide
          Zheng Shao added a comment -

          Another case not covered by previous discussions (copied from HIVE-1002):

          CREATE TABLE (a STRING, b STRING, c STRING)
          PARTITIONED BY (ds STRING, ts STRING);
          
          INSERT OVERWRITE TABLE x PARTITION (ds = '2009-12-12')
          SELECT a, b, c, ts FROM xxx;
          

          Basically, allowing users to overwrite multiple partitions at a time.
          The partition values specified in PARTITION part (if any) should be a prefix of the partition keys (to simply implementation).
          The rest of the partition keys goes to the end of the SELECT expression list.

          This query will remove any existing partitions of ds = '2009-12-12' (and any ts value), and create partitions with ds = '2009-12-12' and ts of dynamical values.

          Show
          Zheng Shao added a comment - Another case not covered by previous discussions (copied from HIVE-1002 ): CREATE TABLE (a STRING, b STRING, c STRING) PARTITIONED BY (ds STRING, ts STRING); INSERT OVERWRITE TABLE x PARTITION (ds = '2009-12-12') SELECT a, b, c, ts FROM xxx; Basically, allowing users to overwrite multiple partitions at a time. The partition values specified in PARTITION part (if any) should be a prefix of the partition keys (to simply implementation). The rest of the partition keys goes to the end of the SELECT expression list. This query will remove any existing partitions of ds = '2009-12-12' (and any ts value), and create partitions with ds = '2009-12-12' and ts of dynamical values.
          Hide
          Namit Jain added a comment -

          I think, for 936, we dont need alter table - we can see if we want to support something like that in 938.
          I will copy your comments there for now.

          Show
          Namit Jain added a comment - I think, for 936, we dont need alter table - we can see if we want to support something like that in 938. I will copy your comments there for now.
          Hide
          Prasad Chakka added a comment -

          @Namit: you can't change the partition keys of a table.

          Show
          Prasad Chakka added a comment - @Namit: you can't change the partition keys of a table.
          Hide
          Namit Jain added a comment -

          @Jay, we were thinking of having a configurable variable to control the number of partitions a statement can create.

          Show
          Namit Jain added a comment - @Jay, we were thinking of having a configurable variable to control the number of partitions a statement can create.
          Hide
          Namit Jain added a comment -
          { create table T(a int, b string, ds string); insert overwrite table T select * from S where ds >= '2009-11-1' and ds <= '2009-11-16'; alter table T partitioned by (ds); }

          is a change of behavior - currently, DDLs never perform a data operation - let us discuss this more in a follow-up jira - maybe add
          a force option or something like that

          Show
          Namit Jain added a comment - { create table T(a int, b string, ds string); insert overwrite table T select * from S where ds >= '2009-11-1' and ds <= '2009-11-16'; alter table T partitioned by (ds); } is a change of behavior - currently, DDLs never perform a data operation - let us discuss this more in a follow-up jira - maybe add a force option or something like that
          Hide
          Namit Jain added a comment -

          I agree this is a important feature and we will start working into it

          Show
          Namit Jain added a comment - I agree this is a important feature and we will start working into it
          Hide
          Jay Booth added a comment -

          This would be super useful.. only issue I can see is inserting super sparse data, we've had issues with unbounded MultipleOutputs, eventually you run out of RAM, XCeivers or both. I suppose it could just be a client responsibility to make sure your query doesn't generate enough outputs to break it.. still better than not having the functionality at all.

          Show
          Jay Booth added a comment - This would be super useful.. only issue I can see is inserting super sparse data, we've had issues with unbounded MultipleOutputs, eventually you run out of RAM, XCeivers or both. I suppose it could just be a client responsibility to make sure your query doesn't generate enough outputs to break it.. still better than not having the functionality at all.
          Hide
          Jeff Hammerbacher added a comment -

          Probably the most requested feature from our customers. Getting a patch in would be a big usability improvement for Hive!

          Show
          Jeff Hammerbacher added a comment - Probably the most requested feature from our customers. Getting a patch in would be a big usability improvement for Hive!

            People

            • Assignee:
              Ning Zhang
              Reporter:
              Ning Zhang
            • Votes:
              8 Vote for this issue
              Watchers:
              14 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development