Hive / HIVE-6405 Support append feature for HCatalog / HIVE-6406

Introduce immutable-table table property and if set, disallow insert-into

    Details

    • Release Note:
      This patch introduces a new table property, "immutable".

      If a table is created with TBLPROPERTIES("immutable"="true"), INSERT INTO that table is disallowed if there is any data already present that the INSERT INTO would append to.

      INSERT INTO still works if the table is empty.

      INSERT OVERWRITE is not affected by this property, since the intent of an INSERT OVERWRITE is effectively to drop and re-create.

      The immutable flag protects a table against accidental updates, for example when a script that loads data into it is run multiple times by mistake. With the flag set, the first insert succeeds and successive inserts fail, leaving only one set of data in the table, instead of the inserts silently succeeding and leaving (say) 4 copies of the data in the table.
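
      The rule in this release note can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not Hive's implementation: the class ImmutableTableSketch, its Table type, the WriteMode enum and the write method are hypothetical names, and parsing the property value with Boolean.parseBoolean is a choice made for the sketch.

      ```java
      import java.util.ArrayList;
      import java.util.Arrays;
      import java.util.HashMap;
      import java.util.List;
      import java.util.Map;

      // Toy in-memory model of the release-note rule; not Hive's implementation.
      final class ImmutableTableSketch {

          enum WriteMode { INSERT_INTO, INSERT_OVERWRITE }

          // Hypothetical stand-in for a table: its properties plus the rows it holds.
          static final class Table {
              final Map<String, String> properties = new HashMap<>();
              final List<String> rows = new ArrayList<>();

              boolean isImmutable() {
                  // Assumption of this sketch: any value other than "true" means not immutable.
                  return Boolean.parseBoolean(properties.get("immutable"));
              }
          }

          // Applies a write, rejecting an append to a non-empty immutable table.
          static void write(Table table, WriteMode mode, List<String> newRows) {
              if (mode == WriteMode.INSERT_INTO && table.isImmutable() && !table.rows.isEmpty()) {
                  throw new IllegalStateException(
                      "Inserting into a non-empty immutable table is not allowed");
              }
              if (mode == WriteMode.INSERT_OVERWRITE) {
                  table.rows.clear(); // overwrite replaces whatever was there
              }
              table.rows.addAll(newRows);
          }

          public static void main(String[] args) {
              Table table = new Table();
              table.properties.put("immutable", "true");
              List<String> batch = Arrays.asList("row1", "row2");

              write(table, WriteMode.INSERT_INTO, batch); // first load: table was empty
              try {
                  write(table, WriteMode.INSERT_INTO, batch); // accidental re-run of the script
              } catch (IllegalStateException e) {
                  System.out.println("Rejected: " + e.getMessage());
              }
              write(table, WriteMode.INSERT_OVERWRITE, batch); // unaffected by the property
              System.out.println("Rows stored: " + table.rows.size());
          }
      }
      ```

      In this model the re-run of the load is rejected rather than doubling the data, while the overwrite replaces the rows, which mirrors the write-once and replace semantics described above.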

      Description

      As part of HIVE-6405's attempt to make HCatalog and Hive behave similarly with regard to immutable tables, this companion task introduces the notion of an immutable table as a table property; tables are not immutable by default. If this property is set for a table and we attempt to write to a table (or partition) that already has data, "INSERT INTO" from Hive is disallowed (if the destination directory is non-empty). Setting this property allows Hive to mimic HCatalog's current immutable-table property.
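
      The "destination directory is non-empty" condition can be sketched as a directory check. The example below is a rough illustration only: it uses java.nio.file on the local filesystem as a stand-in for the Hadoop file systems Hive actually writes to, and DestinationCheckSketch, hasData and insertIntoAllowed are hypothetical names, as are the partition directory names and the data file name.

      ```java
      import java.io.IOException;
      import java.io.UncheckedIOException;
      import java.nio.file.DirectoryStream;
      import java.nio.file.Files;
      import java.nio.file.Path;

      // Local-filesystem stand-in for the destination check; all names are hypothetical.
      final class DestinationCheckSketch {

          // True when the directory exists and contains at least one entry.
          static boolean hasData(Path dir) {
              if (!Files.isDirectory(dir)) {
                  return false; // a missing destination has nothing to append to
              }
              try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
                  return entries.iterator().hasNext();
              } catch (IOException e) {
                  throw new UncheckedIOException(e);
              }
          }

          // Whether INSERT INTO may target this table or partition directory.
          static boolean insertIntoAllowed(boolean immutable, Path destination) {
              return !immutable || !hasData(destination);
          }

          public static void main(String[] args) throws IOException {
              Path tableDir = Files.createTempDirectory("immutable_tbl");
              Path loaded = Files.createDirectories(tableDir.resolve("ds=day1"));
              Path fresh = tableDir.resolve("ds=day2"); // partition directory not created yet
              Files.createFile(loaded.resolve("000000_0"));

              System.out.println("loaded partition: " + insertIntoAllowed(true, loaded));
              System.out.println("fresh partition:  " + insertIntoAllowed(true, fresh));
              System.out.println("property not set: " + insertIntoAllowed(false, loaded));
          }
      }
      ```

      Checking the destination rather than the table as a whole is what lets a partitioned immutable table keep accepting new partitions while refusing appends to partitions that already hold data.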

      1. HIVE-6406.patch
        22 kB
        Sushanth Sowmyan
      2. HIVE-6406.2.patch
        21 kB
        Sushanth Sowmyan
      3. HIVE-6406.3.patch
        21 kB
        Sushanth Sowmyan

          Activity

          Lefty Leverenz added a comment -

          This is documented in the wiki in two places:
          • DML – Inserting data into Hive tables from queries (see Synopsis after syntax)
          • DDL – Create Table (see bullet list after syntax)
          Lefty Leverenz added a comment -

          Thanks Sushanth, I'll put it in the wiki. (Festina lente.)

          Sushanth Sowmyan added a comment -

          Hi Lefty, I edited in a release note for this jira.

          Lefty Leverenz added a comment -

          This needs documentation – how about a release note for starters? (Of course HIVE-6465 will also need documentation after it's committed.) Table properties are inadequately documented in the wiki – see my Feb. 14th message to dev@hive: "User doc for table properties".

          Ashutosh Chauhan added a comment -

          Committed to trunk. Thanks, Sushanth!

          Sushanth Sowmyan added a comment -

          Thanks, Ashutosh. I've created HIVE-6465 for that.

          Ashutosh Chauhan added a comment -

          +1
          Since this is a new protection mode, in addition to the existing ones (like NO_DROP, OFFLINE), it makes sense to have this new mode supported via syntax like the earlier ones. That's only syntactic sugar, which could be done in a follow-up.

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12629134/HIVE-6406.3.patch

          ERROR: -1 due to 2 failed/errored test(s), 5122 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin6
          org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testNegativeCliDriver_mapreduce_stack_trace_hadoop20
          

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1341/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1341/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 2 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12629134

          Sushanth Sowmyan added a comment -

          /doh. Thanks Lefty.

          Updated patch with fix.

          Lefty Leverenz added a comment -

          Here's a Valentine nit for you in ErrorMsg.java:

          INSERT_INTO_IMMUTABLE_TABLE(10256, "Inserting into an non-empty immutable table is not allowed"),

          ... should be "a non-empty" (also fix in insert_into5.q.out & insert_into6.q.out).

          And another in hive_metastore.thrift:

          +// Whether or not the table is considered immutable - immutable tables can only be
          +// overwritten or created if unpartitioned, or if partitioned, partitions inside it
          +// can only be overwritten or created. Immutability supports write-once and replace
          +// semantics, but not append.

          ... "immutable tables" doesn't match "inside it" so change to "an immutable table" (or "them").

          Sushanth Sowmyan added a comment -

          Thejas M Nair / Ashutosh Chauhan / Brock Noland, could I please get a review of this patch?

          The error reported above is not connected to this patch.

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12628572/HIVE-6406.2.patch

          ERROR: -1 due to 3 failed/errored test(s), 5091 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_revoke_table_priv
          org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testNegativeCliDriver_mapreduce_stack_trace_hadoop20
          org.apache.hadoop.hive.common.type.TestDecimal128.testHighPrecisionDecimal128Multiply
          

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1309/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1309/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 3 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12628572

          Sushanth Sowmyan added a comment -

          Attached updated patch.

          Sushanth Sowmyan added a comment -

          Fair enough, I'd agree. I retained the is_ because there was already a parameter called "is_archived", and I was trying to maintain style. The "_table" I didn't think about, but can be removed as well. I'll regenerate this patch with that changed.

          Brock Noland added a comment -

          It seems like is_ and _table are noise?

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12628308/HIVE-6406.patch

          ERROR: -1 due to 2 failed/errored test(s), 5091 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_authorization_revoke_table_priv
          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_auto_sortmerge_join_16
          

          Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1293/testReport
          Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1293/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 2 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12628308

          Sushanth Sowmyan added a comment -

          Attaching patch.

          This patch introduces a new table property "is_immutable_table".

          If we create a table with TBLPROPERTIES("is_immutable_table"="true"), then INSERT INTO behaviour into that table will be disallowed if there is any data already present that the INSERT INTO would append to.

          INSERT INTO will still work if it is empty.


            People

            • Assignee:
              Sushanth Sowmyan
              Reporter:
              Sushanth Sowmyan
            • Votes:
              0
              Watchers:
              9
