Hive / HIVE-8065

Support HDFS encryption functionality on Hive


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.13.1
    • Fix Version/s: 1.1.0
    • Component/s: None
    • Labels:

      Description

      The new encryption support in HDFS is incompatible with Hive: when the feature is enabled, Hive becomes unusable.

      HDFS encryption is designed so that a user can configure different encryption zones (directories) for multi-tenant environments. Each encryption zone has its own encryption key, such as an AES-128 or AES-256 key. For security compliance, HDFS does not allow files to be moved or renamed between encryption zones; renames are allowed only within the same encryption zone. Copying between encryption zones is allowed.
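
The rename rule described above can be modeled with a small sketch. The zone table, the paths, and the function names `zone_of` and `rename_allowed` are hypothetical and chosen only for illustration; they are not HDFS or Hive APIs.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical encryption zones: zone root -> key name (illustrative only).
ZONES = {
    "/warehouse/aes128": "key128",
    "/warehouse/aes256": "key256",
}


def zone_of(path: str) -> Optional[str]:
    """Return the root of the encryption zone that contains path, or None."""
    p = PurePosixPath(path)
    # Walk from the path itself up to the root, looking for a zone root.
    for candidate in (p, *p.parents):
        if str(candidate) in ZONES:
            return str(candidate)
    return None


def rename_allowed(src: str, dst: str) -> bool:
    """A rename is permitted only when both paths lie in the same zone
    (or both lie outside any zone)."""
    return zone_of(src) == zone_of(dst)
```

Under this model, a rename from an unencrypted scratch directory such as `/tmp/hive/f` into `/warehouse/aes256/t2/f` is rejected, while a rename between two paths under `/warehouse/aes128` is accepted.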

      See HDFS-6134 for more details about HDFS encryption design.

      Hive currently uses a scratch directory (like /tmp/$user/$random). This scratch directory holds the intermediate data passed between MR jobs and the final output of the Hive query, which is later moved to the table directory location.

      If Hive tables are in a different encryption zone than the scratch directory, Hive cannot rename those files/directories, which makes Hive unusable.

      To handle this problem, we can change the scratch directory of the query/statement to be inside the same encryption zone as the table directory location. This way, the rename will succeed.
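
A minimal sketch of that idea: derive the staging path from the table location itself, so that both sit in the same encryption zone. The function name `staging_dir_for`, the directory name prefix, and the example location are assumptions made for illustration, not the code committed for this issue.

```python
import posixpath
import uuid


def staging_dir_for(table_location: str, prefix: str = ".hive-staging") -> str:
    """Build a unique per-query staging path directly under the table location.

    Because the staging directory is a child of the table directory, it
    falls inside whatever encryption zone the table belongs to.
    The prefix is an illustrative assumption.
    """
    return posixpath.join(table_location, f"{prefix}_{uuid.uuid4().hex}")
```

For a table stored at the hypothetical location `/warehouse/aes256/t2`, the result is a path of the form `/warehouse/aes256/t2/.hive-staging_<random hex>`, so the final rename into the table directory never crosses a zone boundary.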

      Also, for statements that move files between encryption zones (e.g. LOAD DATA), a copy can be executed instead of a rename. This adds overhead when copying large data files, but it won't break encryption on Hive.
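
One possible shape for the copy-instead-of-rename behavior is a fallback: try the cheap rename first and copy only if it is rejected. This is a sketch under assumptions, not the approach confirmed by this issue. It assumes the file system signals a rejected rename with an error (modeled here as `OSError`), and the `rename` and `copy` callables, `rejecting_rename`, and `recording_copy` are hypothetical stand-ins for real file system calls.

```python
from typing import Callable, List, Tuple

Op = Callable[[str, str], None]


def move_with_copy_fallback(src: str, dst: str, rename: Op, copy: Op) -> str:
    """Try a rename first; fall back to a full copy if the rename is rejected."""
    try:
        rename(src, dst)
        return "rename"
    except OSError:
        # Stand-in for the error raised on a cross-zone rename.
        copy(src, dst)
        return "copy"


# Illustrative stand-ins for file system calls (hypothetical).
calls: List[Tuple[str, str, str]] = []


def rejecting_rename(src: str, dst: str) -> None:
    raise OSError("rename across encryption zones is not allowed")


def recording_copy(src: str, dst: str) -> None:
    calls.append(("copy", src, dst))
```

The sketch leaves the source file in place after a copy; whether the source should then be deleted depends on the statement's semantics and is not covered here.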

      Another security concern arises with SELECT statements that use joins. If Hive joins tables with different encryption key strengths, the results of the SELECT might break the security compliance of the tables. For example, if two tables with 128-bit and 256-bit encryption are joined, the temporary results might be stored in the 128-bit encryption zone. This conflicts with the requirements of the table encrypted with 256 bits, whose data would be stored temporarily under the weaker key.

      To fix this, Hive should select the scratch directory with the strongest encryption, so that intermediate data is stored temporarily without compliance issues.
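
The selection rule can be sketched as picking the table location with the largest key length. The function name `strongest_location`, the mapping from location to key length in bits, and the example paths are hypothetical; treating an unencrypted table as 0 bits is also an assumption of this sketch.

```python
from typing import Dict


def strongest_location(key_bits_by_location: Dict[str, int]) -> str:
    """Return the table location protected by the longest key.

    Unencrypted locations are represented with 0 bits (an assumption).
    Raises ValueError if the mapping is empty.
    """
    return max(key_bits_by_location, key=lambda loc: key_bits_by_location[loc])
```

For a join of a 128-bit table and a 256-bit table, this returns the 256-bit table's location, which is where the scratch directory would then be created.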

      For instance:

      SELECT * FROM table-aes128 t1 JOIN table-aes256 t2 WHERE t1.id == t2.id;

      • This should use a scratch directory (or staging directory) inside the table-aes256 table location.

      INSERT OVERWRITE TABLE table-unencrypted SELECT * FROM table-aes1;

      • This should use a scratch directory inside the table-aes1 location.

      FROM table-unencrypted
      INSERT OVERWRITE TABLE table-aes128 SELECT id, name
      INSERT OVERWRITE TABLE table-aes256 SELECT id, name

      • This should use a scratch directory in each of the tables' locations.
      • The first SELECT will have its scratch directory in the table-aes128 directory.
      • The second SELECT will have its scratch directory in the table-aes256 directory.

        Issue Links

        1. Change Hadoop version on HIVE-8065 to 2.6-SNAPSHOT (Sub-task, Resolved, Sergio Peña)
        2. Commit initial encryption work (Sub-task, Resolved, Sergio Peña)
        3. Create unit test JIRAs (Sub-task, Resolved, Brock Noland)
        4. Create unit test join of encrypted and unencrypted table (Sub-task, Resolved, Ferdinand Xu)
        5. Create unit test join of two encrypted tables with different keys (Sub-task, Resolved, Ferdinand Xu)
        6. Create unit test where we insert into an encrypted table and then read from it with pig (Sub-task, Closed, Ferdinand Xu)
        7. Create unit test where we insert into an encrypted table and then read from it with hcatalog mapreduce (Sub-task, Closed, Dong Chen)
        8. Create unit test where we read from an read only encrypted table (Sub-task, Resolved, Ferdinand Xu)
        9. Create unit test where we read from a read only unencrypted table (Sub-task, Resolved, Ferdinand Xu)
        10. Create unit test where we insert into dynamically partitioned table (Sub-task, Resolved, Dong Chen)
        11. Create unit test where we insert into statically partitioned table (Sub-task, Resolved, Dong Chen)
        12. Fix permission inheritance with HDFS encryption (Sub-task, Resolved, Szehon Ho)
        13. Create encryption testing framework (Sub-task, Resolved, Ferdinand Xu)
        14. Hive should support multiple Key provider modes (Sub-task, Resolved, Ferdinand Xu)
        15. Fix FileUtils.copy() method to call distcp only for HDFS files (not local files) (Sub-task, Resolved, Sergio Peña)
        16. add createZones and listZones supports in HdfsEncryptionShim for the test purposes (Sub-task, Resolved, Ferdinand Xu)
        17. Allow user to read encrypted read-only tables only if the scratch directory is encrypted (Sub-task, Resolved, Sergio Peña)
        18. The move task does not handle properly in the case of loading data from the local file system path. (Sub-task, Resolved, Ferdinand Xu)
        19. Change Hadoop version on HIVE-8065 to 2.6.0 (Sub-task, Resolved, Sergio Peña)
        20. Fix hadoop-1 after HIVE-8864 (Sub-task, Resolved, Szehon Ho)
        21. Revert HIVE-8604 that uses the non-implemented KeyProviderFactory.get() method. (Sub-task, Resolved, Sergio Peña)
        22. Enhance encryption testing framework to allow create keys & zones inside .q files (Sub-task, Resolved, Sergio Peña)
        23. Improve the mask pattern in QTestUtil to save partial directory info in test result (Sub-task, Resolved, Dong Chen)
        24. Encryption keys deletion need to be flushed so that it updates the JKS file (Sub-task, Resolved, Sergio Peña)
        25. We should throw an error on Hadoop23Shims.createKey() if the key already exists (Sub-task, Resolved, Sergio Peña)
        26. Delete default encrypted databases created by TestEncryptedHDFSCliDriver (Sub-task, Resolved, Sergio Peña)
        27. enable join test for encrypted and unencrypted tables based on the new test work flow (Sub-task, Resolved, Ferdinand Xu)
        28. enable the unit test for inserting into dynamically partitioned table with new crypto command (Sub-task, Resolved, Dong Chen)
        29. enable unit test for inserting into statically partitioned table with the crypto command (Sub-task, Resolved, Dong Chen)
        30. Merge trunk into encryption branch (1/3/2015) (Sub-task, Resolved, Brock Noland)
        31. Incompatible output format for prehook according to the encryption changes (Sub-task, Resolved, Ferdinand Xu)
        32. Update the output files for the encryption qtests since the output format changed (Sub-task, Resolved, Dong Chen)
        33. Merge encryption branch to trunk (Sub-task, Resolved, Brock Noland)
        34. Hive fails when LOCATION does not beging with hdfs:// due to encryption changes (Sub-task, Resolved, Sergio Peña)
        35. Hive with encryption throws NPE to fs path without schema (Sub-task, Resolved, Chaoyu Tang)
        36. Exclude encryption related cases from TestCliDriver (Sub-task, Resolved, Ferdinand Xu)
        37. TestEncryptedHDFSCliDriver get exception "Could not execute test command" for encryption test cases (Sub-task, Resolved, Dong Chen)
        38. The Arguments of CRYPTO command is not parsed correctly in QTestUtil.executeTestCommand() (Sub-task, Resolved, Dong Chen)
        39. Improve encryption related test cases (Sub-task, Resolved, Dong Chen)
        40. The hdfsEncryptionShim does not handle the relative path well based on hadoop 2.6 (Sub-task, Resolved, Ferdinand Xu)
        41. The move task doesn't work for inserting overwrite a local directory in test mode (Sub-task, Resolved, Ferdinand Xu)
        42. Add clean up code in some encryption related test cases (Sub-task, Resolved, Dong Chen)
        43. Only 3 encryption test cases was run. The test configuration is not correct (Sub-task, Resolved, Dong Chen)
        44. Use metastore warehouse dir variable from conf instead of hard coded dir in encryption test (Sub-task, Resolved, Dong Chen)
        45. Refine the logic for the isSub method to support local file in HIVE.java (Sub-task, Resolved, Ferdinand Xu)
        46. Handle the case of insert overwrite statement with a qualified path that the destination path does not have a schema. (Sub-task, Resolved, Ferdinand Xu)
        47. Tests cannot move files due to change on HIVE-9325 (Sub-task, Resolved, Sergio Peña)
        48. The qtest can't handle the statements which contains semicolons in qfile (Sub-task, Resolved, Ferdinand Xu)
        49. Enable the unit tests for the TestCommandProcessorFactory after adding crypto command for the test purpose (Sub-task, Resolved, Ferdinand Xu)
        50. Remove the schema in the getQualifiedPathWithoutSchemeAndAuthority method (Sub-task, Resolved, Ferdinand Xu)
        51. Remove keys in the cleanup methods for encryption related qtest (Sub-task, Resolved, Ferdinand Xu)
        52. Fail to handle the case that a qfile contains a semicolon in the annotation (Sub-task, Resolved, Dong Chen)
        53. smb_mapjoin_11.q needs an update post encryption work (Sub-task, Resolved, Brock Noland)
        54. Fix TestHBaseCliDriver hbase_handler_bulk.q [Encryption Branch] (Sub-task, Resolved, Sergio Peña)
        55. Fix merge issue with TestSymlinkTextInputFormat.testCombine [Encryption Branch] (Sub-task, Resolved, Sergio Peña)
        56. Shim the method Path.getPathWithoutSchemeAndAuthority (Sub-task, Resolved, Dong Chen)
        57. Encrypt mapjoin tables (Sub-task, Patch Available, Sergio Peña)
        58. Dropping table in an encrypted zone does not drop warehouse directory (Sub-task, Resolved, Eugene Koifman)
        59. Renaming tables across encryption zones renames table even though the operation throws error (Sub-task, Resolved, Eugene Koifman)
        60. Insert with values clause may expose data that should be encrypted (Sub-task, Resolved, Eugene Koifman)
        61. Fix failed qtest encryption_insert_partition_static test in Jenkin (Sub-task, Closed, Alexander Pivovarov)
        62. Enable the cleanup of side effect for the Encryption related qfile test (Sub-task, Resolved, Ferdinand Xu)
        63. Alter table drop partition queries in encrypted zone failing to remove data from HDFS (Sub-task, Resolved, Eugene Koifman)
        64. Test encryption_insert_partition_static.q fails with different output results on other environments (Sub-task, Closed, Sergio Peña)
        65. Check of fs.trash.interval in HiveMetaStore should be consistent with Trash.moveToAppropriateTrash() (Sub-task, Open, Unassigned)
        66. Dropping a database in an encryption zone with CASCADE and trash enabled fails (Sub-task, Closed, Sahil Takiar)
        67. update encryption_move_tbl.q when switching Hive to use Hadoop 2.7 (Sub-task, Resolved, Eugene Koifman)

            People

            • Assignee: Sergio Peña (spena)
            • Reporter: Sergio Peña (spena)