Hive / HIVE-7934

Improve column level encryption with key management

    Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      HIVE-6329 provides a framework for column-level encryption/decryption. However, its implementation only uses Base64 encoding, which is not secure and has the following problems:

      • With Base64WriteOnly, every user can only get the ciphertext from the client.
      • With Base64Rewriter, every user can get the plaintext from the client.
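
      To illustrate why Base64 alone offers no protection, here is a minimal sketch using only the JDK. It does not use the HIVE-6329 classes; the class name Base64IsNotEncryption and the sample value are hypothetical. It only shows that Base64 output can be reversed by anyone, with no key or permission involved.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64IsNotEncryption {
    // Encode a column value with plain Base64 (an encoding, not a cipher).
    static String encode(String plaintext) {
        return Base64.getEncoder()
                .encodeToString(plaintext.getBytes(StandardCharsets.UTF_8));
    }

    // Anyone can reverse the encoding: no key and no ACL check is needed.
    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String stored = encode("China");
        System.out.println(stored);          // prints Q2hpbmE=
        System.out.println(decode(stored));  // prints China
    }
}
```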

      I have an improvement based on HIVE-6329 that adds key management via KMS.
      This patch implements transparent column-level encryption. Users don't need to set anything when they query tables.

      1. Set up KMS and configure kms-acls.xml (e.g. user1 and root have permission to get the key)
         <property>
            <name>hadoop.kms.acl.GET</name>
            <value>user1 root</value>
            <description>
              ACL for get-key-version and get-current-key operations.
            </description>
         </property>
        
      2. Point Hive at the KMS in hive-site.xml
         <property>  
            <name>hadoop.security.key.provider.path</name>  
            <value>kms://http@localhost:16000/kms</value>  
         </property> 
        
      3. Create a table with encrypted columns (here s_country and s_age, using key hive.k1) and load it
        drop table student_column_encrypt;
        create table student_column_encrypt (s_key INT, s_name STRING, s_country STRING, s_age INT) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
          WITH SERDEPROPERTIES ('column.encode.columns'='s_country,s_age', 'column.encode.classname'='org.apache.hadoop.hive.serde2.crypto.CryptoRewriter') 
          STORED AS TEXTFILE TBLPROPERTIES('hive.encrypt.keynames'='hive.k1');
        insert overwrite table student_column_encrypt 
        select 
          s_key, s_name, s_country, s_age
        from student;
                     
        select * from student_column_encrypt; 
        
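      4. Query the table as different users. This is transparent to users and needs no extra settings: root, which has permission to get the key, sees plaintext, while user2, which does not, sees ciphertext for s_country and NULL for s_age.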
        [root@huang1 hive_data]# hive
        hive> select * from student_column_encrypt;       
        OK
        0	Armon	China	20
        1	Jack	USA	21
        2	Lucy	England	22
        3	Lily	France	23
        4	Yom	Spain	24
        Time taken: 0.759 seconds, Fetched: 5 row(s)
        
        [root@huang1 hive_data]# su user2
        [user2@huang1 hive_data]$ hive
        hive> select * from student_column_encrypt;
        OK
        0	Armon	dqyb188=	NULL
        1	Jack	YJez	NULL
        2	Lucy	cKqV1c8MTw==	NULL
        3	Lily	c7aT180H	NULL
        4	Yom	ZrST0MA=	NULL
        Time taken: 0.77 seconds, Fetched: 5 row(s)
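
      The behavior above can be modeled with a small JDK-only sketch. This is not the patch's CryptoRewriter and does not call KMS: the in-memory ACL, the fixed demo key, the choice of AES-GCM, and all class and method names are assumptions made for illustration. It shows a read path that decrypts only when the caller may get the key and otherwise returns the stored ciphertext. It also shows one plausible reason for the NULL values in s_age, namely that ciphertext cannot be parsed as an INT.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashSet;
import java.util.Set;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class TransparentColumnRead {
    // Hypothetical stand-in for the hadoop.kms.acl.GET entry from step 1.
    static final Set<String> GET_ACL = new HashSet<>(Arrays.asList("user1", "root"));
    // Hypothetical 128-bit demo key; a real setup would fetch key material from KMS.
    static final byte[] DEMO_KEY = "0123456789abcdef".getBytes(StandardCharsets.UTF_8);
    static final int IV_LEN = 12;

    // Encrypt with AES-GCM and store Base64(IV || ciphertext || tag).
    static String encrypt(String plaintext) {
        try {
            byte[] iv = new byte[IV_LEN];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(DEMO_KEY, "AES"),
                    new GCMParameterSpec(128, iv));
            byte[] ct = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            byte[] out = new byte[IV_LEN + ct.length];
            System.arraycopy(iv, 0, out, 0, IV_LEN);
            System.arraycopy(ct, 0, out, IV_LEN, ct.length);
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Decrypt only if the user is on the ACL; otherwise return the stored value as-is.
    static String read(String user, String stored) {
        if (!GET_ACL.contains(user)) {
            return stored; // no key access: the caller sees ciphertext
        }
        try {
            byte[] raw = Base64.getDecoder().decode(stored);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(DEMO_KEY, "AES"),
                    new GCMParameterSpec(128, raw, 0, IV_LEN));
            byte[] plain = cipher.doFinal(raw, IV_LEN, raw.length - IV_LEN);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Read an INT column; a value that does not parse as an int becomes null (NULL).
    static Integer readInt(String user, String stored) {
        try {
            return Integer.valueOf(read(user, stored));
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String country = encrypt("China");
        String age = encrypt("20");
        System.out.println(read("root", country) + "\t" + readInt("root", age));
        System.out.println(read("user2", country) + "\t" + readInt("user2", age));
    }
}
```

      In this sketch the access decision lives entirely in the read path, which is what makes the behavior transparent: the same query runs for both users and only the returned values differ.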
        

              People

              • Assignee: Xiaomeng Huang
              • Reporter: Xiaomeng Huang
              • Votes: 2
              • Watchers: 4
