Hive / HIVE-7934

Improve column level encryption with key management

Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Minor
    • Resolution: Unresolved

    Description

      HIVE-6329 provides a framework for column-level encryption/decryption. However, the implementation in HIVE-6329 only uses Base64 encoding, which is not secure and has the following problems:

      • With Base64WriteOnly, every user can only get the ciphertext from the client.
      • With Base64Rewriter, every user can get the plaintext from the client.
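To illustrate why Base64 alone offers no protection, here is a minimal Python sketch (not code from HIVE-6329; the helper names `encode_column` and `decode_column` are hypothetical). No key is involved, so anyone who can read the stored value can recover the plaintext.

```python
import base64


def encode_column(value: str) -> str:
    """Base64-encode a column value. No key or secret is used."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def decode_column(encoded: str) -> str:
    """Reverse the encoding. Any reader can do this without a secret."""
    return base64.b64decode(encoded).decode("utf-8")


stored = encode_column("China")
recovered = decode_column(stored)
print(stored, "->", recovered)
```

Because decoding needs nothing beyond the stored text, access control has to come from somewhere else, which is the role the KMS plays in this proposal.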

      I have an improvement based on HIVE-6329 that uses key management via KMS.
      This patch implements transparent column-level encryption. Users don't need to set anything when they query tables.

      1. Set up KMS and configure kms-acls.xml (e.g. user1 and root have permission to get the key):
         <property>
            <name>hadoop.kms.acl.GET</name>
            <value>user1 root</value>
            <description>
              ACL for get-key-version and get-current-key operations.
            </description>
         </property>
        
      2. Configure hive-site.xml:
         <property>
            <name>hadoop.security.key.provider.path</name>
            <value>kms://http@localhost:16000/kms</value>
         </property>
        
      3. Create an encrypted table:
        drop table student_column_encrypt;
        create table student_column_encrypt (s_key INT, s_name STRING, s_country STRING, s_age INT) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
          WITH SERDEPROPERTIES ('column.encode.columns'='s_country,s_age', 'column.encode.classname'='org.apache.hadoop.hive.serde2.crypto.CryptoRewriter') 
          STORED AS TEXTFILE TBLPROPERTIES('hive.encrypt.keynames'='hive.k1');
        insert overwrite table student_column_encrypt 
        select 
          s_key, s_name, s_country, s_age
        from student;
                     
        select * from student_column_encrypt; 
        
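      4. Query the table as different users. Decryption is transparent: users do not need to set anything. In the output below, root (listed in the KMS ACL) sees plaintext, while user2 (not listed) sees ciphertext for s_country and NULL for s_age.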
        [root@huang1 hive_data]# hive
        hive> select * from student_column_encrypt;       
        OK
        0	Armon	China	20
        1	Jack	USA	21
        2	Lucy	England	22
        3	Lily	France	23
        4	Yom	Spain	24
        Time taken: 0.759 seconds, Fetched: 5 row(s)
        
        [root@huang1 hive_data]# su user2
        [user2@huang1 hive_data]$ hive
        hive> select * from student_column_encrypt;
        OK
        0	Armon	dqyb188=	NULL
        1	Jack	YJez	NULL
        2	Lucy	cKqV1c8MTw==	NULL
        3	Lily	c7aT180H	NULL
        4	Yom	ZrST0MA=	NULL
        Time taken: 0.77 seconds, Fetched: 5 row(s)
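The behavior in step 4 can be modeled with a small Python sketch. It is an illustration under stated assumptions, not the patch's implementation: `toy_cipher` is a deliberately insecure stand-in for whatever cipher CryptoRewriter uses, `KEY` is made-up key material, and the `AUTHORIZED` set simply mirrors the description's reading of the ACL. The NULL for s_age is modeled as a failed integer parse of the ciphertext, which is one plausible explanation rather than something the issue states.

```python
import base64
import hashlib

AUTHORIZED = {"user1", "root"}            # assumption: users allowed to fetch the key
ENCRYPTED_COLUMNS = {"s_country", "s_age"}
INT_COLUMNS = {"s_key", "s_age"}
KEY = b"demo-key-material"                # hypothetical key, stands in for hive.k1


def toy_cipher(data: bytes, key: bytes) -> bytes:
    """XOR with a SHA-256 derived keystream. Symmetric and NOT secure."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def write_row(row: dict) -> dict:
    """Store every column as text; encrypted columns become Base64 ciphertext."""
    stored = {}
    for col, val in row.items():
        text = str(val)
        if col in ENCRYPTED_COLUMNS:
            text = base64.b64encode(toy_cipher(text.encode(), KEY)).decode("ascii")
        stored[col] = text
    return stored


def read_row(stored: dict, user: str) -> dict:
    """Decrypt only for authorized users, then apply the column types."""
    out = {}
    for col, text in stored.items():
        if col in ENCRYPTED_COLUMNS and user in AUTHORIZED:
            text = toy_cipher(base64.b64decode(text), KEY).decode()
        if col in INT_COLUMNS:
            try:
                out[col] = int(text)
            except ValueError:
                out[col] = None           # would be displayed as NULL
        else:
            out[col] = text
    return out


row = {"s_key": 0, "s_name": "Armon", "s_country": "China", "s_age": 20}
stored = write_row(row)
as_root = read_row(stored, "root")
as_user2 = read_row(stored, "user2")
print(as_root)
print(as_user2)
```

In this model the unencrypted columns read the same for both users, while the unauthorized user gets Base64 text for the string column and nothing usable for the integer column.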
        
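The two configuration values from steps 1 and 2 can be read with the following sketch. It assumes kms-acls.xml follows the usual Hadoop ACL syntax (comma-separated users, one space, then comma-separated groups) and that a `kms://` provider path embeds the transport scheme before the `@`. The function names `parse_acl` and `kms_uri_to_url` are hypothetical helpers, not Hadoop APIs.

```python
from urllib.parse import urlsplit


def parse_acl(value: str):
    """Split a Hadoop-style ACL string into (users, groups)."""
    value = value.strip()
    if value == "*":
        return {"*"}, {"*"}               # wildcard: everyone is allowed
    users_part, _, groups_part = value.partition(" ")
    users = {u for u in users_part.split(",") if u}
    groups = {g for g in groups_part.strip().split(",") if g}
    return users, groups


def kms_uri_to_url(uri: str) -> str:
    """Translate kms://<scheme>@<host>:<port>/<path> into a plain URL."""
    parts = urlsplit(uri)
    if parts.scheme != "kms":
        raise ValueError("not a kms:// provider path: " + uri)
    transport, _, authority = parts.netloc.partition("@")
    return f"{transport}://{authority}{parts.path}"


acl_users, acl_groups = parse_acl("user1 root")
kms_url = kms_uri_to_url("kms://http@localhost:16000/kms")
print(acl_users, acl_groups)
print(kms_url)
```

Under that assumed syntax, `user1 root` names the user user1 and the group root. If both are meant as users, the value would be written `user1,root`. The example still behaves as described when the root account belongs to a group named root.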

            People

              Xiaomeng Huang
              Votes: 2
              Watchers: 5
