Details
- Improvement
- Status: Closed
- Major
- Resolution: Duplicate
- Impala 4.0.0
Description
When I run LOAD DATA from an HDFS path whose access is managed by Ranger, the following exception occurs:
// sql
LOAD DATA INPATH 'hdfs://...' OVERWRITE INTO TABLE tbl PARTITION(status='origin');

// impalad exception
org.apache.impala.common.AnalysisException: Unable to LOAD DATA from hdfs://... because Impala does not have READ permissions on this file
    at org.apache.impala.analysis.LoadDataStmt.analyzePaths(LoadDataStmt.java:194)
    at org.apache.impala.analysis.LoadDataStmt.analyze(LoadDataStmt.java:122)
    at org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:491)
    at org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:451)
    at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1736)
    at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1702)
    at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1672)
    at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:164)
According to `org.apache.hadoop.fs.FileStatus#permission`, the impalad process user is not the owner of the HDFS file and does not have permission to read it.
[hdfs@hybrid02 ~]$ hdfs dfs -ls -R /user_tag/import_staging/user_tag/
-rw------- 1 hdfs hdfs 270 2022-03-17 20:13 /user_tag/import_staging/user_tag/user_tag_p19_data_3
However, I have already authorized the impalad process user in Ranger, so that user does in fact have read and write permissions on the file.
[hdfs@hybrid02 ~]$ klist
Ticket cache: FILE:/tmp/krb5cc_7007
Default principal: impala/hybrid02@SENSORSDATA
[hdfs@hybrid02 ~]$ hdfs dfs -ls -R /user_tag/import_staging/user_tag/
-rw------- 1 hdfs hdfs 270 2022-03-17 20:13 /user_tag/import_staging/user_tag/user_tag_p19_data_3
[hdfs@hybrid02 ~]$ hdfs dfs -get /tmp/user_tag_p19_data_3_test
[hdfs@hybrid02 ~]$ ll -f user_tag_p19_data_3_test
user_tag_p19_data_3_test
In my opinion, the cause is that `org.apache.impala.analysis.LoadDataStmt#analyzePaths` checks file permissions mainly against `org.apache.hadoop.fs.FileStatus#permission`, which reflects only the HDFS permission bits and not any Ranger policy. That is why this exception is thrown.
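To illustrate why a check based only on permission bits rejects the impalad user here, the following is a minimal sketch in plain Java. It is not Impala's or Hadoop's code: the class `PermissionBitsCheck` and the method `canReadByBits` are hypothetical names, and the group membership of the `impala` user is an assumption. It only models the owner/group/other decision for the `-rw------- hdfs hdfs` file listed above.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Arrays;
import java.util.Set;

// Hypothetical illustration class; not part of Impala or Hadoop.
final class PermissionBitsCheck {

  // Returns true if `user` may read a file, judging ONLY by its POSIX-style
  // mode string (nine characters, e.g. "rw-------"), its owner, and its group.
  // No external policy engine (such as Ranger) is consulted.
  static boolean canReadByBits(String mode, String owner, String group,
                               String user, String... userGroups) {
    Set<PosixFilePermission> perms = PosixFilePermissions.fromString(mode);
    if (user.equals(owner)) {
      // The owner class applies when the user owns the file.
      return perms.contains(PosixFilePermission.OWNER_READ);
    }
    if (Arrays.asList(userGroups).contains(group)) {
      // Otherwise the group class applies if the user belongs to the file's group.
      return perms.contains(PosixFilePermission.GROUP_READ);
    }
    // Everyone else falls into the "other" class.
    return perms.contains(PosixFilePermission.OTHERS_READ);
  }

  public static void main(String[] args) {
    // Mode, owner, and group come from the `hdfs dfs -ls` output (leading
    // file-type character dropped). Assumption: user "impala" is only in
    // group "impala".
    boolean impalaCanRead =
        canReadByBits("rw-------", "hdfs", "hdfs", "impala", "impala");
    System.out.println("impala readable by bits: " + impalaCanRead); // false

    boolean hdfsCanRead =
        canReadByBits("rw-------", "hdfs", "hdfs", "hdfs", "hdfs");
    System.out.println("hdfs readable by bits: " + hdfsCanRead); // true
  }
}
```

The sketch returns `false` for `impala` even though the `hdfs dfs -get` above succeeds, which matches the mismatch described in this issue. One possible direction, stated here as a suggestion rather than as the fix adopted in IMPALA-10272, is to ask the NameNode itself, for example via Hadoop's `FileSystem#access(Path, FsAction)`, so that whatever authorization the NameNode enforces is taken into account.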
Issue Links
- duplicates IMPALA-10272 LOAD DATA should respect Ranger-HDFS policies (Resolved)