Details
- Type: Bug
- Status: Resolved
- Priority: Critical
- Resolution: Fixed
Description
thundergun reported that analyzing a LOAD DATA statement fails the access check on the source file even though a Ranger HDFS policy exists that allows the access. Impala only loads the permissions from HDFS and checks access by itself. Related code: https://github.com/apache/impala/blob/ee4043e1a0940ae5711c68336d1ad522631d0e35/fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java#L195-L206
When Ranger authorization is enabled, this check can give the wrong result if the HDFS permissions are more restrictive than the Ranger policies. According to the Ranger documentation: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=57901344#RangerUserGuide(workinprogress)-HDFSPolicycreation
when the NameNode receives a user request, the Ranger Plugin checks for policies set through the Ranger Policy Manager. Then, if there are no policies authorizing the request, the Ranger plugin checks for permissions set in HDFS.
We currently don't have an embedded ranger-hdfs plugin to check this locally. As a quick fix, I think that when Ranger authz is enabled, we can check the access using FileSystem#access(Path path, FsAction mode), which invokes a NameNode RPC and therefore respects the Ranger HDFS policies.
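The difference between the two checks can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not Impala's or Ranger's implementation: the types `Action`, `RangerPolicy`, and `HdfsFile`, the method names, and the prefix-based policy matching are all hypothetical, and group permissions are left out. It does not call Hadoop; in a real fix the delegated branch would correspond to a call to FileSystem#access(Path path, FsAction mode).

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Toy model only: none of these types come from Hadoop, Ranger, or Impala.
class LoadDataAccessSketch {
  enum Action { READ, WRITE, EXECUTE }

  // Hypothetical allow policy: `user` may perform `allowed` actions under `pathPrefix`.
  static final class RangerPolicy {
    final String pathPrefix;
    final String user;
    final Set<Action> allowed;

    RangerPolicy(String pathPrefix, String user, Set<Action> allowed) {
      this.pathPrefix = pathPrefix;
      this.user = user;
      this.allowed = allowed;
    }

    boolean authorizes(String path, String u, Action a) {
      return path.startsWith(pathPrefix) && user.equals(u) && allowed.contains(a);
    }
  }

  // Hypothetical file metadata; group permissions are omitted for brevity.
  static final class HdfsFile {
    final String path;
    final String owner;
    final Set<Action> ownerPerms;
    final Set<Action> otherPerms;

    HdfsFile(String path, String owner, Set<Action> ownerPerms, Set<Action> otherPerms) {
      this.path = path;
      this.owner = owner;
      this.ownerPerms = ownerPerms;
      this.otherPerms = otherPerms;
    }
  }

  // Check based only on the loaded HDFS permissions; Ranger is never consulted.
  static boolean localPermissionCheck(HdfsFile f, String user, Action a) {
    Set<Action> perms = f.owner.equals(user) ? f.ownerPerms : f.otherPerms;
    return perms.contains(a);
  }

  // NameNode-side order described in the Ranger guide: policies first,
  // then fall back to HDFS permissions if no policy authorizes the request.
  static boolean nameNodeCheck(List<RangerPolicy> policies, HdfsFile f, String user, Action a) {
    for (RangerPolicy p : policies) {
      if (p.authorizes(f.path, user, a)) {
        return true;
      }
    }
    return localPermissionCheck(f, user, a);
  }

  // Proposed selection: delegate to the NameNode-side check when Ranger authz is enabled.
  static boolean canAccess(boolean rangerEnabled, List<RangerPolicy> policies,
                           HdfsFile f, String user, Action a) {
    return rangerEnabled
        ? nameNodeCheck(policies, f, user, a)
        : localPermissionCheck(f, user, a);
  }

  public static void main(String[] args) {
    // Source file owned by "etl"; other users have no HDFS permissions on it.
    HdfsFile file = new HdfsFile("/staging/data.csv", "etl",
        EnumSet.of(Action.READ, Action.WRITE), EnumSet.noneOf(Action.class));
    // A Ranger policy grants "impala" read access under /staging/.
    List<RangerPolicy> policies = Collections.singletonList(
        new RangerPolicy("/staging/", "impala", EnumSet.of(Action.READ)));

    System.out.println("local check:     "
        + localPermissionCheck(file, "impala", Action.READ));
    System.out.println("delegated check: "
        + canAccess(true, policies, file, "impala", Action.READ));
  }
}
```

In this model the local check denies the read because the HDFS permission bits alone do not cover the `impala` user, while the delegated check allows it because a policy matches first. That is the mismatch described in this issue.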
Issue Links
- is duplicated by: IMPALA-11194 Unable to LOAD DATA from HDFS configured with ranger (Closed)
- is related to: IMPALA-12291 Insert statement fails even if hdfs ranger policy allows it (Resolved)
- relates to: IMPALA-11871 INSERT statement does not respect Ranger policies for HDFS (Resolved)
- links to