IMPALA-10192

IllegalStateException in processing column masking audit events


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Impala 4.0.0
    • Component/s: None
    • Labels: None

    Description

      Users reported an IllegalStateException related to column masking. I can reproduce it on the master branch:

      I0925 21:42:09.684499 20809 jni-util.cc:288] ed44b3c5ca4a0e7d:8c4e884400000000] java.lang.IllegalStateException
              at com.google.common.base.Preconditions.checkState(Preconditions.java:492)
              at org.apache.impala.authorization.ranger.RangerAuthorizationContext.stashAuditEvents(RangerAuthorizationContext.java:71)
              at org.apache.impala.authorization.ranger.RangerAuthorizationChecker.postAnalyze(RangerAuthorizationChecker.java:373)
              at org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:440)
              at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1562)
              at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1529)
              at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1499)
              at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:162)
      

      It happens when a table has several column masking policies and not all of them apply to the current user, i.e. some of the masking policies apply only to other users. If the current user then queries the table, the error occurs.

      Reproducing
      Start an Impala cluster with Ranger authorization enabled:

      bin/start-impala-cluster.py --impalad_args="--server-name=server1 --ranger_service_type=hive --ranger_app_id=impala --authorization_provider=ranger" --catalogd_args="--server-name=server1 --ranger_service_type=hive --ranger_app_id=impala --authorization_provider=ranger"
      

      Create a test table, tmp_tbl, as your own user:

      $ bin/impala-shell.sh
      [localhost:21050] default> create table tmp_tbl (id int, name string) stored as parquet;
      

      Open the Ranger WebUI at http://localhost:6080/. Add two column masking policies:

      • Masking default.tmp_tbl.id using HASH for user "non_owner"
      • Masking default.tmp_tbl.name using REDACT for your username (quanlong in my case)

      Refresh the policies in Impala and query the table as your own user:

      bin/impala-shell.sh -u admin -q "refresh authorization"
      bin/impala-shell.sh -q "select * from tmp_tbl"
      

      The last query will fail with "ERROR: IllegalStateException: null".

      The policy file is attached.

      Clues

      In RangerAuthorizationContext.stashAuditEvents(), we deduplicate the column masking audit events. There is a Precondition check that all generated events are column masking events:
      https://github.com/apache/impala/blob/5c69e7ba583297dc886652ac5952816882b928af/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationContext.java#L71
      Code:

        public void stashAuditEvents(RangerImpalaPlugin plugin) {
          Set<String> unfilteredMaskNames = plugin.getUnfilteredMaskNames(
              Arrays.asList("MASK_NONE"));
          for (AuthzAuditEvent event : auditHandler_.getAuthzEvents()) {
            // We assume that all the logged events until now are column masking-related. Since
            // we remove those AuthzAuditEvent's corresponding to the "Unmasked" policy of type
            // "MASK_NONE", we exclude this type of mask.
            Preconditions.checkState(unfilteredMaskNames
                .contains(event.getAccessType().toUpperCase()));
      
            // event.getEventKey() is the concatenation of the following fields in an
            // AuthzAuditEvent: 'user', 'accessType', 'resourcePath', 'resourceType', 'action',
            // 'accessResult', 'sessionId', and 'clientIP'. Recall that 'resourcePath' is the
            // concatenation of 'dbName', 'tableName', and 'columnName' that were used to
            // instantiate a RangerAccessResourceImpl in order to create a RangerAccessRequest
            // to call RangerImpalaPlugin#evalDataMaskPolicies(). Refer to
            // RangerAuthorizationChecker#evalColumnMask() for further details.
            deduplicatedAuditEvents_.put(event.getEventKey(), event);
          }
          auditHandler_.getAuthzEvents().clear();
        }
      

      However, some SELECT events may be generated during the analysis phase, here:
      https://github.com/apache/impala/blob/5c69e7ba583297dc886652ac5952816882b928af/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java#L308
      It looks like if a column has a masking policy that does not target the current user, the Ranger plugin generates a SELECT audit event for that column. In this case, the first masking policy is on the "id" column for user "non_owner", so we get a SELECT event on this column. The second masking policy is on the "name" column for the current user, so we get a mask event as expected. The SELECT event then fails the Precondition check in stashAuditEvents(), because SELECT is not one of the mask names.
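To make the failure mode concrete, below is a minimal, self-contained Java model of the check. It is a hypothetical sketch, not Impala code: StashCheckRepro, its Event class and checkAllAreMaskEvents() are illustrative stand-ins for AuthzAuditEvent and the Preconditions.checkState() loop in stashAuditEvents(). The mask names "MASK" and "MASK_HASH" and the "mask" access type of the REDACT event are assumed values, not taken from the report.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the audit events seen by stashAuditEvents()
// for "select * from tmp_tbl" under the two policies above. Not Impala code.
final class StashCheckRepro {
  // Stand-in for AuthzAuditEvent, keeping only the fields relevant here.
  static final class Event {
    final String accessType;
    final String resource;

    Event(String accessType, String resource) {
      this.accessType = accessType;
      this.resource = resource;
    }
  }

  // Mirrors the Preconditions.checkState() loop: every event's access type
  // must be one of the mask names, otherwise an IllegalStateException is thrown.
  static void checkAllAreMaskEvents(Set<String> maskNames, List<Event> events) {
    for (Event e : events) {
      if (!maskNames.contains(e.accessType.toUpperCase())) {
        throw new IllegalStateException();
      }
    }
  }

  public static void main(String[] args) {
    // Assumed mask names; in Impala the set comes from
    // plugin.getUnfilteredMaskNames(Arrays.asList("MASK_NONE")).
    Set<String> maskNames = new LinkedHashSet<>(Arrays.asList("MASK", "MASK_HASH"));
    List<Event> events = Arrays.asList(
        // HASH policy on "id" targets "non_owner": a SELECT event is logged.
        new Event("select", "default.tmp_tbl.id"),
        // REDACT policy on "name" targets the current user: a mask event
        // (its access type is assumed to be "mask" here).
        new Event("mask", "default.tmp_tbl.name"));
    try {
      checkAllAreMaskEvents(maskNames, events);
      System.out.println("check passed");
    } catch (IllegalStateException ex) {
      System.out.println("check failed on a non-mask event: " + ex);
    }
  }
}
```

Dropping the SELECT event from the list makes the check pass, which is consistent with the report that the query only fails when some masking policy on the table targets another user.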

      We should handle these non-mask events correctly. In addition, we should replace all Precondition checks on the audit code paths with error logging, since audit problems should not fail a query.
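One possible direction, sketched under stated assumptions: deduplicate only the mask events by event key, set expected SELECT events aside instead of failing, and log anything else as an error rather than calling Preconditions.checkState(). This is not the committed fix for IMPALA-10192. StashAuditEventsSketch, Event, stash(), otherEvents and the mask names are hypothetical, and java.util.logging stands in for Impala's real logger so the example stays self-contained. The report does not say what should ultimately happen to the SELECT events, so the sketch only keeps them in a separate list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.logging.Logger;

// Hypothetical sketch of a non-throwing stashAuditEvents(); not Impala code.
final class StashAuditEventsSketch {
  private static final Logger LOG =
      Logger.getLogger(StashAuditEventsSketch.class.getName());

  // Simplified stand-in for AuthzAuditEvent.
  static final class Event {
    final String accessType;
    // In Impala the key concatenates user, accessType, resourcePath, etc.
    final String eventKey;

    Event(String accessType, String eventKey) {
      this.accessType = accessType;
      this.eventKey = eventKey;
    }
  }

  // Column masking events, deduplicated by event key (a later put() replaces
  // an earlier event with the same key, as in the original code).
  final Map<String, Event> deduplicatedMaskEvents = new LinkedHashMap<>();
  // Hypothetical holding place for non-mask events such as SELECT.
  final List<Event> otherEvents = new ArrayList<>();

  void stash(Set<String> unfilteredMaskNames, List<Event> pending) {
    for (Event e : pending) {
      String type = e.accessType.toUpperCase();
      if (unfilteredMaskNames.contains(type)) {
        deduplicatedMaskEvents.put(e.eventKey, e);
      } else if ("SELECT".equals(type)) {
        // Expected when a masking policy on the column targets another user.
        otherEvents.add(e);
      } else {
        // Error logging instead of a Precondition check: audit problems
        // should not fail the query.
        LOG.severe("Unexpected audit event type: " + e.accessType);
      }
    }
    // Mirrors auditHandler_.getAuthzEvents().clear(); 'pending' must be mutable.
    pending.clear();
  }

  public static void main(String[] args) {
    Set<String> maskNames = new HashSet<>(Arrays.asList("MASK", "MASK_HASH"));  // assumed
    List<Event> pending = new ArrayList<>(Arrays.asList(
        new Event("select", "quanlong|select|default.tmp_tbl.id"),
        new Event("mask", "quanlong|mask|default.tmp_tbl.name"),
        new Event("mask", "quanlong|mask|default.tmp_tbl.name")));  // duplicate key
    StashAuditEventsSketch ctx = new StashAuditEventsSketch();
    ctx.stash(maskNames, pending);
    System.out.println(ctx.deduplicatedMaskEvents.size() + " mask event(s), "
        + ctx.otherEvents.size() + " other event(s)");
  }
}
```

Keeping the SELECT events out of deduplicatedMaskEvents preserves the original deduplication semantics for mask events, while the query no longer depends on every audit event being a mask event.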

      cc fangyurao

      Attachments

        1. Ranger_Policies_IMPALA-10192.json (24 kB, Quanlong Huang)


            People

              Assignee: Fang-Yu Rao (fangyurao)
              Reporter: Quanlong Huang (stigahuang)
              Votes: 0
              Watchers: 4
