SENTRY-1007

Sentry column-level performance for wide tables

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.5.1
    • Fix Version/s: 1.7.0
    • Component/s: None
    • Labels: None

    Description

      A query on a wide table appears to take a long time because of the many Sentry column-level authorization checks.

      Here are some investigation results:

      1) Create a table with 4000 columns and grant select on [table|db] to the test user. Even so,

      select * from table

      still validates the column-level privilege on each column, so this single select command issues 4000 queries to validate column-level permissions. It takes 40 seconds to return results on my test cluster (2x large), and for a while the query appears to freeze:

      ...
      2016-01-11 11:54:14,816 INFO DataNucleus.Query: Reading in results for query "SELECT FROM org.apache.sentry.provider.db.service.model.MSentryPrivilege WHERE roles.contains(role) && (role.roleName == "test_role") && serverName == "server1" && ((dbName == "test_db") || (dbName == "__NULL__")) && (URI == "__NULL__") && ((tableName == "test_tb") || (tableName == "__NULL__")) && (URI == "__NULL__") && ((columnName == "test997") || (columnName == "__NULL__")) && (URI == "__NULL__") VARIABLES org.apache.sentry.provider.db.service.model.MSentryRole role" since the connection used is closing
      2016-01-11 11:54:14,822 INFO DataNucleus.Query: Reading in results for query "SELECT FROM org.apache.sentry.provider.db.service.model.MSentryPrivilege WHERE roles.contains(role) && (role.roleName == "test_role") && serverName == "server1" && ((dbName == "test_db") || (dbName == "__NULL__")) && (URI == "__NULL__") && ((tableName == "test_tb") || (tableName == "__NULL__")) && (URI == "__NULL__") && ((columnName == "test998") || (columnName == "__NULL__")) && (URI == "__NULL__") VARIABLES org.apache.sentry.provider.db.service.model.MSentryRole role" since the connection used is closing
      2016-01-11 11:54:14,828 INFO DataNucleus.Query: Reading in results for query "SELECT FROM org.apache.sentry.provider.db.service.model.MSentryPrivilege WHERE roles.contains(role) && (role.roleName == "test_role") && serverName == "server1" && ((dbName == "test_db") || (dbName == "__NULL__")) && (URI == "__NULL__") && ((tableName == "test_tb") || (tableName == "__NULL__")) && (URI == "__NULL__") && ((columnName == "test999") || (columnName == "__NULL__")) && (URI == "__NULL__") VARIABLES org.apache.sentry.provider.db.service.model.MSentryRole role" since the connection used is closing
      

      Here is the debug log from the Sentry service:

      org.apache.sentry.binding.hive.authz.HiveAuthzBinding.authorize(HiveAuthzBinding.java:304)] requiredInputPrivileges = {Table=[SELECT], Column=[SELECT], URI=[ALL]}
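The cost pattern in the logs above can be reproduced with a small model. This is only a sketch under assumptions: `Grant`, `CountingStore`, `authorize_per_column` and `authorize_table_first` are hypothetical names, not Sentry classes, and the table-first check is one possible mitigation, not necessarily what the attached patches do. Each call to `covers` stands for one backend query of the kind logged above (match the column name or `__NULL__`).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Grant:
    """Hypothetical simplified privilege; column=None models a table-level grant."""
    db: str
    table: str
    column: Optional[str]
    action: str


class CountingStore:
    """Hypothetical stand-in for the privilege backend; counts lookups."""

    def __init__(self, grants):
        self._grants = set(grants)
        self.lookups = 0

    def covers(self, db, table, column, action):
        # One lookup: a column-level grant or a table-level grant matches.
        self.lookups += 1
        return (Grant(db, table, column, action) in self._grants
                or Grant(db, table, None, action) in self._grants)


def authorize_per_column(store, db, table, columns, action="SELECT"):
    # Behaviour described in this report: one lookup per referenced column.
    return all(store.covers(db, table, c, action) for c in columns)


def authorize_table_first(store, db, table, columns, action="SELECT"):
    # Assumed mitigation: a table-level grant answers for every column.
    if store.covers(db, table, None, action):
        return True
    return all(store.covers(db, table, c, action) for c in columns)


if __name__ == "__main__":
    cols = [f"test{i}" for i in range(4000)]
    table_grant = Grant("test_db", "test_tb", None, "SELECT")

    slow = CountingStore([table_grant])
    authorize_per_column(slow, "test_db", "test_tb", cols)

    fast = CountingStore([table_grant])
    authorize_table_first(fast, "test_db", "test_tb", cols)

    print(slow.lookups, fast.lookups)
```

With a table-level grant in place, the per-column path performs one lookup per column, while the table-first path stops after a single lookup.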
      

      2) The same issue occurs for

      show columns in table

      3) For

      show count(*) in table

      only the table-level privilege is required, so this issue does not occur.

      4) More commands have the same issue:

      SHOW COLUMNS FROM test_tb1;
      create table test_tb2 as select * from test_tb1
      show partitions in test_tb1
      select * from test_tb1
      

      Even

      select col1,col2,col3 from test_tb1

      issues 3 queries, one per column, instead of a single query covering all columns of the table.
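To illustrate the "one query for all columns" idea, here is a minimal sketch, assuming the backend can return every grant on a table in a single round trip. `PrivilegeStore`, `fetch_table_privileges` and `authorize_columns_batched` are hypothetical names used for illustration only; they are not part of the Sentry API.

```python
class PrivilegeStore:
    """Hypothetical stand-in for the privilege backend; counts round trips."""

    def __init__(self, rows):
        # Each row is (db, table, column_or_None, action);
        # column None models a table-level grant.
        self._rows = list(rows)
        self.round_trips = 0

    def fetch_table_privileges(self, db, table):
        # One round trip returning all grants on the table,
        # both table-level and column-level.
        self.round_trips += 1
        return [r for r in self._rows if r[0] == db and r[1] == table]


def authorize_columns_batched(store, db, table, columns, action="SELECT"):
    rows = store.fetch_table_privileges(db, table)
    # A table-level grant covers every column.
    if any(col is None and act == action for _, _, col, act in rows):
        return True
    # Otherwise every requested column needs its own grant,
    # checked locally against the single result set.
    granted = {col for _, _, col, act in rows if act == action}
    return all(c in granted for c in columns)


if __name__ == "__main__":
    store = PrivilegeStore([
        ("test_db", "test_tb1", "col1", "SELECT"),
        ("test_db", "test_tb1", "col2", "SELECT"),
        ("test_db", "test_tb1", "col3", "SELECT"),
    ])
    ok = authorize_columns_batched(
        store, "test_db", "test_tb1", ["col1", "col2", "col3"])
    print(ok, store.round_trips)
```

The number of round trips stays at one regardless of how many columns the statement references; the per-column comparison happens in memory.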

      Attachments

        1. SENTRY-1007.004.patch
          1 kB
          Dapeng Sun
        2. SENTRY-1007.003.patch
          2 kB
          Dapeng Sun
        3. SENTRY-1007.002.patch
          4 kB
          Dapeng Sun

            People

              Assignee: Dapeng Sun
              Reporter: Anne Yu