Sentry (Retired) / SENTRY-2539

PolicyEngine should be able to return privilege directly



    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1
    • Fix Version/s: 2.2.0
    • Component/s: Sentry
    • Labels: None



      Currently, for a command such as "show databases", Sentry has to perform an authorization check on each database. When there are many databases in the system (for example, 12000), the authorization checks for a single command can be very slow. Two main factors slow down authorization checks in Sentry even when caching is enabled:

      1) The cache returns the list of privileges as Strings. As a result, every authorization check has to convert each privilege string to a privilege object.

      2) When the cache is enabled, it returns all privileges of a given user regardless of which resource is being checked.

        2.1) For example, a user has 2000 privileges assigned and the resource to check is "server=server1, database=db_1, table=table_1". The cache returns all 2000 privileges, including unrelated ones such as "server=server1->database=db_2->action=ALL".

        2.2) Returning unrelated privileges has two side effects:

          2.2.1) The overhead of converting privileges from String to object is proportional to the number of privileges returned from the cache. Converting unrelated privileges costs time and brings no benefit.

          2.2.2) The authorization check goes through each privilege, so its overhead is also proportional to the number of privileges returned from the cache. Checking unrelated privileges costs time and brings no benefit.
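
      The two costs described above can be illustrated with a minimal Java sketch. It is not Sentry code: the class StringCacheCostSketch, its methods, and the simplified matching rule (a privilege matches when its database part equals the database being checked) are assumptions made only for illustration. It counts how many String-to-object conversions a "show databases" style loop performs when the cache hands back every privilege as a String.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the problem; not part of Sentry.
final class StringCacheCostSketch {

    // Converts "server=server1->database=db_2->action=ALL" into key/value parts.
    static Map<String, String> parse(String raw) {
        Map<String, String> parts = new LinkedHashMap<>();
        for (String kv : raw.split("->")) {
            String[] pair = kv.split("=", 2);
            if (pair.length == 2) {
                parts.put(pair[0].trim().toLowerCase(), pair[1].trim());
            }
        }
        return parts;
    }

    // Checks every database against every cached privilege string and
    // returns the number of String-to-object conversions performed.
    static long showDatabases(List<String> cached, List<String> databases,
                              List<String> visible) {
        long conversions = 0;
        for (String db : databases) {
            for (String raw : cached) {
                // The conversion is repeated for every check (problem 1),
                // and unrelated privileges are converted too (problem 2).
                Map<String, String> privilege = parse(raw);
                conversions++;
                if (db.equalsIgnoreCase(privilege.get("database"))) {
                    visible.add(db);
                    break; // simplified rule: any privilege on the database
                }
            }
        }
        return conversions;
    }

    public static void main(String[] args) {
        List<String> cached = Arrays.asList(
                "server=server1->database=db_0->action=ALL",
                "server=server1->database=db_1->action=ALL",
                "server=server1->database=db_2->action=ALL");
        List<String> databases = Arrays.asList("db_0", "db_1", "db_2", "db_3");
        List<String> visible = new ArrayList<>();
        long conversions = showDatabases(cached, databases, visible);
        System.out.println("visible=" + visible + ", conversions=" + conversions);
    }
}
```

      In this sketch a database with no matching privilege forces a conversion of every cached string, so the total work grows with the product of the number of databases and the number of cached privileges.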


      Solution

      1) Add a new function, listPrivilegeObjects, that lets the authorization provider get privilege objects directly when checking authorization. This avoids the conversion overhead. All the interfaces from the policy engine (PolicyEngine) down to the cache (PrivilegeCache) have to be changed to add this new function.

      2) Implement a new cache, TreePrivilegeCache. It converts each privilege from String format to a Privilege object once, at the beginning, and returns the privilege objects directly from listPrivilegeObjects at authorization check time. This avoids the overhead of conversion at each authorization check.

      3) TreePrivilegeCache organizes the privileges by resource hierarchy, like a tree. Therefore, it can return only the privileges related to the resource being checked. This reduces the authorization check overhead.

        3.1) For example, a user has 2000 privileges assigned, and the resource to check is "server=server1, database=db_1, table=table_1". TreePrivilegeCache returns only the related privileges and excludes unrelated ones such as "server=server1->database=db_2->action=ALL".

        3.2) SENTRY-1291 was intended to address problem 2), but it did not address problem 1). In addition, its implementation, SimplePrivilegeCache, is neither memory efficient (each key of the map contains the whole resource hierarchy, so many keys share a large portion of the same content) nor operationally efficient (for each authorization check, SimplePrivilegeCache.listPrivileges() has to construct a large number of keys in order to find all related privileges in the map).

      4) Use TreePrivilegeCache instead of SimplePrivilegeCache for caching. Note that this solution is built on top of SENTRY-1291 and uses the changes SENTRY-1291 made, such as providing the resource hierarchy when getting privileges for an authorization check.
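
      The idea in items 2) and 3) can be sketched as follows. This is a hedged illustration, not the actual TreePrivilegeCache: the class PrivilegeTreeSketch, the nested Priv type, and the signature given here to listPrivilegeObjects are assumptions. The sketch also assumes that only privileges stored on the path from the root to the checked resource are relevant, and it ignores wildcards and role/group handling.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical resource-hierarchy cache; not the Sentry implementation.
final class PrivilegeTreeSketch {

    // Parsed privilege, built once when the cache is populated.
    static final class Priv {
        final List<String> resourcePath; // e.g. [server=server1, database=db_2]
        final String action;             // e.g. ALL

        Priv(List<String> resourcePath, String action) {
            this.resourcePath = resourcePath;
            this.action = action;
        }
    }

    private static final class Node {
        final Map<String, Node> children = new HashMap<>();
        final List<Priv> privileges = new ArrayList<>();
    }

    private final Node root = new Node();

    // Converts one privilege string to an object and stores it at the tree
    // node for its resource. Resource parts are lower-cased (an assumption).
    void add(String raw) {
        List<String> path = new ArrayList<>();
        String action = "";
        for (String part : raw.split("->")) {
            String p = part.trim();
            if (p.toLowerCase().startsWith("action=")) {
                action = p.substring("action=".length());
            } else {
                path.add(p.toLowerCase());
            }
        }
        Node node = root;
        for (String key : path) {
            node = node.children.computeIfAbsent(key, k -> new Node());
        }
        node.privileges.add(new Priv(path, action));
    }

    // Returns the already-parsed privileges found on the path from the root
    // to the given resource; privileges on sibling branches are never visited.
    List<Priv> listPrivilegeObjects(String... resourcePath) {
        List<Priv> result = new ArrayList<>();
        Node node = root;
        for (String key : resourcePath) {
            node = node.children.get(key.toLowerCase());
            if (node == null) {
                break; // nothing is stored at or below this resource
            }
            result.addAll(node.privileges);
        }
        return result;
    }

    public static void main(String[] args) {
        PrivilegeTreeSketch cache = new PrivilegeTreeSketch();
        cache.add("server=server1->database=db_1->table=table_1->action=SELECT");
        cache.add("server=server1->database=db_2->action=ALL");
        for (Priv p : cache.listPrivilegeObjects(
                "server=server1", "database=db_1", "table=table_1")) {
            System.out.println(p.resourcePath + " " + p.action);
        }
    }
}
```

      Because each path component is stored once as a node key, siblings share their common prefix instead of repeating the whole hierarchy in every map key, and a lookup touches at most one node per level of the checked resource.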


      Major Behavior Change

      1) Create a new interface, FilteredPrivilegeCache, which extends PrivilegeCache.

      2) Move the function added to PrivilegeCache by SENTRY-1291 into FilteredPrivilegeCache, and add the additional functions introduced by this solution to FilteredPrivilegeCache as well. This way, PrivilegeCache itself is unchanged, and we remain backward compatible with implementations that predate SENTRY-1291.

      3) Move all changes that SENTRY-1291 made in SimplePrivilegeCache (which implements PrivilegeCache) to a new class, SimpleFilteredPrivilegeCache, which implements FilteredPrivilegeCache.

      4) Instead of hard-coding the privilege cache class, use the configuration AuthzConfVars.AUTHZ_PRIVILEGE_CACHE ("sentry.hive.privilege.cache") to specify the privilege cache class name. The default value is "org.apache.sentry.provider.cache.TreePrivilegeCache". Users can switch to another cache implementation in sentry-site.xml at a service (such as Hive server or HMS). The options are:

        4.1) org.apache.sentry.provider.cache.SimplePrivilegeCache (the original cache implementation before SENTRY-1291)

        4.2) org.apache.sentry.provider.cache.SimpleFilteredPrivilegeCache (the cache implemented in SENTRY-1291)

        4.3) org.apache.sentry.provider.cache.TreePrivilegeCache (the cache implemented in this Jira SENTRY-2539)
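
      The interface split and the configuration-driven class selection described above might look roughly like the following sketch. Only the configuration key "sentry.hive.privilege.cache" comes from this issue. All type names ending in Sketch, the simplified method signatures, and the use of a plain Map in place of the real configuration object are assumptions, and the real Sentry interfaces differ.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for PrivilegeCache (signature simplified).
interface PrivilegeCacheSketch {
    Set<String> listPrivileges(Set<String> groups);
}

// Hypothetical stand-in for FilteredPrivilegeCache: the resource-aware
// lookup lives here, so the base interface stays unchanged.
interface FilteredPrivilegeCacheSketch extends PrivilegeCacheSketch {
    Set<String> listPrivileges(Set<String> groups, String... resourcePath);
}

final class SimpleCacheSketch implements PrivilegeCacheSketch {
    public Set<String> listPrivileges(Set<String> groups) {
        return Collections.emptySet();
    }
}

final class TreeCacheSketch implements FilteredPrivilegeCacheSketch {
    public Set<String> listPrivileges(Set<String> groups) {
        return Collections.emptySet();
    }

    public Set<String> listPrivileges(Set<String> groups, String... resourcePath) {
        return Collections.emptySet();
    }
}

final class CacheLoaderSketch {
    static final String CACHE_KEY = "sentry.hive.privilege.cache";

    // Instantiates the configured cache class by name, falling back to the
    // tree-based sketch when the key is absent.
    static PrivilegeCacheSketch load(Map<String, String> conf) {
        String className = conf.getOrDefault(CACHE_KEY, TreeCacheSketch.class.getName());
        try {
            return Class.forName(className)
                    .asSubclass(PrivilegeCacheSketch.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException("Cannot create privilege cache " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        PrivilegeCacheSketch cache = load(conf);
        // Callers can test for the filtered interface before using the
        // resource-aware lookup, which keeps older caches usable.
        System.out.println(cache.getClass().getSimpleName()
                + " filtered=" + (cache instanceof FilteredPrivilegeCacheSketch));
    }
}
```

      Keeping the resource-aware method on the sub-interface means a cache written against the original interface still loads and works. The caller simply skips the filtered path when the instance does not implement it.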




        1. SENTRY-2539.002.patch
          61 kB
          Na Li
        2. SENTRY-2539.003.patch
          71 kB
          Na Li
        3. SENTRY-2539.005.patch
          69 kB
          Na Li
        4. SENTRY-2539.006.patch
          80 kB
          Na Li
        5. SENTRY-2539.007.patch
          83 kB
          Na Li
        6. SENTRY-2539.008.patch
          82 kB
          Na Li
        7. SENTRY-2539.008.patch
          82 kB
          Na Li
        8. SENTRY-2539.008.patch
          82 kB
          Na Li
        9. SENTRY-2539.009.patch
          115 kB
          Na Li
        10. SENTRY-2539.010.patch
          115 kB
          Na Li
        11. SENTRY-2539.010.patch
          115 kB
          Na Li
        12. SENTRY-2539.013.patch
          115 kB
          Na Li
        13. SENTRY-2539.013.patch
          115 kB
          Na Li
        14. SENTRY-2539.013.patch
          115 kB
          Na Li
        15. SENTRY-2539.013.patch
          115 kB
          Na Li
        16. SENTRY-2539.014.patch
          115 kB
          Na Li




              linaataustin Na Li
              linaataustin Na Li