IMPALA-9195

Using multithreaded execution to accelerate ‘show tables/databases’


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Impala 3.4.0
    • Component/s: Frontend
    • Labels: None

    Description

      Impala version: 2.12

      Using Sentry for authorization

      When a user who belongs to multiple group policies (which may be nested) executes 'show tables/databases', the latency is very high. In my case, the database has 910 tables, and the user waited 65.886 seconds to get a list of 160 tables.

      I studied the code and found that Frontend.getTableNames executes the following:

      for table in tables:

          for action in actions(all actions defined in DBModelAction):

             ResourceAuthorizationProvider.hasAccess

      It seems that 'hasAccess' is responsible for the poor performance when checking users with complex group policies.

      I tried using 16 threads in getTableNames, which reduced the time to 4.752 seconds in my case.
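
      The multithreaded check described above can be sketched roughly as follows. This is a minimal illustration, not the actual Impala patch: the class ParallelVisibilityFilter, the method filterVisible, and the hasAccess predicate are hypothetical names standing in for the per-table loop over ResourceAuthorizationProvider.hasAccess, and the thread count is simply passed in by the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Hypothetical helper (not Impala code): runs one authorization check per
// table name on a fixed-size thread pool and keeps only the accessible names.
final class ParallelVisibilityFilter {
  private ParallelVisibilityFilter() {}

  // hasAccess is an assumed stand-in for the expensive per-table privilege check.
  static List<String> filterVisible(
      List<String> names, Predicate<String> hasAccess, int numThreads) {
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    try {
      // Submit one task per table so the slow checks overlap.
      List<Future<Boolean>> results = new ArrayList<>(names.size());
      for (String name : names) {
        Callable<Boolean> check = () -> hasAccess.test(name);
        results.add(pool.submit(check));
      }
      // Collect results in submission order so the output order is stable.
      List<String> visible = new ArrayList<>();
      for (int i = 0; i < names.size(); i++) {
        if (results.get(i).get()) {
          visible.add(names.get(i));
        }
      }
      return visible;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while checking access", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("Access check failed", e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}
```

      A call such as ParallelVisibilityFilter.filterVisible(tableNames, name -> checkAccess(name), 16) would mirror the 16-thread experiment, where checkAccess is again a hypothetical wrapper. The sketch assumes the access check is safe to call from several threads at once, which would need to be verified for the real authorization provider.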

      The code appears to be the same when using the Sentry service in the latest Impala. I am not sure whether any improvement has been made in the latest Sentry service, as I failed to migrate from file-based Sentry authorization to the Sentry service. I see that Ranger is supported in the latest Impala; does Ranger have a similar problem?

      It seems that 'show tables/databases' can benefit from multithreaded execution when using Sentry. Is it reasonable to support such operations through the MT_DOP query option?

          

              


          People

            Assignee: xuzhou
            Reporter: xuzhou
            Votes: 0
            Watchers: 5
