Details

    • Improvement
    • Status: Resolved
    • Minor
    • Resolution: Won't Do
    • Impala 2.2.4
    • None
    • Catalog

    Description

      In impala, users often execute invalidate metadata (without any table) to clear the metadata cache. Since by default this affects all the databases/tables on the cluster, Sentry asks users to have privileges on the whole cluster (or "server" in Sentry's terminology) to execute this. However, this brings a huge inconvenience to customers, especially when they have multiple tables to refresh. They must do this one table by one.
      From my point of view, when sentry is enabled, we should allow users to clear the metadata cache for all the tables he/she has access to, instead of asking for superuser privilege and clear all the tables metadata.

      Attachments

        Issue Links

        Activity

          +1

          tagar_impala_e3b3 Ruslan Dautkhanov added a comment - +1

          Case 108547
          and CDH-44148

          tagar_impala_e3b3 Ruslan Dautkhanov added a comment - Case 108547 and CDH-44148
          grahn Greg Rahn added a comment -

          There are a several challenges with this request so let me try to speak to each of them individually to help convey why this is a difficult ask.

          The command invalidate metadata is very expensive as this evicts all cached metadata and does an entire catalog reload. For large catalogs, this can take a significant amount of time. It is for this exact reason that the command is limited to the Sentry admin role but was also extended to allow a table name to be specified such that new tables could be added by non-admins with the proper table permissions without evicting existing valid metadata. This command needs only be run when tables are created outside of impala to make impala "see" the new table. If impala is aware of the table (visible in show tables) and one has just added/modified data/files to it, the refresh <table> command will suffice.
          See the docs for more details on both commands:

          Another challenge is that there currently is no API with Sentry to return the list of tables that a user can perform invalidate on. Each table needs to be iterated over currently to determine if the invalidate action is permitted.

          Additionally, for databases that contain large numbers of tables that a user has invalidate privileges on, the asked for approach can also have severe performance impacts. For example, if I add 10 new tables in a database that has 5000 tables that I have invalidate privileges to, not only does the security permission need to be checked 5000 times, invalidate has to be called 5000 times. Both of these operations result in unnecessary overhead.

          I understand the desire for automation, however, this can be dealt with by scripting the invalidate metadata <table> commands for the set of new tables being added externally. This approach scopes the invalidate command to only the newly added tables and does not unnecessarily invalidate tables that need not be.

          Hope that helps clarify the challenges with this ask.

          grahn Greg Rahn added a comment - There are a several challenges with this request so let me try to speak to each of them individually to help convey why this is a difficult ask. The command invalidate metadata is very expensive as this evicts all cached metadata and does an entire catalog reload. For large catalogs, this can take a significant amount of time. It is for this exact reason that the command is limited to the Sentry admin role but was also extended to allow a table name to be specified such that new tables could be added by non-admins with the proper table permissions without evicting existing valid metadata. This command needs only be run when tables are created outside of impala to make impala "see" the new table. If impala is aware of the table (visible in show tables ) and one has just added/modified data/files to it, the refresh <table> command will suffice. See the docs for more details on both commands: http://www.cloudera.com/documentation/enterprise/latest/topics/impala_invalidate_metadata.html#invalidate_metadata http://www.cloudera.com/documentation/enterprise/latest/topics/impala_refresh.html#refresh Another challenge is that there currently is no API with Sentry to return the list of tables that a user can perform invalidate on. Each table needs to be iterated over currently to determine if the invalidate action is permitted. Additionally, for databases that contain large numbers of tables that a user has invalidate privileges on, the asked for approach can also have severe performance impacts. For example, if I add 10 new tables in a database that has 5000 tables that I have invalidate privileges to, not only does the security permission need to be checked 5000 times, invalidate has to be called 5000 times. Both of these operations result in unnecessary overhead. I understand the desire for automation, however, this can be dealt with by scripting the invalidate metadata <table> commands for the set of new tables being added externally. This approach scopes the invalidate command to only the newly added tables and does not unnecessarily invalidate tables that need not be. Hope that helps clarify the challenges with this ask.

          People

            Unassigned Unassigned
            Yibing Yibing Shi
            Votes:
            1 Vote for this issue
            Watchers:
            Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment