IMPALA-12472

Skip permission check when refreshing in event processor

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Catalog
    • Labels: None

    Description

      Call stacks show that most of the EventProcessor's time is spent rechecking the access level of partition directories:

      org.apache.impala.catalog.HdfsTable.getAvailableAccessLevel
      org.apache.impala.catalog.HdfsTable.createOrUpdatePartitionBuilder
      org.apache.impala.catalog.HdfsTable.createPartitionBuilder
      org.apache.impala.catalog.HdfsTable.reloadPartitions
      org.apache.impala.catalog.HdfsTable.reloadPartitionsFromNames
      org.apache.impala.service.CatalogOpExecutor.reloadPartitionsIfExist
      org.apache.impala.catalog.events.MetastoreEvents$MetastoreTableEvent.reloadPartitions
      org.apache.impala.catalog.events.MetastoreEvents$BatchPartitionEvent.process
      

      HdfsTable.getAvailableAccessLevel() calls getFileStatus() and, if the access control list (ACL) bit is set in the returned status, also issues a getAclStatus() call to the NameNode.
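
      To illustrate why this check can cost one or two NameNode round trips per partition directory, here is a minimal, self-contained sketch. It is not the Impala implementation: FakeNameNode, Status, and the simplified getAclStatus() return type are hypothetical stand-ins, and only the control flow (status lookup first, ACL lookup only when the ACL bit is set) follows the description above.

```java
import java.util.HashMap;
import java.util.Map;

public class AccessLevelCheckSketch {
  enum AccessLevel { NONE, READ_ONLY, WRITE_ONLY, READ_WRITE }

  /** Hypothetical stand-in for the file status returned by the NameNode. */
  static final class Status {
    final boolean canRead;
    final boolean canWrite;
    final boolean hasAcl;

    Status(boolean canRead, boolean canWrite, boolean hasAcl) {
      this.canRead = canRead;
      this.canWrite = canWrite;
      this.hasAcl = hasAcl;
    }
  }

  /** Hypothetical stand-in for the NameNode; it only counts round trips. */
  static final class FakeNameNode {
    final Map<String, Status> statuses = new HashMap<>();
    final Map<String, AccessLevel> aclLevels = new HashMap<>();
    int rpcCount = 0;

    Status getFileStatus(String path) {
      rpcCount++;
      return statuses.get(path);
    }

    // Simplification: returns the effective access level directly
    // instead of a list of ACL entries that would need evaluating.
    AccessLevel getAclStatus(String path) {
      rpcCount++;
      return aclLevels.get(path);
    }
  }

  /** One status lookup, plus an ACL lookup only if the ACL bit is set. */
  static AccessLevel getAvailableAccessLevel(FakeNameNode nn, String path) {
    Status st = nn.getFileStatus(path);
    if (st.hasAcl) return nn.getAclStatus(path);
    if (st.canRead && st.canWrite) return AccessLevel.READ_WRITE;
    if (st.canRead) return AccessLevel.READ_ONLY;
    if (st.canWrite) return AccessLevel.WRITE_ONLY;
    return AccessLevel.NONE;
  }

  public static void main(String[] args) {
    FakeNameNode nn = new FakeNameNode();
    nn.statuses.put("/t/p=1", new Status(true, true, false));
    nn.statuses.put("/t/p=2", new Status(true, false, true));
    nn.aclLevels.put("/t/p=2", AccessLevel.READ_ONLY);

    AccessLevel plain = getAvailableAccessLevel(nn, "/t/p=1");
    int afterPlain = nn.rpcCount;
    AccessLevel withAcl = getAvailableAccessLevel(nn, "/t/p=2");
    int afterAcl = nn.rpcCount;

    System.out.println(plain + ": " + afterPlain + " round trip(s)");
    System.out.println(withAcl + ": " + (afterAcl - afterPlain) + " round trip(s)");
  }
}
```

      In this model a directory without ACLs costs one round trip and a directory with ACLs costs two, so the cost grows with the number of partitions touched by each event.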

      It is questionable whether we should recheck this when refreshing tables for directories that were already checked in the past, as the check can be expensive and the result is unlikely to change. AFAIK stale data shouldn't cause security issues: if Impala has no right to access or modify a file, the NameNode will simply reject the operation (coordinators/executors use the same username as catalogd for HDFS ops).
      Note that the whole access level check is skipped for most filesystems other than HDFS (see HdfsTable.assumeReadWriteAccess()).

      Currently catalogd checks this for each partition-level event (even if the events are batched). While checking it once during CREATE PARTITION makes sense, rechecking it for every INSERT and ALTER seems like overkill; in particular, an INSERT shouldn't reduce access rights on a partitioned table.

      Besides the event processor, rechecking during REFRESH and during reloads after DML/DDL statements is also questionable. If there was an actual change, INVALIDATE METADATA can be used to reload the table from scratch.
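
      One possible shape of the proposal, as a rough sketch only: remember the access level computed when a partition directory is first loaded, reuse it on later refreshes, and drop the remembered values on a full reload. AccessLevelCache and its methods are hypothetical names for illustration, not existing Impala classes, and the lambda stands in for the expensive NameNode check.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class CachedAccessLevelSketch {
  enum AccessLevel { NONE, READ_ONLY, WRITE_ONLY, READ_WRITE }

  /** Hypothetical cache of access levels keyed by partition directory. */
  static final class AccessLevelCache {
    private final Map<String, AccessLevel> byDir = new ConcurrentHashMap<>();
    private final Function<String, AccessLevel> expensiveCheck;

    AccessLevelCache(Function<String, AccessLevel> expensiveCheck) {
      this.expensiveCheck = expensiveCheck;
    }

    /** Runs the expensive check only the first time a directory is seen. */
    AccessLevel get(String partitionDir) {
      return byDir.computeIfAbsent(partitionDir, expensiveCheck);
    }

    /** Models a full reload (e.g. INVALIDATE METADATA): forget everything. */
    void invalidateAll() {
      byDir.clear();
    }
  }

  public static void main(String[] args) {
    AtomicInteger nameNodeChecks = new AtomicInteger();
    AccessLevelCache cache = new AccessLevelCache(dir -> {
      nameNodeChecks.incrementAndGet(); // stand-in for the NameNode round trips
      return AccessLevel.READ_WRITE;
    });

    cache.get("/warehouse/t/p=1"); // first load of the partition: checked
    cache.get("/warehouse/t/p=1"); // later INSERT event: cached value reused
    cache.get("/warehouse/t/p=1"); // later ALTER event: cached value reused
    System.out.println("checks before invalidate: " + nameNodeChecks.get());

    cache.invalidateAll();         // full reload from scratch
    cache.get("/warehouse/t/p=1"); // checked again
    System.out.println("checks after invalidate: " + nameNodeChecks.get());
  }
}
```

      The trade-off is the one discussed above: a cached value can go stale if permissions change, in which case a full reload is needed to pick up the change.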

            People

              Assignee: Unassigned
              Reporter: Csaba Ringhofer (csringhofer)