Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
Saw call stacks where most of EventProcessor's time is spent rechecking the access level of partition directories:
org.apache.impala.catalog.HdfsTable.getAvailableAccessLevel
org.apache.impala.catalog.HdfsTable.createOrUpdatePartitionBuilder
org.apache.impala.catalog.HdfsTable.createPartitionBuilder
org.apache.impala.catalog.HdfsTable.reloadPartitions
org.apache.impala.catalog.HdfsTable.reloadPartitionsFromNames
org.apache.impala.service.CatalogOpExecutor.reloadPartitionsIfExis
org.apache.impala.catalog.events.MetastoreEvents$MetastoreTableEvent.reloadPartitions
org.apache.impala.catalog.events.MetastoreEvents$BatchPartitionEvent.process
HdfsTable.getAvailableAccessLevel() calls getFileStatus() and, if the access control list (ACL) bit is set in the returned status, also issues a getAclStatus() call to the namenode.
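To make the cost concrete, here is a minimal sketch of how the namenode round trips add up when every partition directory in a batch is rechecked. It is not Impala code: the class `AccessCheckCost` and its methods are hypothetical, and it only models the call pattern described above (one getFileStatus() per directory, plus one getAclStatus() when the ACL bit is set).

```java
import java.util.List;
import java.util.Set;

// Hypothetical model of the RPC pattern; it does not call HDFS.
final class AccessCheckCost {
    /** Namenode round trips needed to check one partition directory. */
    static int rpcsForDirectory(boolean aclBitSet) {
        int rpcs = 1;        // getFileStatus() is always issued
        if (aclBitSet) {
            rpcs++;          // getAclStatus() only when the ACL bit is set
        }
        return rpcs;
    }

    /** Total round trips when every partition in a batch is rechecked. */
    static int rpcsForBatch(List<String> partitionDirs, Set<String> dirsWithAcl) {
        int total = 0;
        for (String dir : partitionDirs) {
            total += rpcsForDirectory(dirsWithAcl.contains(dir));
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> dirs = List.of("/t/p=1", "/t/p=2", "/t/p=3");
        Set<String> withAcl = Set.of("/t/p=2");
        // 3 getFileStatus() calls + 1 getAclStatus() call
        System.out.println(rpcsForBatch(dirs, withAcl)); // prints 4
    }
}
```

Under this model the cost grows linearly with the number of partitions touched by an event batch, which is consistent with the call stacks above.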
It is questionable whether we should recheck this when refreshing tables for directories that were already checked in the past, as the check can be expensive and the result is unlikely to change. AFAIK having stale data shouldn't cause security issues: if Impala has no right to access/modify the file, the namenode will simply not allow the operation (coordinators/executors use the same username as catalogd for HDFS ops).
Note that the whole access level check is skipped for most filesystems other than HDFS (see HdfsTable.assumeReadWriteAccess()).
Currently catalogd checks this for each partition-level event (even if the events are batched). While checking it once during CREATE PARTITION makes sense, rechecking it for every INSERT and ALTER seems like overkill, especially since an INSERT shouldn't reduce access rights on a partitioned table.
Besides the event processor, rechecking during REFRESH and during reloads after DML/DDLs is also questionable. If there was an actual change, INVALIDATE METADATA can be used to reload the table from scratch.
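One way to picture the proposal is a per-directory cache that asks the namenode only the first time a partition directory is seen and is dropped on a full reload. The sketch below is illustrative only and is not how catalogd is implemented: `AccessLevelCacheSketch`, its `AccessLevel` enum, and the injected `namenodeCheck` function are all hypothetical names standing in for the real check.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: remember the access level per partition directory.
final class AccessLevelCacheSketch {
    enum AccessLevel { NONE, READ_ONLY, WRITE_ONLY, READ_WRITE }

    private final Map<String, AccessLevel> cached = new HashMap<>();
    private final Function<String, AccessLevel> namenodeCheck; // stands in for the real check
    private int namenodeChecks = 0;

    AccessLevelCacheSketch(Function<String, AccessLevel> namenodeCheck) {
        this.namenodeCheck = namenodeCheck;
    }

    /** Called for partition-level events (e.g. INSERT, ALTER) and reloads. */
    AccessLevel onPartitionEvent(String partitionDir) {
        AccessLevel level = cached.get(partitionDir);
        if (level == null) {
            // First time this directory is seen: pay for the namenode check once.
            namenodeChecks++;
            level = namenodeCheck.apply(partitionDir);
            cached.put(partitionDir, level);
        }
        return level;
    }

    /** Models a from-scratch reload such as INVALIDATE METADATA. */
    void invalidate() {
        cached.clear();
    }

    int namenodeChecks() {
        return namenodeChecks;
    }
}
```

With this shape, repeated events on the same partition reuse the cached level, and a permission change only becomes visible after `invalidate()`, which mirrors the trade-off described above.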
Issue Links
- causes
  - IMPALA-12456 Apply optimization in IMPALA-7320 during event processing (Open)
- is related to
  - IMPALA-7321 Write permissions checks for insert into multi-level partitioned table are incorrect (Open)
  - IMPALA-12476 Single thread permission checks can bottleneck table loading (Open)
- relates to
  - IMPALA-7539 Support HDFS permissions checks with LocalCatalog (Open)