IMPALA-10117

Skip calls to FsPermissionCache for blob stores

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • Impala 4.0.0
    • Frontend

    Description

      The FsPermissionCache is described as:

      /**
       * Simple non-thread-safe cache for resolved file permissions. This allows
       * pre-caching permissions by listing the status of all files within a directory,
       * and then using that cache to avoid round trips to the FileSystem for later
       * queries of those paths.
       */ 

      I confirmed that FsPermissionCache#precacheChildrenOf is called for data stored on S3. However, FsPermissionCache#getPermissions is called inside HdfsTable#getAvailableAccessLevel, which is skipped for S3, so none of the cached metadata is used. Meanwhile, precacheChildrenOf calls getFileStatus for all files, which results in unnecessary metadata operations against S3 and a pile of cached metadata that is never read.

      precacheChildrenOf is only invoked in the specific scenario described in this code comment:

          // Only preload permissions if the number of partitions to be added is
          // large (3x) relative to the number of existing partitions. This covers
          // two common cases:
          //
          // 1) initial load of a table (no existing partition metadata)
          // 2) ALTER TABLE RECOVER PARTITIONS after creating a table pointing to
          // an already-existing partition directory tree
          //
          // Without this heuristic, we would end up using a "listStatus" call to
          // potentially fetch a bunch of irrelevant information about existing
          // partitions when we only want to know about a small number of newly-added
          // partitions.
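
The heuristic in the comment above could be expressed roughly as follows. This is a minimal sketch, not the actual Impala code: the class name, method name, and the strict "greater than" comparison are assumptions made for illustration, and only the 3x ratio comes from the comment.

```java
// Hypothetical helper illustrating the "3x" preload heuristic.
final class PermissionPreloadHeuristic {
    // Ratio taken from the "(3x)" in the quoted comment.
    static final int PRELOAD_RATIO = 3;

    private PermissionPreloadHeuristic() {}

    /**
     * Returns true when the number of partitions being added is large
     * relative to the number of partitions already loaded.
     * The strict comparison is an assumption, not confirmed by the source.
     */
    static boolean shouldPreload(int numPartsToAdd, int numExistingParts) {
        // Widen to long so the multiplication cannot overflow.
        return (long) numPartsToAdd > (long) PRELOAD_RATIO * numExistingParts;
    }
}
```

Under this sketch, an initial table load (no existing partitions, at least one to add) always preloads, while adding a handful of partitions to a table that already has many does not.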
      

      Regardless, skipping the call to precacheChildrenOf for blob stores should (1) improve table loading time for S3 backed tables, and (2) decrease catalogd memory requirements when loading a bunch of tables stored on S3.
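
One way the proposed skip might look is sketched below. It is an illustration only: the class, the method names, and the scheme-based check are hypothetical, the list of schemes is an assumed example rather than what Impala uses, and the real change may detect blob stores differently.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical policy class; not part of Impala.
final class BlobStorePrecachePolicy {
    // Assumed, illustrative set of blob store URI schemes.
    private static final Set<String> BLOB_STORE_SCHEMES =
        new HashSet<>(Arrays.asList("s3a", "abfs", "abfss", "adl"));

    private BlobStorePrecachePolicy() {}

    /** Returns true if the location's URI scheme is in the assumed blob store set. */
    static boolean isBlobStore(URI location) {
        String scheme = location.getScheme();
        return scheme != null
            && BLOB_STORE_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
    }

    /**
     * Decides whether to pre-cache permissions for a table location.
     * Blob stores are skipped because the cached permissions would never be read.
     */
    static boolean shouldPrecachePermissions(URI tableLocation) {
        return !isBlobStore(tableLocation);
    }
}
```

A caller would guard the precacheChildrenOf call with such a check, so that an S3-backed table skips both the extra metadata requests and the unused cache entries.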

          People

            tarmstrong Tim Armstrong
            stakiar Sahil Takiar