IMPALA-3653 (sub-task of IMPALA-3173 Reduce catalog's memory footprint)

Consider using listLocatedStatus() API to get filestatus and blocklocations in one RPC call


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: Impala 2.5.0
    • Fix Version/s: Impala 2.8.0
    • Component/s: Catalog

    Description

      Currently, Impala makes 2-3 RPC calls per file to load HDFS block metadata.
      DistributedFileSystem.listLocatedStatus() has been available for a long time. With https://issues.apache.org/jira/browse/HDFS-8887 (already backported to CDH 5.5), this API can return the status and block locations of all files under a directory in one RPC call. That can greatly reduce HDFS metadata loading time. For example, for a directory with 200K files, the metadata loading time dropped from 40s to under 15s.

      One concern is memory usage. The storage ID is a UUID string, whereas the disk ID is an int32, and this information is needed for each replica. If we simply store the storage ID in the catalog metadata, it will increase both the catalog metadata size and the thrift object size (which affects catalog topic updates and plan fragments). We should try to do the mapping in the FE to make sure memory usage does not increase.
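
One way to picture the mapping idea is to intern each storage ID string once and store only a small integer per replica. The sketch below is a minimal illustration under that assumption; the class StorageIdInterner and its methods are hypothetical names, not part of Impala, and the storage ID strings in the usage note are made-up placeholders.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: interns storage ID strings into dense int indexes. */
final class StorageIdInterner {
  // storage ID string -> dense index
  private final Map<String, Integer> indexById = new HashMap<String, Integer>();
  // dense index -> storage ID string (reverse lookup)
  private final List<String> idByIndex = new ArrayList<String>();

  /** Returns the existing index for storageId, or assigns the next free one. */
  int intern(String storageId) {
    Integer existing = indexById.get(storageId);
    if (existing != null) {
      return existing;
    }
    int index = idByIndex.size();
    indexById.put(storageId, index);
    idByIndex.add(storageId);
    return index;
  }

  /** Resolves an index back to the storage ID string it was assigned to. */
  String lookup(int index) {
    return idByIndex.get(index);
  }

  /** Number of distinct storage IDs seen so far. */
  int size() {
    return idByIndex.size();
  }
}
```

With this shape, each replica would carry an int returned by intern() instead of a UUID string, so the string is stored once no matter how many replicas reference it. Whether the table lives per host, per table, or globally is a design choice this sketch does not settle.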

      Also, does it make sense to have a global map for the host/TNetworkAddress mapping? Currently Impala keeps one map per HdfsTable. With more tables and more nodes, this can still take quite a lot of memory.
      If we could use a global map, we could link the storage IDs for each host there as well. All impalads and the catalog would then share the same global map key as the host index, and there would be no need to send the full value in thrift update objects or plan fragments.
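
The global map proposal could look roughly like the sketch below: one process-wide registry assigns each host address an index and keeps the storage IDs seen on that host, so a replica can be described by two small integers. GlobalHostRegistry, its methods, and the "host:port" string key are all assumptions made for illustration; the real code would presumably key on TNetworkAddress.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical registry shared by all tables instead of one map per HdfsTable. */
final class GlobalHostRegistry {
  // "host:port" -> host index
  private final Map<String, Integer> hostIndexByAddress = new HashMap<String, Integer>();
  // host index -> "host:port"
  private final List<String> addresses = new ArrayList<String>();
  // host index -> storage IDs seen on that host; list position is the per-host disk index
  private final List<List<String>> storageIdsByHost = new ArrayList<List<String>>();

  /** Returns the index of the given address, registering it on first use. */
  synchronized int hostIndex(String address) {
    Integer existing = hostIndexByAddress.get(address);
    if (existing != null) {
      return existing;
    }
    int next = addresses.size();
    hostIndexByAddress.put(address, next);
    addresses.add(address);
    storageIdsByHost.add(new ArrayList<String>());
    return next;
  }

  /** Returns the per-host index of storageId, registering it on first use. */
  synchronized int diskIndex(int hostIndex, String storageId) {
    List<String> ids = storageIdsByHost.get(hostIndex);
    int pos = ids.indexOf(storageId);
    if (pos >= 0) {
      return pos;
    }
    ids.add(storageId);
    return ids.size() - 1;
  }

  /** Resolves a host index back to its address. */
  synchronized String address(int hostIndex) {
    return addresses.get(hostIndex);
  }
}
```

For the indexes to mean the same thing on every impalad and on the catalog, a single party would have to assign them and distribute the table; this sketch covers only the in-process data structure, not that coordination.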


          People

            Assignee: Bharath Vissapragada (bharathv)
            Reporter: Juan Yu (jyu@cloudera.com)
            Votes: 0
            Watchers: 7
