IMPALA-5429

Use a thread pool to load block metadata in parallel

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.9.0
    • Fix Version/s: Impala 2.11.0
    • Component/s: Catalog

      Description

      Metadata loading for tables with many partitions can be fairly slow, especially on S3 and ADLS. The operation is largely latency-bound, so using multiple threads should help speed up the process.

      Listing files from multiple partitions in parallel should provide a good speedup, especially for S3 and ADLS, where latencies are usually higher than on HDFS.

      HdfsTable.loadPartitionFileMetadata(StorageDescriptor, HdfsPartition) might be a good starting point.
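
      To illustrate the proposal, here is a minimal sketch of fanning per-partition file listings out to a fixed-size thread pool. It is not the change that was committed for this issue. The class ParallelPartitionListing, the method listAll, and the lister function are hypothetical names: lister stands in for whatever call lists one partition directory (for example a FileSystem listing), so the sketch runs without Hadoop on the classpath.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelPartitionListing {

  /**
   * Lists the files of every partition path using a bounded thread pool.
   * 'lister' is a hypothetical stand-in for the per-partition listing call.
   * Results are returned in the same order as 'partitionPaths'.
   */
  public static Map<String, List<String>> listAll(
      List<String> partitionPaths,
      Function<String, List<String>> lister,
      int numThreads) throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    try {
      // Submit one listing task per partition; the tasks overlap their
      // round-trip latency instead of paying it sequentially.
      Map<String, Future<List<String>>> pending = new LinkedHashMap<>();
      for (String path : partitionPaths) {
        Callable<List<String>> task = () -> lister.apply(path);
        pending.put(path, pool.submit(task));
      }
      // Collect results; get() rethrows any listing failure wrapped in
      // an ExecutionException.
      Map<String, List<String>> result = new LinkedHashMap<>();
      for (Map.Entry<String, Future<List<String>>> e : pending.entrySet()) {
        result.put(e.getKey(), e.getValue().get());
      }
      return result;
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    List<String> partitions = Arrays.asList("year=2016", "year=2017", "year=2018");
    // Simulated high-latency listing; a real lister would call the filesystem.
    Function<String, List<String>> slowLister = path -> {
      try {
        Thread.sleep(50);
      } catch (InterruptedException ex) {
        Thread.currentThread().interrupt();
      }
      return Collections.singletonList(path + "/part-00000");
    };
    Map<String, List<String>> files = listAll(partitions, slowLister, 3);
    System.out.println("Listed " + files.size() + " partitions: " + files);
  }
}
```

      With a latency-bound lister, wall-clock time is roughly the per-listing latency multiplied by the number of partitions divided by the pool size, rather than by the number of partitions alone. The pool size would need to be bounded so that the catalog does not overwhelm the storage service with concurrent requests.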

      Stack-Trace Count Percentage(%) Total
      com.amazonaws.services.s3.AmazonS3Client.listObjects(ListObjectsRequest) 4,340 75.649 83,489,694,712
      ---org.apache.hadoop.fs.s3a.S3AFileSystem.listObjects(ListObjectsRequest) 4,340 75.649 83,489,694,712
      ------org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(Path,-String,-Set) 3,256 56.754 63,540,096,016
      ---------org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(Path,-boolean) 3,256 56.754 63,540,096,016
      ------------org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(Path) 3,256 56.754 63,540,096,016
      ---------------org.apache.hadoop.fs.FileSystem.exists(Path) 2,178 37.964 45,375,122,798
      ------------------org.apache.hadoop.fs.s3a.S3AFileSystem.exists(Path) 2,178 37.964 45,375,122,798
      ---------------------org.apache.impala.catalog.HdfsTable.loadPartitionFileMetadata(StorageDescriptor,-HdfsPartition) 1,082 18.86 23,383,160,065
      ------------------------org.apache.impala.catalog.HdfsTable.loadPartitionFileMetadata(List) 1,082 18.86 23,383,160,065
      ---------------------------org.apache.impala.catalog.HdfsTable.updatePartitionsFromHms(IMetaStoreClient,-Set,-boolean) 1,082 18.86 23,383,160,065
      ------------------------------org.apache.impala.catalog.HdfsTable.load(boolean,-IMetaStoreClient,-Table,-boolean,-boolean,-Set) 1,082 18.86 23,383,160,065
      ---------------------------------org.apache.impala.catalog.HdfsTable.load(boolean,-IMetaStoreClient,-Table) 1,082 18.86 23,383,160,065
      ---------------------org.apache.impala.catalog.HdfsTable.refreshFileMetadata(HdfsPartition) 1,096 19.104 21,991,962,733
      ------------------------org.apache.impala.catalog.HdfsTable.loadPartitionFileMetadata(StorageDescriptor,-HdfsPartition) 1,096 19.104 21,991,962,733
      ---------------------------org.apache.impala.catalog.HdfsTable.loadPartitionFileMetadata(List) 1,096 19.104 21,991,962,733
      ------------------------------org.apache.impala.catalog.HdfsTable.updatePartitionsFromHms(IMetaStoreClient,-Set,-boolean) 1,096 19.104 21,991,962,733
      --------------org.apache.hadoop.fs.s3a.S3AFileSystem.innerListFiles(Path,-boolean,-Listing$FileStatusAcceptor) 1,078 18.79 18,164,973,218
      ------------------org.apache.hadoop.fs.s3a.S3AFileSystem.listFiles(Path,-boolean) 1,078 18.79 18,164,973,218
      ---------------------org.apache.impala.catalog.HdfsTable.synthesizeBlockMetadata(FileSystem,-Path,-HashMap) 1,078 18.79 18,164,973,218
      ------------------------org.apache.impala.catalog.HdfsTable.synthesizeBlockMetadata(FileSystem,-HdfsPartition) 1,078 18.79 18,164,973,218
      ---------------------------org.apache.impala.catalog.HdfsTable.refreshFileMetadata(HdfsPartition) 1,078 18.79 18,164,973,218
      ------------------------------org.apache.impala.catalog.HdfsTable.loadPartitionFileMetadata(StorageDescriptor,-HdfsPartition) 1,078 18.79 18,164,973,218
      ---------------------------------org.apache.impala.catalog.HdfsTable.loadPartitionFileMetadata(List) 1,078 18.79 18,164,973,218
      ------org.apache.hadoop.fs.s3a.Listing$ObjectListingIterator.<init>(Listing,-Path,-ListObjectsRequest) 1,084 18.895 19,949,598,696
      ---------org.apache.hadoop.fs.s3a.Listing.createFileStatusListingIterator(Path,-ListObjectsRequest,-PathFilter,-Listing$FileStatusAcceptor,-RemoteIterator) 1,084 18.895 19,949,598,696
      ------------org.apache.hadoop.fs.s3a.S3AFileSystem.innerListFiles(Path,-boolean,-Listing$FileStatusAcceptor) 1,084 18.895 19,949,598,696
      ---------------org.apache.hadoop.fs.s3a.S3AFileSystem.listFiles(Path,-boolean) 1,084 18.895 19,949,598,696
      ------------------org.apache.impala.catalog.HdfsTable.synthesizeBlockMetadata(FileSystem,-Path,-HashMap) 1,084 18.895 19,949,598,696
      --------------------org.apache.impala.catalog.HdfsTable.synthesizeBlockMetadata(FileSystem,-HdfsPartition) 1,084 18.895 19,949,598,696

              People

              • Assignee: bharathv (bharath v)
              • Reporter: mmokhtar (Mostafa Mokhtar)