IMPALA-4172

Switch from using getFileBlockLocations to BlockLocation methods (Potential 50% speedup in metadata loading)


Details

    Description

      HDFS-8895 removes the ability to query volume IDs from datanodes. That information is now carried by BlockLocation, which is reachable through the FileSystem APIs that return LocatedFileStatus (for example, listFiles).
      The new API is more efficient and more accurate. It has been available since CDH 5.5, so the change can be backported as well.

      getFileBlockLocations is a bottleneck during metadata loading for Impala. In the profile below it accounts for about 15% of all samples.

      Stack Trace	Sample Count	Percentage(%)
      java.lang.Thread.run()	17,837	73.758
         java.util.concurrent.ThreadPoolExecutor$Worker.run()	17,837	73.758
            java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker)	17,837	73.758
               java.util.concurrent.FutureTask.run()	17,600	72.778
                  com.cloudera.impala.catalog.TableLoadingMgr$2.call()	17,513	72.419
                     com.cloudera.impala.catalog.TableLoadingMgr$2.call()	17,513	72.419
                        com.cloudera.impala.catalog.TableLoader.load(Db, String)	17,513	72.419
                           com.cloudera.impala.catalog.HdfsTable.load(boolean, IMetaStoreClient, Table)	17,513	72.419
                              com.cloudera.impala.catalog.HdfsTable.load(boolean, IMetaStoreClient, Table, boolean, boolean, Set)	17,513	72.419
                                 com.cloudera.impala.catalog.HdfsTable.loadAllPartitions(List, Table)	15,721	65.008
                                    com.cloudera.impala.catalog.HdfsTable.createPartition(StorageDescriptor, Partition, Map)	13,611	56.283
                                       com.cloudera.impala.catalog.HdfsTable.updatePartitionFds(Path, boolean, HdfsFileFormat, Map)	7,942	32.841
                                          com.cloudera.impala.catalog.HdfsTable.loadBlockMetadata(FileSystem, FileStatus, HdfsPartition$FileDescriptor, HdfsFileFormat, Map)	4,319	17.86
                                             org.apache.hadoop.hdfs.DistributedFileSystem.getFileBlockLocations(FileStatus, long, long)	3,678	15.209
                                             com.cloudera.impala.catalog.HdfsPartition$BlockReplica.parseLocation(String)	203	0.839
      
      

      Pointer to the Javadoc for the new API:

      https://hadoop.apache.org/docs/r2.6.1/api/org/apache/hadoop/fs/FileSystem.html#listFiles(org.apache.hadoop.fs.Path, boolean)
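
      The difference between the two access patterns can be sketched with a toy model. This is not Hadoop code: ToyNameNode and its methods are hypothetical stand-ins that only count round trips, assuming one round trip per call. In real Hadoop the corresponding calls would be FileSystem.getFileBlockLocations per file versus a single FileSystem.listFiles(Path, boolean) whose LocatedFileStatus entries already carry their BlockLocation data; real HDFS also batches large listings, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these types are Hadoop or Impala classes.
final class BlockLocationSketch {

    /** Hypothetical namenode stand-in that counts round trips. */
    static final class ToyNameNode {
        private final Map<String, List<String>> hostsByFile = new LinkedHashMap<>();
        int roundTrips = 0;

        void addFile(String path, List<String> hosts) {
            hostsByFile.put(path, hosts);
        }

        // Plain listing: file names only, no block information.
        List<String> listStatus() {
            roundTrips++;
            return new ArrayList<>(hostsByFile.keySet());
        }

        // Per-file lookup, modelled on the getFileBlockLocations pattern.
        List<String> getFileBlockLocations(String path) {
            roundTrips++;
            return hostsByFile.get(path);
        }

        // Located listing: names and block hosts in a single response.
        Map<String, List<String>> listLocatedStatus() {
            roundTrips++;
            return new LinkedHashMap<>(hostsByFile);
        }
    }

    /** Old pattern: one listing, then one extra call per file. */
    static Map<String, List<String>> loadPerFile(ToyNameNode nn) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String path : nn.listStatus()) {
            result.put(path, nn.getFileBlockLocations(path));
        }
        return result;
    }

    /** New pattern: block hosts arrive together with the listing. */
    static Map<String, List<String>> loadWithLocatedListing(ToyNameNode nn) {
        return nn.listLocatedStatus();
    }

    /** Builds a toy partition directory with the given number of files. */
    static ToyNameNode sampleNameNode(int files) {
        ToyNameNode nn = new ToyNameNode();
        for (int i = 0; i < files; i++) {
            nn.addFile("/warehouse/t/part-" + i,
                    Arrays.asList("dn" + (i % 3), "dn" + ((i + 1) % 3)));
        }
        return nn;
    }

    public static void main(String[] args) {
        ToyNameNode perFileNn = sampleNameNode(100);
        loadPerFile(perFileNn);
        ToyNameNode locatedNn = sampleNameNode(100);
        loadWithLocatedListing(locatedNn);
        System.out.println("per-file round trips: " + perFileNn.roundTrips);
        System.out.println("located-listing round trips: " + locatedNn.roundTrips);
    }
}
```

      Both loaders return the same file-to-hosts mapping; only the number of modelled round trips differs, which is the effect the proposed switch is aiming for.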

            People

              bharathv Bharath Vissapragada
              mmokhtar Mostafa Mokhtar