  1. IMPALA
  2. IMPALA-4172

Switch from using getFileBlockLocations to BlockLocation methods (Potential 50% speedup in metadata loading)

    Details

      Description

      HDFS-8895 removes the ability to query volume IDs from datanodes. This information has instead been added to BlockLocation, which is accessible via various FileSystem APIs (namely, anything that returns LocatedFileStatus).
      This new API is more efficient and more accurate. It is available from CDH 5.5 onwards, so the change can be backported as well.

      getFileBlockLocations is a bottleneck during metadata loading for Impala.

      Stack Trace	Sample Count	Percentage(%)
      java.lang.Thread.run()	17,837	73.758
         java.util.concurrent.ThreadPoolExecutor$Worker.run()	17,837	73.758
            java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker)	17,837	73.758
               java.util.concurrent.FutureTask.run()	17,600	72.778
                  com.cloudera.impala.catalog.TableLoadingMgr$2.call()	17,513	72.419
                     com.cloudera.impala.catalog.TableLoadingMgr$2.call()	17,513	72.419
                        com.cloudera.impala.catalog.TableLoader.load(Db, String)	17,513	72.419
                           com.cloudera.impala.catalog.HdfsTable.load(boolean, IMetaStoreClient, Table)	17,513	72.419
                              com.cloudera.impala.catalog.HdfsTable.load(boolean, IMetaStoreClient, Table, boolean, boolean, Set)	17,513	72.419
                                 com.cloudera.impala.catalog.HdfsTable.loadAllPartitions(List, Table)	15,721	65.008
                                    com.cloudera.impala.catalog.HdfsTable.createPartition(StorageDescriptor, Partition, Map)	13,611	56.283
                                       com.cloudera.impala.catalog.HdfsTable.updatePartitionFds(Path, boolean, HdfsFileFormat, Map)	7,942	32.841
                                          com.cloudera.impala.catalog.HdfsTable.loadBlockMetadata(FileSystem, FileStatus, HdfsPartition$FileDescriptor, HdfsFileFormat, Map)	4,319	17.86
                                             org.apache.hadoop.hdfs.DistributedFileSystem.getFileBlockLocations(FileStatus, long, long)	3,678	15.209
                                             com.cloudera.impala.catalog.HdfsPartition$BlockReplica.parseLocation(String)	203	0.839
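      To put the profile in proportion, the sample counts can be turned into relative shares. The sketch below uses only the counts from the table above; the class and method names are hypothetical and exist purely for illustration.

```java
import java.util.Locale;

// Hypothetical helper for reading the profile; not part of Impala.
class ProfileShare {
  /** Fraction of the parent frame's samples spent in the child frame. */
  static double share(long childSamples, long parentSamples) {
    return (double) childSamples / parentSamples;
  }

  /** Formats a fraction as a percentage with one decimal place. */
  static String percent(double fraction) {
    return String.format(Locale.ROOT, "%.1f%%", fraction * 100.0);
  }

  public static void main(String[] args) {
    long getFileBlockLocations = 3_678; // DistributedFileSystem.getFileBlockLocations
    long loadBlockMetadata = 4_319;     // HdfsTable.loadBlockMetadata
    long tableLoad = 17_513;            // TableLoader.load

    // Share of the whole table load spent fetching block locations.
    System.out.println(percent(share(getFileBlockLocations, tableLoad)));         // prints 21.0%
    // Share of loadBlockMetadata spent fetching block locations.
    System.out.println(percent(share(getFileBlockLocations, loadBlockMetadata))); // prints 85.2%
  }
}
```

      By these counts, the getFileBlockLocations call accounts for roughly a fifth of the sampled table-loading time and most of the time inside loadBlockMetadata.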
      
      

      Javadoc for the new API:

      https://hadoop.apache.org/docs/r2.6.1/api/org/apache/hadoop/fs/FileSystem.html#listFiles(org.apache.hadoop.fs.Path, boolean)
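      A minimal sketch of how the listing API linked above could be used, assuming hadoop-common is on the classpath. FileSystem.listFiles returns LocatedFileStatus entries that already carry their BlockLocation array, so no separate getFileBlockLocations call is needed per file. The class BlockListingSketch and its method are hypothetical and are not Impala's actual loading code.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

// Hypothetical illustration; not the catalog's real implementation.
class BlockListingSketch {
  /** Returns one summary line per block of every file directly under dir. */
  static List<String> describeBlocks(FileSystem fs, Path dir) throws IOException {
    List<String> lines = new ArrayList<>();
    // Non-recursive listing; each entry arrives with block locations attached.
    RemoteIterator<LocatedFileStatus> files = fs.listFiles(dir, false);
    while (files.hasNext()) {
      LocatedFileStatus status = files.next();
      for (BlockLocation block : status.getBlockLocations()) {
        lines.add(status.getPath()
            + " offset=" + block.getOffset()
            + " length=" + block.getLength()
            + " hosts=" + String.join(",", block.getHosts()));
      }
    }
    return lines;
  }

  public static void main(String[] args) throws IOException {
    Path dir = new Path(args[0]); // e.g. a partition directory
    FileSystem fs = dir.getFileSystem(new Configuration());
    for (String line : describeBlocks(fs, dir)) {
      System.out.println(line);
    }
  }
}
```

      The sketch reads only hosts, offsets and lengths. Whether storage or volume information is exposed on BlockLocation depends on the Hadoop or CDH version in use, so that part is deliberately left out here.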

              People

              • Assignee:
                bharathv bharath v
                Reporter:
                mmokhtar Mostafa Mokhtar