Details
- Bug
- Status: Resolved
- Critical
- Resolution: Fixed
- Impala 2.8.0
Description
HDFS-8895 removes the ability to query volume IDs from datanodes. This information has instead been added to BlockLocation, which is accessible via various FileSystem APIs (namely, anything that returns LocatedFileStatus).
This new API is more efficient and more accurate. It's also available from CDH5.5 onwards, so the change can be backported as well.
getFileBlockLocations is a bottleneck during metadata loading for Impala.
Stack Trace | Sample Count | Percentage(%)
java.lang.Thread.run() | 17,837 | 73.758
java.util.concurrent.ThreadPoolExecutor$Worker.run() | 17,837 | 73.758
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) | 17,837 | 73.758
java.util.concurrent.FutureTask.run() | 17,600 | 72.778
com.cloudera.impala.catalog.TableLoadingMgr$2.call() | 17,513 | 72.419
com.cloudera.impala.catalog.TableLoadingMgr$2.call() | 17,513 | 72.419
com.cloudera.impala.catalog.TableLoader.load(Db, String) | 17,513 | 72.419
com.cloudera.impala.catalog.HdfsTable.load(boolean, IMetaStoreClient, Table) | 17,513 | 72.419
com.cloudera.impala.catalog.HdfsTable.load(boolean, IMetaStoreClient, Table, boolean, boolean, Set) | 17,513 | 72.419
com.cloudera.impala.catalog.HdfsTable.loadAllPartitions(List, Table) | 15,721 | 65.008
com.cloudera.impala.catalog.HdfsTable.createPartition(StorageDescriptor, Partition, Map) | 13,611 | 56.283
com.cloudera.impala.catalog.HdfsTable.updatePartitionFds(Path, boolean, HdfsFileFormat, Map) | 7,942 | 32.841
com.cloudera.impala.catalog.HdfsTable.loadBlockMetadata(FileSystem, FileStatus, HdfsPartition$FileDescriptor, HdfsFileFormat, Map) | 4,319 | 17.86
org.apache.hadoop.hdfs.DistributedFileSystem.getFileBlockLocations(FileStatus, long, long) | 3,678 | 15.209
com.cloudera.impala.catalog.HdfsPartition$BlockReplica.parseLocation(String) | 203 | 0.839
Pointer to the Java docs for the new API
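To make the cost difference described above concrete, here is a minimal sketch that only counts simulated round trips. It does not use Hadoop at all: `FakeFileSystem`, `loadPerFile`, `loadBulk`, and `partitionWithFiles` are hypothetical stand-ins, and the method names on the fake merely echo `getFileBlockLocations` and `listLocatedStatus` from this issue. Treating a located listing as a single round trip is a simplifying assumption of the model, not a statement about the real client.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Stand-in model, not Hadoop code: counts simulated remote round trips. */
public class BlockMetadataLoadingSketch {

    /** Hypothetical file system client that counts every simulated remote call. */
    static final class FakeFileSystem {
        private final Map<String, List<String>> blockHostsByFile = new LinkedHashMap<>();
        int rpcCount = 0;

        void addFile(String path, List<String> blockHosts) {
            blockHostsByFile.put(path, blockHosts);
        }

        /** Models a plain directory listing (no block locations): one round trip. */
        List<String> listStatus() {
            rpcCount++;
            return new ArrayList<>(blockHostsByFile.keySet());
        }

        /** Models a per-file block location lookup: one round trip per call. */
        List<String> getFileBlockLocations(String path) {
            rpcCount++;
            return blockHostsByFile.get(path);
        }

        /** Models a located listing: files plus block locations in one round trip (assumption). */
        Map<String, List<String>> listLocatedStatus() {
            rpcCount++;
            return new LinkedHashMap<>(blockHostsByFile);
        }
    }

    /** Old pattern: list the directory, then ask for block locations file by file. */
    static int loadPerFile(FakeFileSystem fs) {
        int before = fs.rpcCount;
        for (String file : fs.listStatus()) {
            fs.getFileBlockLocations(file);
        }
        return fs.rpcCount - before;
    }

    /** New pattern: fetch file statuses and block locations together. */
    static int loadBulk(FakeFileSystem fs) {
        int before = fs.rpcCount;
        fs.listLocatedStatus();
        return fs.rpcCount - before;
    }

    /** Builds a fake partition directory holding the given number of files. */
    static FakeFileSystem partitionWithFiles(int fileCount) {
        FakeFileSystem fs = new FakeFileSystem();
        for (int i = 0; i < fileCount; i++) {
            fs.addFile("/warehouse/t/p=1/file-" + i, Arrays.asList("host-a", "host-b", "host-c"));
        }
        return fs;
    }

    public static void main(String[] args) {
        FakeFileSystem fs = partitionWithFiles(100);
        System.out.println("per-file round trips: " + loadPerFile(fs));
        System.out.println("bulk round trips: " + loadBulk(fs));
    }
}
```

In this model the per-file pattern costs one round trip for the listing plus one per file, while the bulk pattern stays constant. How the real client behaves on very large directories is outside what the sketch covers.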
Attachments
Issue Links
- blocks
  - IMPALA-4277 Impala should build against latest Hadoop components (Resolved)
- breaks
  - IMPALA-4789 Slow metadata loading with many partitions that have inconsistent HDFS path qualification (Resolved)
- is related to
  - IMPALA-3482 S3: Consider bulk listing of files in the catalog vs individually accessing them (Resolved)
- relates to
  - IMPALA-3653 Consider using listLocatedStatus() API to get filestatus and blocklocations in one RPC call (Resolved)
  - IMPALA-4840 Fix REFRESH perf issues. (Resolved)