IMPALA-5042

Loading metadata for partitioned tables is slow due to use of an ArrayList; potential 4x speedup

    Details

      Description

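      Loading metadata for partitions with custom paths is 4x slower than for partitions without custom paths. The slowdown comes from O(N^2) lookups that check whether a partition directory has already been added: each check is a linear ArrayList.contains() scan, and one check is done per partition.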
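      To make the O(N^2) cost concrete, here is a minimal, self-contained Java sketch (not Impala code) that counts equals() calls while de-duplicating N distinct directories, first with the contains-then-add pattern on an ArrayList and then with a HashSet. The Dir class and the method names are hypothetical stand-ins for Hadoop's Path, with an instrumented equals().

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Counts equals() calls made while de-duplicating N distinct directories.
class LookupCostDemo {
  static long equalsCalls = 0;

  // Hypothetical stand-in for a partition directory path; equals() is
  // instrumented so the number of comparisons can be counted.
  static final class Dir {
    final int id;
    Dir(int id) { this.id = id; }
    @Override public boolean equals(Object o) {
      equalsCalls++;
      return o instanceof Dir && ((Dir) o).id == id;
    }
    @Override public int hashCode() { return id; }
  }

  // Same pattern as the HdfsTable snippet: contains() before add() on an ArrayList.
  static long countListEquals(int n) {
    equalsCalls = 0;
    List<Dir> dirs = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      Dir d = new Dir(i);
      if (!dirs.contains(d)) { // linear scan over every earlier entry
        dirs.add(d);
      }
    }
    return equalsCalls;
  }

  // Same de-duplication with a HashSet: add() ignores duplicates by itself.
  static long countSetEquals(int n) {
    equalsCalls = 0;
    Set<Dir> dirs = new HashSet<>();
    for (int i = 0; i < n; i++) {
      dirs.add(new Dir(i)); // hash lookup; equals() only runs on a hash match
    }
    return equalsCalls;
  }

  public static void main(String[] args) {
    int n = 1000;
    System.out.println("ArrayList equals() calls: " + countListEquals(n));
    System.out.println("HashSet equals() calls:   " + countSetEquals(n));
  }
}
```

      For N distinct directories the ArrayList version performs N(N-1)/2 equals() calls (499,500 for N = 1000), because each contains() scans every earlier entry. The HashSet version compares hash codes first and, with the distinct hash codes used here, makes no equals() calls at all.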

      The List should ideally be replaced with a Set, whose membership check is an expected O(1) hash lookup.
      Relevant code from https://github.com/apache/incubator-impala/blob/master/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java

        List<Path> dirsToLoad = Lists.newArrayList(tblLocation);
        ...
        if (!dirsToLoad.contains(partDir) &&
            !FileSystemUtil.isDescendantPath(partDir, tblLocation)) {
          // This partition has a custom filesystem location. Load its file/block
          // metadata separately by adding it to the list of dirs to load.
          dirsToLoad.add(partDir);
        }
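      A possible shape for the Set-based version is sketched below, under stated assumptions: java.net.URI stands in for org.apache.hadoop.fs.Path (the profile shows Path.equals() delegating to URI.equals(), and Hadoop's Path likewise derives hashCode() from its URI), and isDescendantPath is a simplified, hypothetical substitute for FileSystemUtil.isDescendantPath. This is an illustration, not the actual Impala patch.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

class DirsToLoadSketch {
  // Hypothetical simplification of FileSystemUtil.isDescendantPath: true if 'p'
  // has the same scheme and authority as 'ancestor' and lies strictly under its path.
  static boolean isDescendantPath(URI p, URI ancestor) {
    if (!Objects.equals(p.getScheme(), ancestor.getScheme())) return false;
    if (!Objects.equals(p.getAuthority(), ancestor.getAuthority())) return false;
    String base = ancestor.getPath().endsWith("/")
        ? ancestor.getPath() : ancestor.getPath() + "/";
    return p.getPath().startsWith(base);
  }

  static Set<URI> dirsToLoad(URI tblLocation, List<URI> partitionDirs) {
    // LinkedHashSet keeps insertion order, in case later code relies on
    // iterating the directories in the order they were added.
    Set<URI> dirsToLoad = new LinkedHashSet<>();
    dirsToLoad.add(tblLocation);
    for (URI partDir : partitionDirs) {
      if (!isDescendantPath(partDir, tblLocation)) {
        // Custom location: load its file/block metadata separately.
        // add() is a no-op when the directory is already present, so the
        // explicit contains() check is no longer needed.
        dirsToLoad.add(partDir);
      }
    }
    return dirsToLoad;
  }

  public static void main(String[] args) {
    URI tbl = URI.create("hdfs://nn:8020/warehouse/t");
    List<URI> parts = Arrays.asList(
        URI.create("hdfs://nn:8020/warehouse/t/p=1"), // under the table dir
        URI.create("hdfs://nn:8020/custom/p=2"),      // custom location
        URI.create("hdfs://nn:8020/custom/p=2"));     // duplicate custom location
    // prints [hdfs://nn:8020/warehouse/t, hdfs://nn:8020/custom/p=2]
    System.out.println(dirsToLoad(tbl, parts));
  }
}
```

      The host name, port, and paths in main() are made-up examples. The only behavioral change from the List version is the cost of the duplicate check; the set of directories loaded stays the same.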
      

      Profile from Java Mission Control (JMC):

      Stack Trace	Sample Count	Percentage(%)
      java.lang.Thread.run()	73,611	97.157
         java.util.concurrent.ThreadPoolExecutor$Worker.run()	73,611	97.157
            java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker)	73,611	97.157
               java.util.concurrent.FutureTask.run()	73,595	97.136
                  org.apache.impala.catalog.TableLoadingMgr$2.call()	73,555	97.083
                     org.apache.impala.catalog.TableLoadingMgr$2.call()	73,555	97.083
                        org.apache.impala.catalog.TableLoader.load(Db, String)	73,555	97.083
                           org.apache.impala.catalog.HdfsTable.load(boolean, IMetaStoreClient, Table)	73,555	97.083
                              org.apache.impala.catalog.HdfsTable.load(boolean, IMetaStoreClient, Table, boolean, boolean, Set)	73,555	97.083
                                 org.apache.impala.catalog.HdfsTable.loadAllPartitions(List, Table)	73,508	97.021
                                    java.util.ArrayList.contains(Object)	70,094	92.515
                                       java.util.ArrayList.indexOf(Object)	70,094	92.515
                                          org.apache.hadoop.fs.Path.equals(Object)	69,462	91.681
                                             java.net.URI.equals(Object)	69,462	91.681
      

              People

              • Assignee: bharath v (bharathv)
              • Reporter: Mostafa Mokhtar (mmokhtar)
              • Votes: 0
              • Watchers: 3
