IMPALA-5042

Loading metadata for partitioned tables is slow due to usage of an ArrayList, potential 4x speedup


Details

    Description

      Loading metadata for partitions with custom paths is 4x slower than for partitions without custom paths. The slowdown comes from O(N^2) lookups: for each of the N partitions, the code scans a List linearly to check whether the partition directory has already been added.

      The List should ideally be replaced with a Set.
      From https://github.com/apache/incubator-impala/blob/master/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java

        List<Path> dirsToLoad = Lists.newArrayList(tblLocation);
        if (!dirsToLoad.contains(partDir) &&
            !FileSystemUtil.isDescendantPath(partDir, tblLocation)) {
          // This partition has a custom filesystem location. Load its file/block
          // metadata separately by adding it to the list of dirs to load.
          dirsToLoad.add(partDir);
        }
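
      A minimal sketch of the proposed List-to-Set change, not the actual HdfsTable patch. To stay self-contained it uses java.net.URI in place of Hadoop's Path, and isDescendant is a hypothetical string-prefix stand-in for FileSystemUtil.isDescendantPath; the class and method names are likewise invented for illustration. A LinkedHashSet is assumed so that insertion order is preserved, as it was with the list.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class DirsToLoadSketch {
  // Hypothetical stand-in for FileSystemUtil.isDescendantPath: a plain
  // string-prefix check, which is an assumption, not Impala's logic.
  static boolean isDescendant(URI dir, URI parent) {
    String prefix = parent.toString();
    if (!prefix.endsWith("/")) prefix += "/";
    return dir.toString().startsWith(prefix);
  }

  // Collects the table location plus every distinct custom partition location.
  static List<URI> collectDirsToLoad(URI tblLocation, List<URI> partDirs) {
    // Hash-based membership replaces the linear ArrayList.contains() scan.
    Set<URI> dirsToLoad = new LinkedHashSet<>();
    dirsToLoad.add(tblLocation);
    for (URI partDir : partDirs) {
      if (!isDescendant(partDir, tblLocation)) {
        // Set.add() is a no-op for a directory that is already present,
        // so no separate contains() call is needed.
        dirsToLoad.add(partDir);
      }
    }
    return new ArrayList<>(dirsToLoad);
  }

  public static void main(String[] args) {
    URI tbl = URI.create("hdfs://nn/warehouse/t");
    List<URI> parts = Arrays.asList(
        URI.create("hdfs://nn/warehouse/t/p=1"),  // under the table location
        URI.create("hdfs://nn/custom/p=2"),       // custom location
        URI.create("hdfs://nn/custom/p=2"));      // duplicate custom location
    System.out.println(collectDirsToLoad(tbl, parts));
  }
}
```

      With a hash-based set, each membership check takes expected constant time, so the total cost grows roughly linearly with the number of partitions instead of quadratically.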
      

      From Java Mission Control:

      Stack Trace	Sample Count	Percentage(%)
      java.lang.Thread.run()	73,611	97.157
         java.util.concurrent.ThreadPoolExecutor$Worker.run()	73,611	97.157
            java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker)	73,611	97.157
               java.util.concurrent.FutureTask.run()	73,595	97.136
                  org.apache.impala.catalog.TableLoadingMgr$2.call()	73,555	97.083
                     org.apache.impala.catalog.TableLoadingMgr$2.call()	73,555	97.083
                        org.apache.impala.catalog.TableLoader.load(Db, String)	73,555	97.083
                           org.apache.impala.catalog.HdfsTable.load(boolean, IMetaStoreClient, Table)	73,555	97.083
                              org.apache.impala.catalog.HdfsTable.load(boolean, IMetaStoreClient, Table, boolean, boolean, Set)	73,555	97.083
                                 org.apache.impala.catalog.HdfsTable.loadAllPartitions(List, Table)	73,508	97.021
                                    java.util.ArrayList.contains(Object)	70,094	92.515
                                       java.util.ArrayList.indexOf(Object)	70,094	92.515
                                          org.apache.hadoop.fs.Path.equals(Object)	69,462	91.681
                                             java.net.URI.equals(Object)	69,462	91.681
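
      The profile above shows nearly all samples inside ArrayList.contains() and the equals() calls it makes. The following sketch illustrates why that cost is quadratic by counting equals() invocations for the check-then-add pattern. Dir is a hypothetical key type standing in for Path, and the class and method names are invented for this example; it is not a benchmark of Impala itself.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;

final class ContainsCostSketch {
  static long equalsCalls = 0;

  // Hypothetical key type that counts equals() invocations; stands in for Path.
  static final class Dir {
    final int id;
    Dir(int id) { this.id = id; }
    @Override public boolean equals(Object o) {
      equalsCalls++;
      return o instanceof Dir && ((Dir) o).id == id;
    }
    @Override public int hashCode() { return Integer.hashCode(id); }
  }

  // Adds n distinct dirs with the check-then-add pattern and returns
  // how many equals() calls the collection made along the way.
  static long countEqualsCalls(Collection<Dir> dirs, int n) {
    equalsCalls = 0;
    for (int i = 0; i < n; i++) {
      Dir d = new Dir(i);
      if (!dirs.contains(d)) dirs.add(d);
    }
    return equalsCalls;
  }

  public static void main(String[] args) {
    int n = 2000;
    System.out.println("ArrayList: " + countEqualsCalls(new ArrayList<>(), n));
    System.out.println("HashSet:   " + countEqualsCalls(new HashSet<>(), n));
  }
}
```

      For n distinct entries, the ArrayList variant performs n(n-1)/2 equals() calls (1,999,000 for n = 2000), because every miss scans the whole list. The HashSet variant performs far fewer, since it compares hash codes before falling back to equals().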
      

            People

              bharathv Bharath Vissapragada
              mmokhtar Mostafa Mokhtar