Apache Drill / DRILL-4827

Checking modification time of directories takes too long, needs to be improved

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.8.0
    • Fix Version/s: None
    • Component/s: Functions - Drill
    • Labels: None
    • Environment: RHEL 6

    Description

      This is a tracking bug for metadata cache performance when checking directories.

      While evaluating the fix for DRILL-4530, we ran the following two queries against 50K Parquet files in a three-level directory hierarchy:
      Query1: explain plan for select * from dfs.`/tpchMetaParquet/tpch100_dir_partitioned_50000files/lineitem` where dir0=2006 and dir1=12 and dir2=15;

      Query2: explain plan for select * from dfs.`/tpchMetaParquet/tpch100_dir_partitioned_50000files/lineitem/2006/12/15`;

      Query1 takes 3.254 secs; Query2 takes 0.505 secs.
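
      Drill exposes each directory level below the queried root as an implicit column (dir0 for the first-level subdirectory, dir1 for the second, and so on), so Query1's filter and Query2's explicit path select the same leaf directory. The minimal Java sketch below illustrates that mapping only; the class and method names are hypothetical and this is not Drill's pruning code.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

class DirColumnsSketch {
    // Hypothetical helper: resolves the dir0, dir1, dir2, ... values in order
    // under the table root, since dir0 names the first-level subdirectory,
    // dir1 the second level, and so on.
    static Path leafDirectory(String tableRoot, List<String> dirValues) {
        Path p = Paths.get(tableRoot);
        for (String v : dirValues) {
            p = p.resolve(v);
        }
        return p;
    }

    public static void main(String[] args) {
        // Query1's filter: dir0=2006 and dir1=12 and dir2=15
        Path leaf = leafDirectory(
                "/tpchMetaParquet/tpch100_dir_partitioned_50000files/lineitem",
                List.of("2006", "12", "15"));
        System.out.println(leaf); // the directory Query2 names explicitly
    }
}
```

      Since both queries end up reading the same data, the roughly 2.7 secs difference between them is presumably planning overhead rather than scan work, which is what the log below narrows down.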

      For Query1, drillbit.log shows about 2.5 secs elapsed between reading the metadata cache and the start of partition pruning:

      2016-08-02 15:43:43,051 ucs-node7.perf.lab [285edddf-b1f3-cd74-e826-84cb91ebc6e1:foreman] INFO o.a.drill.exec.work.foreman.Foreman - Query text for query id 285edddf-b1f3-cd74-e826-84cb91ebc6e1: explain plan for select * from dfs.`/tpchMetaParquet/tpch100_dir_partitioned_50000files/lineitem` where dir0=2006 and dir1=12 and dir2=15
      2016-08-02 15:43:43,193 ucs-node7.perf.lab [285edddf-b1f3-cd74-e826-84cb91ebc6e1:foreman] INFO o.a.d.exec.store.parquet.Metadata - Took 6 ms to read directories from directory cache file
      2016-08-02 15:43:45,745 ucs-node7.perf.lab [285edddf-b1f3-cd74-e826-84cb91ebc6e1:foreman] INFO o.a.d.e.p.l.partition.PruneScanRule - Beginning partition pruning, pruning class: org.apache.drill.exec.planner.logical.partition.PruneScanRule$DirPruneScanFilterOnScanRule
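
      The 2.5 secs figure can be checked directly against the log timestamps. A small Java sketch using java.time (the class and method names are hypothetical):

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class LogGapSketch {
    // Timestamp format of the drillbit.log lines above, e.g. "2016-08-02 15:43:43,193".
    static final DateTimeFormatter LOG_TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Milliseconds elapsed between two log timestamps.
    static long gapMillis(String from, String to) {
        LocalDateTime a = LocalDateTime.parse(from, LOG_TS);
        LocalDateTime b = LocalDateTime.parse(to, LOG_TS);
        return Duration.between(a, b).toMillis();
    }

    public static void main(String[] args) {
        // Directory cache read -> partition pruning begins
        System.out.println(gapMillis("2016-08-02 15:43:43,193", "2016-08-02 15:43:45,745")); // 2552
        // Query text logged -> partition pruning begins
        System.out.println(gapMillis("2016-08-02 15:43:43,051", "2016-08-02 15:43:45,745")); // 2694
    }
}
```

      The gap from reading the directory cache (which itself took only 6 ms) to the start of partition pruning is 2,552 ms; measured from the logged query text it is 2,694 ms.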

      Further investigation shows that the 2.5 secs was spent checking the modification times of directories, a cost proportional to the number of directories checked.
      It looks like this can be improved by checking only the top-level directory.
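
      The difference between the current check and the suggested one can be sketched as follows. This is a minimal illustration using java.nio.file on a local filesystem, assuming the cache records a list of directories and the time it was written; the class and method names are hypothetical, and Drill's actual implementation goes through its own file-system layer rather than java.nio.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.List;

class CacheStalenessSketch {

    // Behavior described above: one modification-time lookup per directory
    // listed in the metadata cache. When nothing has changed (the common case),
    // every directory is checked, so the cost grows with the directory count.
    static boolean isStaleCheckingAllDirs(List<Path> cachedDirs, FileTime cacheWrittenAt)
            throws IOException {
        for (Path dir : cachedDirs) {
            if (Files.getLastModifiedTime(dir).compareTo(cacheWrittenAt) > 0) {
                return true; // this directory changed after the cache was written
            }
        }
        return false;
    }

    // Suggested improvement: a single lookup on the top-level directory.
    static boolean isStaleCheckingTopOnly(Path tableRoot, FileTime cacheWrittenAt)
            throws IOException {
        return Files.getLastModifiedTime(tableRoot).compareTo(cacheWrittenAt) > 0;
    }
}
```

      For the lineitem table above, the first method performs one lookup per directory under lineitem when nothing has changed, while the second performs exactly one. The sketch also exposes the trade-off: on a typical local filesystem a directory's modification time changes only when entries directly inside it are added, removed, or renamed, so a new file in lineitem/2006/12/15 does not update lineitem itself. A top-level-only check would therefore need some other way to notice nested changes.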


          People

            Assignee: Venkata Jyothsna Donapati (vdonapati)
            Reporter: Dechang Gu (dechanggu)
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated: