Apache Drill / DRILL-3759

Make partition pruning multi-phased to reduce the working set kept in memory


Details

    Description

      Currently, partition pruning first collects all file names in the table and only then applies the pruning. When the files are spread out over several directory levels and there is a filter on dirN, this is inefficient in terms of both elapsed time and memory usage. This has been seen in a few use cases recently.

      Wherever possible, we should perform the pruning in N steps (where N is the number of directory levels referenced in the filter conditions):
      1. Get the directory and file names at level i
      2. Materialize them into the in-memory table
      3. Apply interpreter-based evaluation of the filter condition
      4. Determine the qualifying directories, increment i, and repeat from step 1 within those directories
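
      The steps above can be sketched roughly as follows. This is a minimal illustration in Python, not Drill's implementation: the nested-dict directory tree, the per-level predicates, and the names list_children, collect_files and multi_phase_prune are all assumptions made for the example. The point it shows is that each phase only lists one directory level under the paths that survived the previous phase.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical in-memory stand-in for a file system listing:
# a directory maps child names to sub-trees, a file maps to None.
Tree = Dict[str, Optional[dict]]
Path = Tuple[str, ...]


def list_children(tree: Tree, path: Path) -> Tuple[List[str], List[str]]:
    """Return (directory names, file names) directly under `path`."""
    node = tree
    for part in path:
        node = node[part]
    dirs = [name for name, child in node.items() if isinstance(child, dict)]
    files = [name for name, child in node.items() if child is None]
    return dirs, files


def collect_files(tree: Tree, path: Path) -> List[str]:
    """Recursively list all files under `path`."""
    dirs, files = list_children(tree, path)
    result = ["/".join(path + (f,)) for f in files]
    for d in dirs:
        result.extend(collect_files(tree, path + (d,)))
    return result


def multi_phase_prune(tree: Tree,
                      level_filters: List[Callable[[str], bool]]) -> List[str]:
    """level_filters[i] is an assumed predicate on the directory name at level i."""
    qualifying: List[Path] = [()]  # start at the table root
    for level_filter in level_filters:
        survivors: List[Path] = []
        for path in qualifying:
            dirs, _ = list_children(tree, path)           # step 1: list level i
            candidates = [path + (d,) for d in dirs]      # step 2: small working set
            survivors.extend(p for p in candidates
                             if level_filter(p[-1]))      # step 3: evaluate filter
        qualifying = survivors                            # step 4: next level
    # Only now are file names gathered, and only under surviving directories.
    files: List[str] = []
    for path in qualifying:
        files.extend(collect_files(tree, path))
    return files


table = {
    "2014": {"Q1": {"a.parquet": None}, "Q2": {"b.parquet": None}},
    "2015": {"Q1": {"c.parquet": None}, "Q2": {"d.parquet": None}},
}
# Filter: dir0 = '2015' AND dir1 = 'Q1'
selected = multi_phase_prune(table, [lambda d: d == "2015", lambda d: d == "Q1"])
print(selected)
```

      In this sketch the subtree under 2014 is never listed below level 0, which is the memory and time saving the multi-phase approach is after.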

      This multi-phase approach may not be possible for certain types of filters, e.g. for disjunctions. This analysis needs to be done.
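
      To illustrate why disjunctions are harder, here is one possible way to reason about it; it is not the analysis this issue calls for. The tuple-based filter representation and the function names are hypothetical. The sketch splits a filter into its top-level conjuncts and assigns each one to the first phase at which every directory level it references is known. A condition such as dir0 = 2015 OR dir1 = 'Q1' cannot be applied at level 0, because a directory that fails the dir0 test may still qualify through dir1.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical filter representation:
#   ("level", i, text)      leaf condition on dir_i
#   ("and", left, right)    conjunction
#   ("or", left, right)     disjunction
Expr = Tuple


def referenced_levels(expr: Expr) -> Set[int]:
    """Directory levels that appear anywhere inside `expr`."""
    if expr[0] == "level":
        return {expr[1]}
    return referenced_levels(expr[1]) | referenced_levels(expr[2])


def top_level_conjuncts(expr: Expr) -> List[Expr]:
    """Flatten nested ANDs; anything else is a single conjunct."""
    if expr[0] == "and":
        return top_level_conjuncts(expr[1]) + top_level_conjuncts(expr[2])
    return [expr]


def earliest_phase(expr: Expr) -> Dict[int, List[Expr]]:
    """Map each conjunct to the first phase where all its levels are known."""
    phases: Dict[int, List[Expr]] = {}
    for conjunct in top_level_conjuncts(expr):
        phase = max(referenced_levels(conjunct))
        phases.setdefault(phase, []).append(conjunct)
    return phases


year = ("level", 0, "dir0 = 2015")
quarter = ("level", 1, "dir1 = 'Q1'")

conjunction = ("and", year, quarter)   # one conjunct per level
disjunction = ("or", year, quarter)    # a single conjunct spanning both levels

print(earliest_phase(conjunction))
print(earliest_phase(disjunction))
```

      Under these assumptions the conjunction yields one condition per phase, while the disjunction lands entirely in phase 1, so nothing can be pruned at level 0.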

          People

            mehant Mehant Baid
            amansinha100 Aman Sinha