  Apache Drill / DRILL-2173

Enable querying partition information without reading all data


    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.7.0
    • Fix Version/s: 0.9.0
    • Labels:
      None

      Description

      When reading a series of files in nested directories, Drill currently adds columns that represent the directory structure traversed to reach the file being read. These columns are stored as varchar under the names dir0, dir1, and so on. Because they are regular columns, Drill allows arbitrary queries against them, including aggregation, filtering, and sorting.

      To optimize reads, basic partition pruning has already been added for expressions such as dir0 = '2015' and for simple IN lists, which are converted during planning into a series of equality expressions joined by OR. If users want to query the directory information dynamically, without naming specific directories in the query, Drill instead performs a full table scan followed by a filter on the dir columns.

      This enhancement allows more complex queries to be run against directory metadata while scanning only the matching directories.
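      The pruning idea described above can be sketched outside Drill. The following is a minimal Python sketch, not Drill's implementation (Drill's planner is written in Java). The helper names dir_columns, prune_files, and max_dir_value and the example paths under /data/logs are hypothetical. The sketch derives dirN values from each file's path relative to the table root, evaluates a predicate on those values alone to decide which files to scan, and shows a dynamic filter (the latest dir0) computed from directory metadata rather than from file contents.

```python
from pathlib import PurePosixPath
from typing import Callable, Dict, List, Optional

# Hypothetical helpers illustrating directory-based pruning.
# This is an illustration only, not Drill's actual (Java) planner code.
DirValues = Dict[str, str]


def dir_columns(table_root: str, file_path: str) -> DirValues:
    """Return the implicit dir0, dir1, ... values for a file under table_root."""
    relative = PurePosixPath(file_path).relative_to(table_root)
    # relative.parts ends with the file name; every earlier part is a directory level.
    return {f"dir{i}": part for i, part in enumerate(relative.parts[:-1])}


def prune_files(
    table_root: str,
    files: List[str],
    predicate: Callable[[DirValues], bool],
) -> List[str]:
    """Keep only files whose directory values satisfy predicate.

    The predicate sees only path-derived values, so no file contents are read.
    """
    return [f for f in files if predicate(dir_columns(table_root, f))]


def max_dir_value(table_root: str, files: List[str], level: int) -> Optional[str]:
    """Largest dir<level> value across files, using varchar (lexicographic) order."""
    key = f"dir{level}"
    values = (dir_columns(table_root, f).get(key) for f in files)
    # Files without that directory level have no value (NULL in SQL terms) and are skipped.
    return max((v for v in values if v is not None), default=None)


if __name__ == "__main__":
    root = "/data/logs"
    files = [
        "/data/logs/2014/12/a.json",
        "/data/logs/2015/01/b.json",
        "/data/logs/2015/02/c.json",
    ]

    # Static filter, comparable to WHERE dir0 = '2015'.
    print(prune_files(root, files, lambda d: d.get("dir0") == "2015"))

    # IN list, equivalent to dir1 = '12' OR dir1 = '01'.
    print(prune_files(root, files, lambda d: d.get("dir1") in ("12", "01")))

    # Dynamic filter: the latest dir0 is computed from path metadata alone,
    # then used to prune before any file would be scanned.
    latest = max_dir_value(root, files, 0)
    print(latest, prune_files(root, files, lambda d: d.get("dir0") == latest))
```

      Because the dir columns are varchar, max_dir_value compares strings, so a directory named 9 sorts after one named 10; zero-padded names such as 01 and 12 avoid that surprise.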

        Attachments

        1. Drill-2173-maxdir-with-pruning-feb-27.patch
          34 kB
          Jason Altekruse
        2. Drill-2173-maxdir-with-pruning-feb-6.patch
          37 kB
          Jason Altekruse
        3. Drill-2173-partition-queries-with-pruning.patch
          36 kB
          Jason Altekruse

              People

              • Assignee:
                Jason Altekruse
                Reporter:
                Jason Altekruse
              • Votes:
                0
                Watchers:
                1
