Apache Drill / DRILL-2260

Add support for partitioning files by certain criteria when doing a CTAS


Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: Future
    • Component/s: Execution - Flow
    • Labels: None

    Description

      Running a CTAS that creates a large number of files (thousands) is becoming increasingly common. To enable partition pruning, these files must be organized into subdirectories so that Drill can expose the directory names as the implicit columns 'dir0', 'dir1', etc., and prune on them. Currently, organizing the files into subdirectories is a manual and tedious process.
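
      To illustrate why the directory layout matters, here is a minimal Java sketch of directory-based pruning. It assumes a hypothetical table root /data/sales with files laid out as <year>/<quarter>/<file>. The class and method names (DirPruningSketch, dirValue, prune) are hypothetical and only mimic the idea; this is not Drill's planner code. The n-th directory below the table root plays the role of dir<n>, and files whose directory value does not match the filter are skipped without being read.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class DirPruningSketch {
    // Returns the name of the n-th directory below tableRoot for the given file,
    // i.e. the value exposed as dir<n>; null if the file is not nested that deep.
    // Assumes the file lives under tableRoot.
    static String dirValue(Path tableRoot, Path file, int n) {
        Path relative = tableRoot.relativize(file);
        // The last name element is the file itself, so only its parents count.
        if (n >= relative.getNameCount() - 1) {
            return null;
        }
        return relative.getName(n).toString();
    }

    // Keeps only files whose dir<n> equals value, mimicking how a filter such as
    // WHERE dir0 = '1994' lets whole directories be skipped.
    static List<Path> prune(Path tableRoot, List<Path> files, int n, String value) {
        List<Path> kept = new ArrayList<>();
        for (Path f : files) {
            if (value.equals(dirValue(tableRoot, f, n))) {
                kept.add(f);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Path root = Paths.get("/data/sales");
        List<Path> files = List.of(
            root.resolve("1994/Q1/0_0_0.parquet"),
            root.resolve("1994/Q2/0_0_0.parquet"),
            root.resolve("1995/Q1/0_0_0.parquet"));
        System.out.println(prune(root, files, 0, "1994").size()); // 2
    }
}
```

      Without the subdirectories, every file would have to be scanned, which is why the manual organization step exists today.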

      We need to provide a mechanism to organize these output files into subdirectories without manual intervention. We could add a PARTITIONED BY <column> extension to the CTAS statement, similar to what Hive does.
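
      A minimal Java sketch of what a PARTITIONED BY <column> option would need to do on the write side, assuming rows are modeled as Map<String, Object>: group the rows by the partition column's value and use that value as the subdirectory name. PartitionedCtasSketch, partition and the NULL_DIR placeholder for null values are hypothetical; a real writer would stream rows to per-directory file writers rather than buffer them in memory.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PartitionedCtasSketch {
    // Hypothetical directory name used when the partition column is null.
    static final String NULL_DIR = "__null__";

    // Groups rows by the value of partitionColumn. Each key is the subdirectory
    // (relative to the CTAS target) that the rows would be written to.
    static Map<String, List<Map<String, Object>>> partition(
            List<Map<String, Object>> rows, String partitionColumn) {
        Map<String, List<Map<String, Object>>> byDir = new LinkedHashMap<>();
        for (Map<String, Object> row : rows) {
            Object value = row.get(partitionColumn);
            String dir = (value == null) ? NULL_DIR : value.toString();
            byDir.computeIfAbsent(dir, k -> new ArrayList<>()).add(row);
        }
        return byDir;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = List.of(
            Map.<String, Object>of("month", "Jan", "amount", 10),
            Map.<String, Object>of("month", "Feb", "amount", 20),
            Map.<String, Object>of("month", "Jan", "amount", 30));
        Map<String, List<Map<String, Object>>> byDir = partition(rows, "month");
        // LinkedHashMap keeps insertion order, so directories appear first-seen first.
        System.out.println(byDir.keySet());          // [Jan, Feb]
        System.out.println(byDir.get("Jan").size()); // 2
    }
}
```

      Each key would become one subdirectory under the CTAS target, which is exactly the layout the pruning step relies on.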

      One open question: if we partition by the Month column, should that column be removed from the output files, since its value is already represented by the subdirectory names?
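
      One way to weigh this trade-off, as a hypothetical Java sketch (DropPartitionColumnSketch, dropColumn and restoreColumn are illustrative names, not Drill APIs): if the column is dropped, a reader can re-attach its value from the directory name, but directory names are strings, so the column's original type (an Integer month here) is not preserved unless it is recorded elsewhere.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class DropPartitionColumnSketch {
    // Writer side: copy the row without the partition column, since its value
    // is already encoded in the subdirectory name.
    static Map<String, Object> dropColumn(Map<String, Object> row, String column) {
        Map<String, Object> copy = new LinkedHashMap<>(row);
        copy.remove(column);
        return copy;
    }

    // Reader side: re-attach the column from the directory name. The restored
    // value is always a String, whatever its type was before partitioning.
    static Map<String, Object> restoreColumn(Map<String, Object> row, String column, String dirName) {
        Map<String, Object> copy = new LinkedHashMap<>(row);
        copy.put(column, dirName);
        return copy;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("month", 3);
        row.put("amount", 42.5);
        Map<String, Object> stored = dropColumn(row, "month");
        Map<String, Object> read = restoreColumn(stored, "month", "3");
        System.out.println(stored);                              // {amount=42.5}
        System.out.println(read.get("month") instanceof String); // true
    }
}
```

      Keeping the column costs some storage per row but keeps each file self-describing; dropping it saves space but makes the files depend on their directory path.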

      Since this is a 'feature' that would span multiple components, I haven't categorized it.

People

    • Assignee: Unassigned
    • Reporter: Aman Sinha (amansinha100)
    • Votes: 0
    • Watchers: 1
