Apache Drill / DRILL-3381

Add option to distribute partition keys in CTAS

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0
    • Component/s: None
    • Labels: None

    Description

      The current CTAS implementation does not redistribute rows by partition key before writing, so every writer fragment can end up producing a file for every partition it sees. As a result, the number of files is larger by a factor equal to the number of fragments in the final stage of the query. On even a moderately sized cluster, that number could easily be in the thousands, so a table with 100 different partitions would end up with hundreds of thousands of files.
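
      The arithmetic above can be sketched as follows. This is only an illustration, not Drill code: the function name is hypothetical, and the figure of 2000 fragments is an assumed value standing in for "in the thousands".

```python
def worst_case_file_count(num_partitions: int, num_writer_fragments: int) -> int:
    """Upper bound on output files when every writer fragment
    receives rows from every partition and writes one file per
    partition it sees (the no-redistribution case)."""
    return num_partitions * num_writer_fragments


# Assumed example values: 100 partitions, 2000 final-stage fragments.
files_without_redistribution = worst_case_file_count(100, 2000)

# If each partition is written by exactly one writer, the count is
# bounded by the number of partitions instead.
files_with_redistribution = worst_case_file_count(100, 1)

print(files_without_redistribution, files_with_redistribution)
```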

      As a workaround, we should add an option that inserts an extra distribution step on the partition keys, so that all the rows for any given partition are written by the same writer.
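
      A minimal simulation of the idea, assuming each writer emits one file per distinct partition key it receives. It is a sketch rather than the patch itself: the function name, the round-robin stand-in for the undistributed case, and the use of CRC32 as the hash are all assumptions made for illustration.

```python
import zlib
from collections import defaultdict


def count_files(rows, num_writers, redistribute):
    """Count output files, assuming one file per (writer, partition key) pair."""
    keys_seen = defaultdict(set)  # writer id -> partition keys it received
    for i, (key, _value) in enumerate(rows):
        if redistribute:
            # Hash on the partition key: every row with the same key
            # is routed to the same writer.
            writer = zlib.crc32(key.encode("utf-8")) % num_writers
        else:
            # No redistribution: rows land on writers independently of
            # their key (modelled here as round-robin).
            writer = i % num_writers
        keys_seen[writer].add(key)
    return sum(len(keys) for keys in keys_seen.values())


# 1000 rows spread over 10 partition keys, written by 7 writers.
rows = [(f"part-{i % 10}", i) for i in range(1000)]
files_without = count_files(rows, num_writers=7, redistribute=False)
files_with = count_files(rows, num_writers=7, redistribute=True)
print(files_without, files_with)
```

      With redistribution the file count equals the number of partitions, at the cost of an extra exchange and possibly uneven load if some partitions are much larger than others.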

      Attachments

        1. DRILL-3381.patch
          8 kB
          Steven Phillips
        2. DRILL-3381.patch
          8 kB
          Steven Phillips

          People

            Assignee: Steven Phillips (sphillips)
            Reporter: Steven Phillips (sphillips)
            Votes: 0
            Watchers: 5
