Apache Drill / DRILL-230

Build a sampling range partitioner

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.4.0
    • Component/s: None
    • Labels: None

    Description

      Create a new operator that caches a number of record batches, then coordinates across the cluster on the distribution of partitioning keys to determine a reasonable set of range partitions. The outgoing stream should include a partition key that is equal to the width of the receiving fragment.

      • The histogram (or a similar summary) should be held in the distributed cache.
      • We need to work out how long to wait before the partitioning estimate is good enough.
      • The partitioning sender needs to be updated so that it can drop the partitioning column rather than send it onward.
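
      As a rough illustration of the idea described above, the sketch below derives range boundaries from a sample of keys and maps each key to a partition id. It is not taken from the attached patches: the class `RangePartitionSketch` and the methods `splitPoints` and `partitionFor` are hypothetical names, keys are assumed to be plain `long` values, and the sample is assumed to be already gathered in one place. The cluster-wide coordination and the distributed cache are left out entirely.

      ```java
      import java.util.Arrays;

      /** Hypothetical sketch only; not Drill's operator or API. */
      final class RangePartitionSketch {

          /**
           * Picks (width - 1) split points from a sample of keys so that each of
           * the `width` ranges covers roughly the same number of sampled keys.
           */
          static long[] splitPoints(long[] sampledKeys, int width) {
              if (width < 1 || sampledKeys.length == 0) {
                  throw new IllegalArgumentException("need width >= 1 and a non-empty sample");
              }
              long[] sorted = sampledKeys.clone();
              Arrays.sort(sorted);
              long[] splits = new long[width - 1];
              for (int i = 1; i < width; i++) {
                  // Key at the i/width quantile of the sorted sample.
                  int idx = (int) ((long) i * sorted.length / width);
                  splits[i - 1] = sorted[idx];
              }
              return splits;
          }

          /**
           * Maps a key to a partition id in [0, splits.length].
           * Keys equal to a split point go to the range on its right.
           */
          static int partitionFor(long key, long[] splits) {
              int pos = Arrays.binarySearch(splits, key);
              // Not found: binarySearch returns -(insertionPoint) - 1.
              return pos >= 0 ? pos + 1 : -pos - 1;
          }
      }
      ```

      With a sample of 10, 20, ..., 80 and a width of 4, `splitPoints` yields 30, 50, 70, so a key of 45 maps to partition 1. A heavily skewed sample can produce repeated split points, which leaves some partitions empty; a real operator would have to handle that case.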

      Attachments

        1. DRILL-230.patch
          120 kB
          Steven Phillips
        2. DRILL-230_2013-10-23_13:10:50.patch
          206 kB
          Steven Phillips
        3. DRILL-230_2013-10-25_05:15:16.patch
          226 kB
          Steven Phillips
        4. DRILL-230_2013-10-25_05:16:58.patch
          218 kB
          Steven Phillips

          People

            Assignee: Steven Phillips (sphillips)
            Reporter: Jacques Nadeau (jnadeau)
            Votes: 0
            Watchers: 3