• Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.4.0
    • Component/s: None
    • Labels:


      This operator will allow sorting data that is larger than available memory by spilling to disk when necessary.
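
      As an illustration of the spill-to-disk idea only (not the code in the attached patch), here is a minimal Python sketch. It assumes records are plain comparable values, that "memory pressure" can be approximated by a record count, and that sorted runs are serialized with pickle into temporary files. The names external_sort, _spill, _read_run and max_records_in_memory are hypothetical.

```python
import heapq
import os
import pickle
import tempfile


def _spill(sorted_run, spill_dir):
    """Write one sorted run to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(dir=spill_dir, suffix=".run")
    with os.fdopen(fd, "wb") as f:
        for record in sorted_run:
            pickle.dump(record, f)
    return path


def _read_run(path):
    """Stream records back from a spilled run, one at a time."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return


def external_sort(batches, max_records_in_memory):
    """Yield all records in sorted order, spilling when the budget is exceeded.

    max_records_in_memory is a stand-in (an assumption of this sketch) for a
    real memory limit.
    """
    with tempfile.TemporaryDirectory() as spill_dir:
        in_memory = []  # sorted batches still held in memory
        held = 0        # number of records currently held
        spilled = []    # paths of sorted runs already written to disk
        for batch in batches:
            in_memory.append(sorted(batch))  # sort each batch on arrival
            held += len(batch)
            if held > max_records_in_memory:
                # Merge everything held so far into one run and spill it.
                spilled.append(_spill(heapq.merge(*in_memory), spill_dir))
                in_memory, held = [], 0
        # Final pass: N-way merge of spilled runs and in-memory batches.
        runs = [_read_run(p) for p in spilled] + in_memory
        yield from heapq.merge(*runs)
```

      For example, list(external_sort([[5, 3, 9], [1, 8], [7, 2, 6], [4]], 4)) spills once after the second batch and then merges the spilled run with the two batches still in memory.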

      Also as part of this JIRA, I will implement a new merge sort algorithm that should make better use of cluster resources than our current sort, which is based on quicksort. The problem with quicksort is that we can't do any sorting until all of the batches have arrived. This often results in very low CPU utilization while the data is read from disk, followed by a period of very high CPU usage.

      The external sort will sort each batch individually when it arrives. When no spills occur, it makes sense to take advantage of the fact that the batches are already sorted, but doing an N-way merge, as is done when there are spills, is not the most effective way to do this, according to some tests I have done. In this case, rather than an N-way merge using a heap, we will do a variation of natural merge sort, which essentially amounts to a staged 2-way merge of the incoming (sorted) batches.
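
      To make the two merge strategies concrete, here is a small Python sketch, assuming each batch is an in-memory list that is already sorted. It is not taken from the patch; merge_two, staged_merge and heap_merge are hypothetical names. staged_merge merges the batches pairwise, stage by stage, while heap_merge is the heap-based N-way merge it is compared against.

```python
import heapq


def merge_two(left, right):
    """Standard 2-way merge of two sorted lists (stable: ties favor left)."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        if right[j] < left[i]:
            out.append(right[j])
            j += 1
        else:
            out.append(left[i])
            i += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def staged_merge(sorted_batches):
    """Merge sorted batches pairwise, one stage at a time, until one run remains."""
    runs = [list(b) for b in sorted_batches]
    while len(runs) > 1:
        next_stage = []
        # Merge neighbors (0,1), (2,3), ...
        for k in range(0, len(runs) - 1, 2):
            next_stage.append(merge_two(runs[k], runs[k + 1]))
        # An odd run out is carried to the next stage unchanged.
        if len(runs) % 2 == 1:
            next_stage.append(runs[-1])
        runs = next_stage
    return runs[0] if runs else []


def heap_merge(sorted_batches):
    """N-way merge using a heap, as used when spilled runs are involved."""
    return list(heapq.merge(*sorted_batches))
```

      Both functions return the same sorted output; the sketch shows only the structural difference between the two approaches and makes no claim about their relative speed.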

      1. DRILL-386.patch
        105 kB
        Steven Phillips
      2. DRILL-386.patch
        105 kB
        Steven Phillips
      3. DRILL-386.patch
        103 kB
        Steven Phillips
      4. DRILL-386.patch
        112 kB
        Steven Phillips


        Steven Phillips created issue -
        Steven Phillips made changes -
        Field Original Value New Value
        Attachment DRILL-386.patch [ 12631117 ]
        Steven Phillips made changes -
        Attachment DRILL-386.patch [ 12633105 ]
        Steven Phillips made changes -
        Attachment DRILL-386.patch [ 12636414 ]
        Steven Phillips made changes -
        Attachment DRILL-386.patch [ 12637275 ]
        Jacques Nadeau made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Jake Farrell made changes -
        Workflow no-reopen-closed, patch-avail [ 12844390 ] no-reopen-closed, patch-avail, testing [ 12859987 ]
        Jacques Nadeau made changes -
        Fix Version/s 0.4.0 [ 12324963 ]
        Tony Stevenson made changes -
        Workflow no-reopen-closed, patch-avail, testing [ 12859987 ] Drill workflow [ 12934271 ]
        Khurram Faraaz made changes -
        Status Resolved [ 5 ] Closed [ 6 ]


          • Assignee:
            Steven Phillips
            Steven Phillips
          • Votes:
            0
          • Watchers:
            2


            • Created: