Uploaded image for project: 'Tajo'
  1. Tajo
  2. TAJO-36

Improve ExternalSortExec with N-merge sort and final pass omission

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Component/s: Physical Operator
    • Labels:
      None

      Description

      Background:

      The current ExternalSortExec just uses the binary external merge sort algorithm (http://en.wikipedia.org/wiki/External_sorting#External_merge_sort). In other words, for each pass, ExternalSortExec just merges two files into one sorted file.

      Proposal:
      The goal of this proposal is to improve ExternalSortExec with the following improvements:

      • N-merge sort - we can merge N files though more memory at each pass. It will reduce the number of passes. Consequently, it will reduces considerable I/O overheads.
      • the final pass omission - a physical operator is pipelined by the parent operator. The final pass of the merge sort must also be invoked by the parent physical operator. So, we can omit the final pass of the merge sort.

        Attachments

        1. TAJO-36_140204_163419.patch
          100 kB
          Hyunsik Choi
        2. TAJO-36_140204_172847.patch
          102 kB
          Hyunsik Choi
        3. TAJO-36_20140205_00:21:44.patch
          103 kB
          Hyunsik Choi
        4. TAJO-36_final.patch
          102 kB
          Hyunsik Choi
        5. TAJO-36.patch
          100 kB
          Hyunsik Choi

          Issue Links

            Activity

              People

              • Assignee:
                hyunsik Hyunsik Choi
                Reporter:
                hyunsik Hyunsik Choi
              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: