Tajo
  1. Tajo
  2. TAJO-36

Improve ExternalSortExec with N-merge sort and final pass omission

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Component/s: Physical Operator
    • Labels:
      None

      Description

      Background:

      The current ExternalSortExec just uses the binary external merge sort algorithm (http://en.wikipedia.org/wiki/External_sorting#External_merge_sort). In other words, for each pass, ExternalSortExec just merges two files into one sorted file.

      Proposal:
      The goal of this proposal is to improve ExternalSortExec with the following improvements:

      • N-merge sort - we can merge N files though more memory at each pass. It will reduce the number of passes. Consequently, it will reduces considerable I/O overheads.
      • the final pass omission - a physical operator is pipelined by the parent operator. The final pass of the merge sort must also be invoked by the parent physical operator. So, we can omit the final pass of the merge sort.
      1. TAJO-36.patch
        100 kB
        Hyunsik Choi
      2. TAJO-36_140204_163419.patch
        100 kB
        Hyunsik Choi
      3. TAJO-36_140204_172847.patch
        102 kB
        Hyunsik Choi
      4. TAJO-36_20140205_00:21:44.patch
        103 kB
        Hyunsik Choi
      5. TAJO-36_final.patch
        102 kB
        Hyunsik Choi

        Issue Links

          Activity

          Hyunsik Choi made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Hyunsik Choi made changes -
          Comment [ Updated the review request against branch master in reviewboard
          ]
          Hyunsik Choi made changes -
          Attachment TAJO-36_final.patch [ 12626885 ]
          Hyunsik Choi made changes -
          Attachment TAJO-36_20140205_00:21:44.patch [ 12626884 ]
          Hyunsik Choi made changes -
          Link This issue is depended upon by TAJO-584 [ TAJO-584 ]
          Hyunsik Choi made changes -
          Attachment TAJO-36_140204_172847.patch [ 12626845 ]
          Hyunsik Choi made changes -
          Attachment TAJO-36_140204_163419.patch [ 12626839 ]
          Hyunsik Choi made changes -
          Labels gsoc gsoc2013 mentor
          Hyunsik Choi made changes -
          Status In Progress [ 3 ] Patch Available [ 10002 ]
          Fix Version/s 0.8-incubating [ 12324253 ]
          Hyunsik Choi made changes -
          Attachment TAJO-36.patch [ 12626835 ]
          Hyunsik Choi made changes -
          Status Open [ 1 ] In Progress [ 3 ]
          Hyunsik Choi made changes -
          Assignee Hyunsik Choi [ hyunsik ]
          Hyunsik Choi made changes -
          Priority Major [ 3 ] Critical [ 2 ]
          Gavin made changes -
          Workflow jira [ 12777572 ] patch-available, re-open possible [ 12778010 ]
          Hyunsik Choi made changes -
          Field Original Value New Value
          Labels gsoc2013 mentor gsoc gsoc2013 mentor
          Hyunsik Choi created issue -

            People

            • Assignee:
              Hyunsik Choi
              Reporter:
              Hyunsik Choi
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development