Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-5350

Performance: skip merge for single-batch sort

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.10.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      The external sort uses the classic two-step sort/merge process:

      • Sort each incoming batch. (Optionally spill batches when needed.)
      • Merge batches to create the final output.

      The external sort uses two distinct merge phases: one if all batches are in memory, another if some batches were spilled. The memory merge is obviously the fastest.

      A special case occurs when the sort sees only a single batch of data. In this case, that one batch is already sorted: there is no reason to also run the merge phase. Skipping the merge will speed up small "operational" queries.

      The effect of the optimization was measured using low-level unit tests that set up the sort and measured just the sort run time, omitting normal query overhead. Each run consisted of two phases. In the first phase, the test code was run five times to warm the JVM and Drill code cache. Then, the "money' run ran another five times. Run times where then averaged.

      Data consisted of 64K rows of a very simple schema: (INT, VARCHAR(5)).

      Run time without the optimization: 39 ms.

      Run time with the optimization: 25 ms.

      The result is about a 46% improvement.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                Paul.Rogers Paul Rogers
                Reporter:
                paul-rogers Paul Rogers
              • Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated: