Details

      Description

      In Tajo, the sort operator works like a merge sort and runs in a distributed manner. The first sort phase sorts each fragment on its local machine, the intermediate data are then shuffled by range partitioning, and finally the second sort phase on each node sorts the range-partitioned data.
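The two phases described above can be illustrated with a single-process Java sketch. It is not Tajo code: the names `RangePartitionSketch`, `partitionOf`, and `distributedSort` are hypothetical, keys are assumed to be `long` values, and the range split points are assumed to be given up front. Each `long[]` stands in for one machine's fragment, and each inner list stands in for one node's range partition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of a two-phase range-partitioned sort; not Tajo's implementation.
public class RangePartitionSketch {

    /**
     * Returns the partition index for a key. splits[i] is the exclusive upper bound
     * of partition i, so there are splits.length + 1 partitions (assumed sorted splits).
     */
    static int partitionOf(long key, long[] splits) {
        int idx = Arrays.binarySearch(splits, key);
        // binarySearch returns -(insertionPoint) - 1 when the key is absent.
        // A key equal to splits[i] belongs to partition i + 1 because upper bounds are exclusive.
        return idx >= 0 ? idx + 1 : -(idx + 1);
    }

    static List<List<Long>> distributedSort(List<long[]> fragments, long[] splits) {
        List<List<Long>> partitions = new ArrayList<>();
        for (int i = 0; i <= splits.length; i++) {
            partitions.add(new ArrayList<>());
        }
        for (long[] fragment : fragments) {
            long[] local = fragment.clone();
            Arrays.sort(local); // first sort phase: sort each fragment locally
            for (long key : local) {
                partitions.get(partitionOf(key, splits)).add(key); // range-partitioned shuffle
            }
        }
        for (List<Long> partition : partitions) {
            partition.sort(null); // second sort phase: re-sort each partition from scratch
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<long[]> fragments = List.of(new long[] {9, 2, 14}, new long[] {7, 1, 12, 5});
        long[] splits = {5, 10}; // partitions: keys < 5, 5 <= keys < 10, keys >= 10
        System.out.println(distributedSort(fragments, splits)); // [[1, 2], [5, 7, 9], [12, 14]]
    }
}
```

Because every key in partition i is smaller than every key in partition i + 1, concatenating the sorted partitions yields the global order. Note that each locally sorted fragment contributes an already-sorted subsequence to every partition, yet the second phase in this sketch still sorts each partition from scratch.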

      However, the second sort phase reads all shuffled data through a single scanner, so it misses the opportunity to exploit data that are already sorted. This patch changes the second sort phase to directly merge the multiple already-sorted intermediate data sets, which significantly reduces the response time of sort queries.
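Merging multiple already-sorted intermediate data sets can be sketched as a k-way merge driven by a min-heap (`java.util.PriorityQueue`). This is a minimal in-memory illustration under stated assumptions, not the patch's implementation: the names `KWayMergeSketch`, `Head`, and `merge` are hypothetical, and each sorted run is assumed to be an in-memory `List<Long>` rather than a shuffled intermediate data set read by a scanner.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical k-way merge of sorted runs; not Tajo's implementation.
public class KWayMergeSketch {

    /** Heap entry: the current head value of one sorted run plus an iterator over the rest of it. */
    private static final class Head {
        final long value;
        final Iterator<Long> rest;

        Head(long value, Iterator<Long> rest) {
            this.value = value;
            this.rest = rest;
        }
    }

    /** Merges runs that are each already sorted ascending into one sorted list. */
    static List<Long> merge(List<List<Long>> sortedRuns) {
        // Min-heap ordered by each run's current head value.
        PriorityQueue<Head> heap = new PriorityQueue<>((a, b) -> Long.compare(a.value, b.value));
        for (List<Long> run : sortedRuns) {
            Iterator<Long> it = run.iterator();
            if (it.hasNext()) {
                heap.add(new Head(it.next(), it)); // seed the heap with the first value of each run
            }
        }
        List<Long> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head smallest = heap.poll(); // smallest head across all runs
            out.add(smallest.value);
            if (smallest.rest.hasNext()) {
                // Refill from the run that was just consumed so it stays represented in the heap.
                heap.add(new Head(smallest.rest.next(), smallest.rest));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Long>> runs = List.of(List.of(2L, 9L, 14L), List.of(1L, 5L, 7L, 12L), List.of());
        System.out.println(merge(runs)); // [1, 2, 5, 7, 9, 12, 14]
    }
}
```

With k runs and n rows in total, the heap never holds more than k entries, so the merge needs O(n log k) comparisons instead of the O(n log n) needed to sort the concatenated input from scratch.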

      I ran a simple benchmark with the following query on the TPC-H 100GB data set:

      select l_orderkey from lineitem order by l_orderkey;
      

      The lineitem table occupies 75GB. The query response time dropped from 480 to 260 seconds. This patch builds on the design of TAJO-36, so it requires TAJO-36.
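For reference, the two reported response times correspond to the following speedup and relative reduction (simple arithmetic on the figures above, nothing more):

```latex
\text{speedup} = \frac{480\,\text{s}}{260\,\text{s}} \approx 1.85,
\qquad
\text{reduction} = \frac{480\,\text{s} - 260\,\text{s}}{480\,\text{s}} = \frac{220}{480} \approx 45.8\%
```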

      1. TAJO-584_20140208_01:51:59.patch
        124 kB
        Hyunsik Choi
      2. TAJO-584.patch
        42 kB
        Hyunsik Choi


          Activity

          Hyunsik Choi added a comment:

          Created a review request against branch master in reviewboard
          https://reviews.apache.org/r/17726/

          Tajo QA added a comment:

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12626984/TAJO-584.patch
          against master revision 5177dcf.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 8 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The applied patch does not increase the total number of javadoc warnings.

          +1 checkstyle. The patch generated 0 code style errors.

          -1 findbugs. The patch appears to introduce 191 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in tajo-client tajo-core/tajo-core-backend tajo-storage.

          Test results: https://builds.apache.org/job/PreCommit-TAJO-Build/105//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/105//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-core-backend.html
          Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/105//console

          This message is automatically generated.

          Hyunsik Choi added a comment:

          Updated the review request against branch master in reviewboard
          https://reviews.apache.org/r/17726/

          Tajo QA added a comment:

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12627650/TAJO-584_20140208_01%3A51%3A59.patch
          against master revision 4179a7c.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 18 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The applied patch does not increase the total number of javadoc warnings.

          +1 checkstyle. The patch generated 0 code style errors.

          -1 findbugs. The patch appears to introduce 214 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in tajo-catalog/tajo-catalog-common tajo-client tajo-core/tajo-core-backend tajo-core/tajo-core-pullserver tajo-storage.

          Test results: https://builds.apache.org/job/PreCommit-TAJO-Build/113//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/113//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-catalog-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/113//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-core-backend.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/113//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-core-pullserver.html
          Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/113//console

          This message is automatically generated.

          Hyunsik Choi added a comment:

          This issue got two +1s on Review Board (RB). I've just committed it to the master branch.

          Thank you for the reviews.

          Hudson added a comment:

          SUCCESS: Integrated in Tajo-master-build #60 (See https://builds.apache.org/job/Tajo-master-build/60/)
          TAJO-584: Improve distributed merge sort. (hyunsik: https://git-wip-us.apache.org/repos/asf?p=incubator-tajo.git&a=commit&h=214b9741a510d1c2013e0dd494ab66017962367a)

          • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/LogicalPlan.java
          • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java
          • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/UniformRangePartition.java
          • tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/planner/physical/TestBSTIndexExec.java
          • tajo-storage/src/main/java/org/apache/tajo/storage/TupleRange.java
          • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/BaseAlgebraVisitor.java
          • tajo-core/tajo-core-backend/src/test/resources/queries/TestSortQuery/testSortWithAscDescKeys.sql
          • tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/planner/physical/TestSortExec.java
          • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/RangePartitionAlgorithm.java
          • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/worker/Task.java
          • tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/planner/physical/TestLeftOuterNLJoinExec.java
          • tajo-storage/src/main/java/org/apache/tajo/storage/MergeScanner.java
          • tajo-core/tajo-core-backend/src/test/resources/queries/TestJoinQuery/testOuterJoinAndCaseWhen1.sql
          • tajo-client/src/main/java/org/apache/tajo/jdbc/TajoResultSet.java
          • tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/planner/physical/TestRightOuterHashJoinExec.java
          • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/utils/TupleUtil.java
          • tajo-core/tajo-core-backend/src/test/resources/dataset/TestSortQuery/table2.tbl
          • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/PreLogicalPlanVerifier.java
          • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/PhysicalPlannerImpl.java
          • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java
          • tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/planner/TestUniformRangePartition.java
          • tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/planner/physical/TestLeftOuterHashJoinExec.java
          • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/ExternalSortExec.java
          • CHANGES.txt
          • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/worker/RangeRetrieverHandler.java
          • tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/worker/TestRangeRetrieverHandler.java
          • tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/query/TestSortQuery.java
          • tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/planner/physical/TestPhysicalPlanner.java
          • tajo-storage/src/main/java/org/apache/tajo/storage/TupleComparator.java
          • tajo-storage/src/main/java/org/apache/tajo/storage/RawFile.java
          • tajo-catalog/tajo-catalog-common/src/main/java/org/apache/tajo/catalog/exception/AlreadyExistsTableException.java
          • tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/planner/physical/TestExternalSortExec.java
          • tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/util/TestTupleUtil.java
          • tajo-core/tajo-core-pullserver/src/main/java/org/apache/tajo/pullserver/TajoPullServerService.java
          • tajo-core/tajo-core-backend/src/test/resources/queries/TestSortQuery/create_table_with_asc_desc_keys.sql
          • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/UnaryPhysicalExec.java
          • tajo-core/tajo-core-pullserver/src/main/java/org/apache/tajo/pullserver/PullServerAuxService.java
          • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/PlannerUtil.java
          • tajo-core/tajo-core-backend/src/test/resources/org/apache/tajo/jdbc/TestTajoResultSet.java

            People

            • Assignee:
              Hyunsik Choi
              Reporter:
              Hyunsik Choi
            • Votes:
              0
              Watchers:
              1
