Uploaded image for project: 'Tajo'
  1. Tajo
  2. TAJO-1460

Apply TAJO-1407 to ExternalSortExec

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11.0
    • Component/s: None
    • Labels:
      None

      Description

      Tipped by @Hyoungjun Kim. Seemed possible to apply easily.

        Issue Links

          Activity

          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Tajo-master-build #644 (See https://builds.apache.org/job/Tajo-master-build/644/)
          TAJO-1460 Apply TAJO-1407 to ExternalSortExec (babokim: rev 6e7f1c7b0561c7299a498b81dfb6148883a76a0d)

          • tajo-core/src/test/resources/queries/TestJoinQuery/testCrossJoinWithAsterisk3.sql
          • tajo-core/src/test/resources/queries/TestJoinQuery/testComplexJoinCondition7.sql
          • tajo-core/src/test/resources/queries/TestJoinBroadcast/testCrossJoinWithAsterisk2.sql
          • tajo-core/src/test/resources/queries/TestJoinQuery/testJoinWithJson.json
          • tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/ComparableVector.java
          • tajo-core/src/test/resources/queries/TestJoinQuery/testCrossJoinWithAsterisk2.sql
          • CHANGES
          • tajo-core/src/test/resources/queries/TestJoinQuery/testCrossJoinWithAsterisk1.sql
          • tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/ExternalSortExec.java
          • tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/VectorizedSorter.java
          • tajo-core/src/test/resources/queries/TestJoinBroadcast/testCrossJoinWithAsterisk3.sql
          • tajo-core/src/test/resources/queries/TestJoinBroadcast/testCrossJoinWithAsterisk1.sql
          • tajo-storage/tajo-storage-common/src/main/java/org/apache/tajo/storage/MemoryUtil.java
          • tajo-storage/tajo-storage-common/src/main/java/org/apache/tajo/storage/AbstractScanner.java
          • tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/MemSortExec.java
          • tajo-core/src/test/resources/queries/TestJoinQuery/testCrossJoinWithAsterisk4.sql
          • tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/SortExec.java
          • tajo-core/src/test/java/org/apache/tajo/engine/planner/physical/TestExternalSortExec.java
          • tajo-core/src/test/resources/results/TestJoinQuery/testComplexJoinCondition7.result
          • tajo-core/src/test/resources/queries/TestJoinBroadcast/testCrossJoinWithAsterisk4.sql
          • tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/TupleSorter.java
          • tajo-core/src/test/java/org/apache/tajo/engine/planner/physical/TestTupleSorter.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Tajo-master-build #644 (See https://builds.apache.org/job/Tajo-master-build/644/ ) TAJO-1460 Apply TAJO-1407 to ExternalSortExec (babokim: rev 6e7f1c7b0561c7299a498b81dfb6148883a76a0d) tajo-core/src/test/resources/queries/TestJoinQuery/testCrossJoinWithAsterisk3.sql tajo-core/src/test/resources/queries/TestJoinQuery/testComplexJoinCondition7.sql tajo-core/src/test/resources/queries/TestJoinBroadcast/testCrossJoinWithAsterisk2.sql tajo-core/src/test/resources/queries/TestJoinQuery/testJoinWithJson.json tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/ComparableVector.java tajo-core/src/test/resources/queries/TestJoinQuery/testCrossJoinWithAsterisk2.sql CHANGES tajo-core/src/test/resources/queries/TestJoinQuery/testCrossJoinWithAsterisk1.sql tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/ExternalSortExec.java tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/VectorizedSorter.java tajo-core/src/test/resources/queries/TestJoinBroadcast/testCrossJoinWithAsterisk3.sql tajo-core/src/test/resources/queries/TestJoinBroadcast/testCrossJoinWithAsterisk1.sql tajo-storage/tajo-storage-common/src/main/java/org/apache/tajo/storage/MemoryUtil.java tajo-storage/tajo-storage-common/src/main/java/org/apache/tajo/storage/AbstractScanner.java tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/MemSortExec.java tajo-core/src/test/resources/queries/TestJoinQuery/testCrossJoinWithAsterisk4.sql tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/SortExec.java tajo-core/src/test/java/org/apache/tajo/engine/planner/physical/TestExternalSortExec.java tajo-core/src/test/resources/results/TestJoinQuery/testComplexJoinCondition7.result tajo-core/src/test/resources/queries/TestJoinBroadcast/testCrossJoinWithAsterisk4.sql tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/TupleSorter.java tajo-core/src/test/java/org/apache/tajo/engine/planner/physical/TestTupleSorter.java
          Hide
          hudson Hudson added a comment -

          ABORTED: Integrated in Tajo-master-CODEGEN-build #281 (See https://builds.apache.org/job/Tajo-master-CODEGEN-build/281/)
          TAJO-1460 Apply TAJO-1407 to ExternalSortExec (babokim: rev 6e7f1c7b0561c7299a498b81dfb6148883a76a0d)

          • tajo-core/src/test/resources/results/TestJoinQuery/testComplexJoinCondition7.result
          • tajo-core/src/test/resources/queries/TestJoinBroadcast/testCrossJoinWithAsterisk3.sql
          • tajo-storage/tajo-storage-common/src/main/java/org/apache/tajo/storage/AbstractScanner.java
          • tajo-core/src/test/resources/queries/TestJoinQuery/testComplexJoinCondition7.sql
          • tajo-core/src/test/resources/queries/TestJoinBroadcast/testCrossJoinWithAsterisk4.sql
          • tajo-core/src/test/resources/queries/TestJoinQuery/testCrossJoinWithAsterisk3.sql
          • CHANGES
          • tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/TupleSorter.java
          • tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/VectorizedSorter.java
          • tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/ComparableVector.java
          • tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/ExternalSortExec.java
          • tajo-core/src/test/resources/queries/TestJoinQuery/testCrossJoinWithAsterisk4.sql
          • tajo-storage/tajo-storage-common/src/main/java/org/apache/tajo/storage/MemoryUtil.java
          • tajo-core/src/test/resources/queries/TestJoinBroadcast/testCrossJoinWithAsterisk2.sql
          • tajo-core/src/test/java/org/apache/tajo/engine/planner/physical/TestTupleSorter.java
          • tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/MemSortExec.java
          • tajo-core/src/test/resources/queries/TestJoinQuery/testJoinWithJson.json
          • tajo-core/src/test/resources/queries/TestJoinQuery/testCrossJoinWithAsterisk2.sql
          • tajo-core/src/test/resources/queries/TestJoinBroadcast/testCrossJoinWithAsterisk1.sql
          • tajo-core/src/test/resources/queries/TestJoinQuery/testCrossJoinWithAsterisk1.sql
          • tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/SortExec.java
          • tajo-core/src/test/java/org/apache/tajo/engine/planner/physical/TestExternalSortExec.java
          Show
          hudson Hudson added a comment - ABORTED: Integrated in Tajo-master-CODEGEN-build #281 (See https://builds.apache.org/job/Tajo-master-CODEGEN-build/281/ ) TAJO-1460 Apply TAJO-1407 to ExternalSortExec (babokim: rev 6e7f1c7b0561c7299a498b81dfb6148883a76a0d) tajo-core/src/test/resources/results/TestJoinQuery/testComplexJoinCondition7.result tajo-core/src/test/resources/queries/TestJoinBroadcast/testCrossJoinWithAsterisk3.sql tajo-storage/tajo-storage-common/src/main/java/org/apache/tajo/storage/AbstractScanner.java tajo-core/src/test/resources/queries/TestJoinQuery/testComplexJoinCondition7.sql tajo-core/src/test/resources/queries/TestJoinBroadcast/testCrossJoinWithAsterisk4.sql tajo-core/src/test/resources/queries/TestJoinQuery/testCrossJoinWithAsterisk3.sql CHANGES tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/TupleSorter.java tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/VectorizedSorter.java tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/ComparableVector.java tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/ExternalSortExec.java tajo-core/src/test/resources/queries/TestJoinQuery/testCrossJoinWithAsterisk4.sql tajo-storage/tajo-storage-common/src/main/java/org/apache/tajo/storage/MemoryUtil.java tajo-core/src/test/resources/queries/TestJoinBroadcast/testCrossJoinWithAsterisk2.sql tajo-core/src/test/java/org/apache/tajo/engine/planner/physical/TestTupleSorter.java tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/MemSortExec.java tajo-core/src/test/resources/queries/TestJoinQuery/testJoinWithJson.json tajo-core/src/test/resources/queries/TestJoinQuery/testCrossJoinWithAsterisk2.sql tajo-core/src/test/resources/queries/TestJoinBroadcast/testCrossJoinWithAsterisk1.sql tajo-core/src/test/resources/queries/TestJoinQuery/testCrossJoinWithAsterisk1.sql tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/SortExec.java tajo-core/src/test/java/org/apache/tajo/engine/planner/physical/TestExternalSortExec.java
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/tajo/pull/474

          Show
          githubbot ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/tajo/pull/474
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user babokim commented on the pull request:

          https://github.com/apache/tajo/pull/474#issuecomment-88753167

          Here is my +1.
          I'll commit soon.

          Show
          githubbot ASF GitHub Bot added a comment - Github user babokim commented on the pull request: https://github.com/apache/tajo/pull/474#issuecomment-88753167 Here is my +1. I'll commit soon.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user jihoonson commented on the pull request:

          https://github.com/apache/tajo/pull/474#issuecomment-88712239

          +1 LGTM.

          Show
          githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on the pull request: https://github.com/apache/tajo/pull/474#issuecomment-88712239 +1 LGTM.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user navis commented on the pull request:

          https://github.com/apache/tajo/pull/474#issuecomment-88685267

          Addressed comments

          Show
          githubbot ASF GitHub Bot added a comment - Github user navis commented on the pull request: https://github.com/apache/tajo/pull/474#issuecomment-88685267 Addressed comments
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user jihoonson commented on the pull request:

          https://github.com/apache/tajo/pull/474#issuecomment-88681002

          Good idea. But, it would be better to be considered as a separate issue. Before introducing that new feature, I think it is ineviable to use order-by clauses for stable tests.

          Show
          githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on the pull request: https://github.com/apache/tajo/pull/474#issuecomment-88681002 Good idea. But, it would be better to be considered as a separate issue. Before introducing that new feature, I think it is ineviable to use order-by clauses for stable tests.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user navis commented on the pull request:

          https://github.com/apache/tajo/pull/474#issuecomment-88673336

          There was a test option in hive for ordering output of test result because order-by takes more time and sometimes tricky in complex queries. How about to introduce something like that?

          Show
          githubbot ASF GitHub Bot added a comment - Github user navis commented on the pull request: https://github.com/apache/tajo/pull/474#issuecomment-88673336 There was a test option in hive for ordering output of test result because order-by takes more time and sometimes tricky in complex queries. How about to introduce something like that?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user jihoonson commented on the pull request:

          https://github.com/apache/tajo/pull/474#issuecomment-88443206

          Thank you for understanding.
          I missed that MemSortExec is not used anymore when reviewing TAJO-1407. Thanks for letting @navis know that.

          Show
          githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on the pull request: https://github.com/apache/tajo/pull/474#issuecomment-88443206 Thank you for understanding. I missed that MemSortExec is not used anymore when reviewing TAJO-1407 . Thanks for letting @navis know that.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user babokim commented on the pull request:

          https://github.com/apache/tajo/pull/474#issuecomment-88440268

          @jihoonson thank your for your opinion and comment.
          @navis I left another comments to reflect @jihoonson comments.

          Show
          githubbot ASF GitHub Bot added a comment - Github user babokim commented on the pull request: https://github.com/apache/tajo/pull/474#issuecomment-88440268 @jihoonson thank your for your opinion and comment. @navis I left another comments to reflect @jihoonson comments.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user babokim commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/474#discussion_r27559986

          — Diff: tajo-core/src/test/resources/results/TestJoinBroadcast/testCrossJoinWithAsterisk1.result —
          @@ -1,8 +1,8 @@
          r_regionkey,r_name,r_comment,c_custkey,c_name,c_address,c_nationkey,c_phone,c_acctbal,c_mktsegment,c_comment
          — End diff –

          @jihoonson recommend to add an order by clause to queries if necessary rather than just changing the order of rows of query results. I agree with this opinion.
          For example, you can add 'c_custkey' column in order by clause in "testCrossJoinWithAsterisk1.sql".

          Show
          githubbot ASF GitHub Bot added a comment - Github user babokim commented on a diff in the pull request: https://github.com/apache/tajo/pull/474#discussion_r27559986 — Diff: tajo-core/src/test/resources/results/TestJoinBroadcast/testCrossJoinWithAsterisk1.result — @@ -1,8 +1,8 @@ r_regionkey,r_name,r_comment,c_custkey,c_name,c_address,c_nationkey,c_phone,c_acctbal,c_mktsegment,c_comment — End diff – @jihoonson recommend to add an order by clause to queries if necessary rather than just changing the order of rows of query results. I agree with this opinion. For example, you can add 'c_custkey' column in order by clause in "testCrossJoinWithAsterisk1.sql".
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user babokim commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/474#discussion_r27559826

          — Diff: tajo-core/src/test/java/org/apache/tajo/engine/query/TestJoinQuery.java —
          @@ -89,9 +89,9 @@ public TestJoinQuery(String joinOption) {
          @Parameters
          public static Collection<Object[]> generateParameters() {
          return Arrays.asList(new Object[][]{

          • {"Hash_NoBroadcast"}

            ,

              • End diff –

          These parameterized test cases should be included.
          Each query with different parameter returns different order of rows. Adding additional order by column or order by clause can resolve this problem as the following example:
          testComplexJoinCondition7.sql
          ```
          select
          n1.n_nationkey,
          n1.n_name,
          n2.n_name
          from nation n1 join (select * from nation union select * from nation) n2 on substr(n1.n_name, 1, 4) = substr(n2.n_name, 1, 4)
          order by n1.n_nationkey;
          ```

          You can add "n2.n_name" column in where clause.

          ```
          select
          n1.n_nationkey,
          n1.n_name,
          n2.n_name
          from nation n1 join (select * from nation union select * from nation) n2 on substr(n1.n_name, 1, 4) = substr(n2.n_name, 1, 4)
          order by n1.n_nationkey, n2.n_name;
          ```

          Show
          githubbot ASF GitHub Bot added a comment - Github user babokim commented on a diff in the pull request: https://github.com/apache/tajo/pull/474#discussion_r27559826 — Diff: tajo-core/src/test/java/org/apache/tajo/engine/query/TestJoinQuery.java — @@ -89,9 +89,9 @@ public TestJoinQuery(String joinOption) { @Parameters public static Collection<Object[]> generateParameters() { return Arrays.asList(new Object[][]{ {"Hash_NoBroadcast"} , End diff – These parameterized test cases should be included. Each query with different parameter returns different order of rows. Adding additional order by column or order by clause can resolve this problem as the following example: testComplexJoinCondition7.sql ``` select n1.n_nationkey, n1.n_name, n2.n_name from nation n1 join (select * from nation union select * from nation) n2 on substr(n1.n_name, 1, 4) = substr(n2.n_name, 1, 4) order by n1.n_nationkey; ``` You can add "n2.n_name" column in where clause. ``` select n1.n_nationkey, n1.n_name, n2.n_name from nation n1 join (select * from nation union select * from nation) n2 on substr(n1.n_name, 1, 4) = substr(n2.n_name, 1, 4) order by n1.n_nationkey, n2.n_name; ```
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user jihoonson commented on the pull request:

          https://github.com/apache/tajo/pull/474#issuecomment-88426959

          In addition, I recommend to add an order by clause to queries if necessary rather than just changing the order of rows of query results.

          Show
          githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on the pull request: https://github.com/apache/tajo/pull/474#issuecomment-88426959 In addition, I recommend to add an order by clause to queries if necessary rather than just changing the order of rows of query results.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user jihoonson commented on the pull request:

          https://github.com/apache/tajo/pull/474#issuecomment-88426075

          Please hold a second.
          I wonder why the parameterized test in TestJoinQuery is disabled.

          Show
          githubbot ASF GitHub Bot added a comment - Github user jihoonson commented on the pull request: https://github.com/apache/tajo/pull/474#issuecomment-88426075 Please hold a second. I wonder why the parameterized test in TestJoinQuery is disabled.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user babokim commented on the pull request:

          https://github.com/apache/tajo/pull/474#issuecomment-88424017

          +1
          Thanks, test case is modified and all tests passed. I'll commit soon.

          Show
          githubbot ASF GitHub Bot added a comment - Github user babokim commented on the pull request: https://github.com/apache/tajo/pull/474#issuecomment-88424017 +1 Thanks, test case is modified and all tests passed. I'll commit soon.
          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user navis opened a pull request:

          https://github.com/apache/tajo/pull/487

          Apply on *GroupByExec

          Based on TAJO-1460 & TAJO-1484

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/navis/tajo TAJO-1492

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/tajo/pull/487.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #487


          commit 78379a3ed39d5076b50434b1fef7bd67ddeb99d2
          Author: navis.ryu <navis@apache.org>
          Date: 2015-03-27T13:34:23Z

          TAJO-1460 Apply TAJO-1407 to ExternalSortExec

          commit 7505a5a91a20a691f4407858ff86af6bc2b3b678
          Author: navis.ryu <navis@apache.org>
          Date: 2015-03-30T09:24:31Z

          TAJO-1484 Apply on ColPartitionStoreExec

          commit 5e0512a9d39cba2cf7c038a18e4079417b991aae
          Author: navis.ryu <navis@apache.org>
          Date: 2015-04-01T01:02:06Z

          TAJO-1492 Apply on *GroupByExec


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user navis opened a pull request: https://github.com/apache/tajo/pull/487 Apply on *GroupByExec Based on TAJO-1460 & TAJO-1484 You can merge this pull request into a Git repository by running: $ git pull https://github.com/navis/tajo TAJO-1492 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/tajo/pull/487.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #487 commit 78379a3ed39d5076b50434b1fef7bd67ddeb99d2 Author: navis.ryu <navis@apache.org> Date: 2015-03-27T13:34:23Z TAJO-1460 Apply TAJO-1407 to ExternalSortExec commit 7505a5a91a20a691f4407858ff86af6bc2b3b678 Author: navis.ryu <navis@apache.org> Date: 2015-03-30T09:24:31Z TAJO-1484 Apply on ColPartitionStoreExec commit 5e0512a9d39cba2cf7c038a18e4079417b991aae Author: navis.ryu <navis@apache.org> Date: 2015-04-01T01:02:06Z TAJO-1492 Apply on *GroupByExec
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user navis commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/474#discussion_r27443083

          — Diff: tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/ExternalSortExec.java —
          @@ -630,19 +636,43 @@ public TableStats getInputStats()

          { CLOSED }

          + private static class SimpleMerger extends PairWiseMerger {
          — End diff –

          I'm thinking of rewriting current up-heap to down-heap to reduce comparing further (50% for average AFAIK). So I've named it randomly. Till then let's name it VectorComparePairWiseMerger.

          Show
          githubbot ASF GitHub Bot added a comment - Github user navis commented on a diff in the pull request: https://github.com/apache/tajo/pull/474#discussion_r27443083 — Diff: tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/ExternalSortExec.java — @@ -630,19 +636,43 @@ public TableStats getInputStats() { CLOSED } + private static class SimpleMerger extends PairWiseMerger { — End diff – I'm thinking of rewriting current up-heap to down-heap to reduce comparing further (50% for average AFAIK). So I've named it randomly. Till then let's name it VectorComparePairWiseMerger.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user navis commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/474#discussion_r27442930

          — Diff: tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/ExternalSortExec.java —
          @@ -168,7 +168,7 @@ private Path sortAndStoreChunk(int chunkId, List<Tuple> tupleBlock)
          int rowNum = tupleBlock.size();

          long sortStart = System.currentTimeMillis();

          • Collections.sort(tupleBlock, getComparator());
            + Iterable<Tuple> sorted = getSorter(tupleBlock).sort();
              • End diff –

          Fixed that (some undoing seemed caused that) and changed the test.

          Show
          githubbot ASF GitHub Bot added a comment - Github user navis commented on a diff in the pull request: https://github.com/apache/tajo/pull/474#discussion_r27442930 — Diff: tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/ExternalSortExec.java — @@ -168,7 +168,7 @@ private Path sortAndStoreChunk(int chunkId, List<Tuple> tupleBlock) int rowNum = tupleBlock.size(); long sortStart = System.currentTimeMillis(); Collections.sort(tupleBlock, getComparator()); + Iterable<Tuple> sorted = getSorter(tupleBlock).sort(); End diff – Fixed that (some undoing seemed caused that) and changed the test.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user babokim commented on the pull request:

          https://github.com/apache/tajo/pull/474#issuecomment-87731130

          @navis thanks for your contribution. I left some comments.

          Show
          githubbot ASF GitHub Bot added a comment - Github user babokim commented on the pull request: https://github.com/apache/tajo/pull/474#issuecomment-87731130 @navis thanks for your contribution. I left some comments.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user babokim commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/474#discussion_r27389998

          — Diff: tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/ExternalSortExec.java —
          @@ -630,19 +636,43 @@ public TableStats getInputStats()

          { CLOSED }

          + private static class SimpleMerger extends PairWiseMerger {
          — End diff –

          I think SimpleMerger is not proper name for this class. First time when I saw the next code, I can understand after checking the detail code of SimpleMerger class. How about "VectorComparePairWiseMerger".
          '''
          if (ComparableVector.isApplicable(sortSpecs))

          { return new SimpleMerger(inSchema, left, right, comparator); }

          return new PairWiseMerger(inSchema, left, right, comparator);
          '''

          Show
          githubbot ASF GitHub Bot added a comment - Github user babokim commented on a diff in the pull request: https://github.com/apache/tajo/pull/474#discussion_r27389998 — Diff: tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/ExternalSortExec.java — @@ -630,19 +636,43 @@ public TableStats getInputStats() { CLOSED } + private static class SimpleMerger extends PairWiseMerger { — End diff – I think SimpleMerger is not proper name for this class. First time when I saw the next code, I can understand after checking the detail code of SimpleMerger class. How about "VectorComparePairWiseMerger". ''' if (ComparableVector.isApplicable(sortSpecs)) { return new SimpleMerger(inSchema, left, right, comparator); } return new PairWiseMerger(inSchema, left, right, comparator); '''
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user babokim commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/474#discussion_r27389087

          — Diff: tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/ExternalSortExec.java —
          @@ -168,7 +168,7 @@ private Path sortAndStoreChunk(int chunkId, List<Tuple> tupleBlock)
          int rowNum = tupleBlock.size();

          long sortStart = System.currentTimeMillis();

          • Collections.sort(tupleBlock, getComparator());
            + Iterable<Tuple> sorted = getSorter(tupleBlock).sort();
              • End diff –

          appender should use this "sorted" instead of tupleBlock in the following for statement.
          ```
          for (Tuple t : sorted)

          { // tupleBlock -> sorted appender.addTuple(t); }

          ```
          I was wonder how the code passed TestExternalSortExec. I found two reasons.
          First, the above logic is not covered in current test condition. "numTuple" in TestExternalSortExec code should be changed to "3000000" to cover more use case.
          Second, ProjectionExec is parent node in the test program.
          ProjectionExec reuses the tuple object when next() is called. So TestExternalSortExec code should be changed as the following:
          ```
          while ((tuple = exec.next()) != null)

          { ... //preVal = curVal; preVal = new VTuple(curVal); cnt++; }

          ```

          Show
          githubbot ASF GitHub Bot added a comment - Github user babokim commented on a diff in the pull request: https://github.com/apache/tajo/pull/474#discussion_r27389087 — Diff: tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/ExternalSortExec.java — @@ -168,7 +168,7 @@ private Path sortAndStoreChunk(int chunkId, List<Tuple> tupleBlock) int rowNum = tupleBlock.size(); long sortStart = System.currentTimeMillis(); Collections.sort(tupleBlock, getComparator()); + Iterable<Tuple> sorted = getSorter(tupleBlock).sort(); End diff – appender should use this "sorted" instead of tupleBlock in the following for statement. ``` for (Tuple t : sorted) { // tupleBlock -> sorted appender.addTuple(t); } ``` I was wonder how the code passed TestExternalSortExec. I found two reasons. First, the above logic is not covered in current test condition. "numTuple" in TestExternalSortExec code should be changed to "3000000" to cover more use case. Second, ProjectionExec is parent node in the test program. ProjectionExec reuses the tuple object when next() is called. So TestExternalSortExec code should be changed as the following: ``` while ((tuple = exec.next()) != null) { ... //preVal = curVal; preVal = new VTuple(curVal); cnt++; } ```
          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user navis opened a pull request:

          https://github.com/apache/tajo/pull/474

          TAJO-1460 Apply TAJO-1407 to ExternalSortExec

          Applied.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/navis/tajo TAJO-1460

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/tajo/pull/474.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #474


          commit f4a68fde068dcc911f327da1c8375eb3e4f6de2c
          Author: navis.ryu <navis@apache.org>
          Date: 2015-03-27T13:34:23Z

          TAJO-1460 Apply TAJO-1407 to ExternalSortExec


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user navis opened a pull request: https://github.com/apache/tajo/pull/474 TAJO-1460 Apply TAJO-1407 to ExternalSortExec Applied. You can merge this pull request into a Git repository by running: $ git pull https://github.com/navis/tajo TAJO-1460 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/tajo/pull/474.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #474 commit f4a68fde068dcc911f327da1c8375eb3e4f6de2c Author: navis.ryu <navis@apache.org> Date: 2015-03-27T13:34:23Z TAJO-1460 Apply TAJO-1407 to ExternalSortExec

            People

            • Assignee:
              navis Navis
              Reporter:
              navis Navis
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development