Uploaded image for project: 'Tajo'
  1. Tajo
  2. TAJO-748

Shuffle output numbers of join may be inconsistent.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.8.0, 0.9.0
    • Fix Version/s: 0.8.0, 0.9.0
    • Component/s: Planner/Optimizer
    • Labels:
      None

      Description

      I found that inline view doesn't run expected at multiple join as follows:

      Environment

      • DataSet: TPC-DS
      • tajo.dist-query.join.broadcast.auto : false

      Case: 1

      SELECT COUNT(*)
      FROM (
        SELECT cs.cs_item_sk as cs_item_sk,
        cs.cs_ext_discount_amt as cs_ext_discount_amt
        FROM catalog_sales cs
        JOIN date_dim d ON (d.d_date_sk = cs.cs_sold_date_sk)
        WHERE d.d_date between '2000-01-27' and '2000-04-27'
      ) cs1
      JOIN item i ON (i.i_item_sk = cs1.cs_item_sk);
      
      • actual result: 4163848
      • expected result: 4163848

      Case: 2

      select count(*)
      from item i
      JOIN (SELECT cs2.cs_item_sk as cs_item_sk,
                                1.3 * avg(cs_ext_discount_amt) as avg_cs_ext_discount_amt
                 FROM (SELECT cs.cs_item_sk as cs_item_sk,
                                              cs.cs_ext_discount_amt as cs_ext_discount_amt
                              FROM catalog_sales cs
                              JOIN date_dim d ON (d.d_date_sk = cs.cs_sold_date_sk)
                              WHERE d.d_date between '2000-01-27' and '2000-04-27') cs2
                              GROUP BY cs2.cs_item_sk) tmp1
      ON (i.i_item_sk = tmp1.cs_item_sk);
      
      • actual result: 102000
      • expected result: 102000

      Case: 3

      SELECT COUNT(*)
      FROM (SELECT cs.cs_item_sk as cs_item_sk,
                                   cs.cs_ext_discount_amt as cs_ext_discount_amt
                   FROM catalog_sales cs
                   JOIN date_dim d ON (d.d_date_sk = cs.cs_sold_date_sk)
                   WHERE d.d_date between '2000-01-27' and '2000-04-27') cs1
      JOIN item i ON (i.i_item_sk = cs1.cs_item_sk)
      JOIN (SELECT cs2.cs_item_sk as cs_item_sk,
                                1.3 * avg(cs_ext_discount_amt) as avg_cs_ext_discount_amt
                 FROM (SELECT cs.cs_item_sk as cs_item_sk,
                                              cs.cs_ext_discount_amt as cs_ext_discount_amt
                              FROM catalog_sales cs
                              JOIN date_dim d ON (d.d_date_sk = cs.cs_sold_date_sk)
                              WHERE d.d_date between '2000-01-27' and '2000-04-27') cs2
                              GROUP BY cs2.cs_item_sk) tmp1
      ON (i.i_item_sk = tmp1.cs_item_sk)
      WHERE i.i_manufact_id = 436;
      
      • actual result: 80
      • expected result: 4586

      Case: 4

      SELECT COUNT(*)
      FROM (SELECT cs.cs_item_sk as cs_item_sk,
                                   cs.cs_ext_discount_amt as cs_ext_discount_amt
                   FROM catalog_sales cs
                   JOIN date_dim d ON (d.d_date_sk = cs.cs_sold_date_sk)
                   WHERE d.d_date between '2000-01-27' and '2000-04-27') cs1
      JOIN item i ON (i.i_item_sk = cs1.cs_item_sk)
      JOIN (SELECT cs2.cs_item_sk as cs_item_sk,
                                1.3 * avg(cs_ext_discount_amt) as avg_cs_ext_discount_amt
                 FROM (SELECT cs.cs_item_sk as cs_item_sk,
                                              cs.cs_ext_discount_amt as cs_ext_discount_amt
                              FROM catalog_sales cs
                              JOIN date_dim d ON (d.d_date_sk = cs.cs_sold_date_sk)
                              WHERE d.d_date between '2000-01-27' and '2000-04-27') cs2
                              GROUP BY cs2.cs_item_sk) tmp1
      ON (i.i_item_sk = tmp1.cs_item_sk)
      
      • actual result: 71147
      • expected result: 4163848

      For reference, I made activated result using hive.

      1. TAJO-748_2.patch
        8 kB
        Jaehwa Jung
      2. TAJO-748_3.patch
        8 kB
        Jaehwa Jung
      3. TAJO-748_4.patch
        7 kB
        Jaehwa Jung
      4. TAJO-748_5.patch
        9 kB
        Jaehwa Jung
      5. TAJO-748.patch
        6 kB
        Jaehwa Jung

        Issue Links

          Activity

          Hide
          blrunner Jaehwa Jung added a comment -

          I found that a shuffle partition number doesn't correspond with its pair partition number in special case as follows:

          • Test query: above Case #3
          • Left block
            =======================================================
            Block Id: eb_1397788326208_0003_000006 [INTERMEDIATE]
            =======================================================
            
            [Incoming]
            [q_1397788326208_0003] 1 => 6 (type=HASH_SHUFFLE, key=tpcds100.i.i_item_sk (INT4), num=1)
            [q_1397788326208_0003] 5 => 6 (type=HASH_SHUFFLE, key=tpcds100.tmp1.cs_item_sk (INT4), num=1)
            
            [Outgoing]
            [q_1397788326208_0003] 6 => 10 (type=HASH_SHUFFLE, key=tpcds100.i.i_item_sk (INT4), num=232)
            
          • Right block
            =======================================================
            Block Id: eb_1397788326208_0003_000009 [INTERMEDIATE]
            =======================================================
            
            [Incoming]
            [q_1397788326208_0003] 7 => 9 (type=HASH_SHUFFLE, key=tpcds100.cs.cs_sold_date_sk (INT4), num=232)
            [q_1397788326208_0003] 8 => 9 (type=HASH_SHUFFLE, key=tpcds100.d.d_date_sk (INT4), num=232)
            
            [Outgoing]
            [q_1397788326208_0003] 9 => 10 (type=HASH_SHUFFLE, key=tpcds100.cs1.cs_item_sk (INT4), num=20)
            
          • Join Block
            =======================================================
            Block Id: eb_1397788326208_0003_000010 [INTERMEDIATE]
            =======================================================
            
            [Incoming]
            [q_1397788326208_0003] 6 => 10 (type=HASH_SHUFFLE, key=tpcds100.i.i_item_sk (INT4), num=232)
            [q_1397788326208_0003] 9 => 10 (type=HASH_SHUFFLE, key=tpcds100.cs1.cs_item_sk (INT4), num=20)
            
            [Outgoing]
            [q_1397788326208_0003] 10 => 11 (type=HASH_SHUFFLE, key=, num=1)
            

          As above, left block and right block calculated different partition numbers.
          Thus, we can't find expected result.

          Show
          blrunner Jaehwa Jung added a comment - I found that a shuffle partition number doesn't correspond with its pair partition number in special case as follows: Test query: above Case #3 Left block ======================================================= Block Id: eb_1397788326208_0003_000006 [INTERMEDIATE] ======================================================= [Incoming] [q_1397788326208_0003] 1 => 6 (type=HASH_SHUFFLE, key=tpcds100.i.i_item_sk (INT4), num=1) [q_1397788326208_0003] 5 => 6 (type=HASH_SHUFFLE, key=tpcds100.tmp1.cs_item_sk (INT4), num=1) [Outgoing] [q_1397788326208_0003] 6 => 10 (type=HASH_SHUFFLE, key=tpcds100.i.i_item_sk (INT4), num=232) Right block ======================================================= Block Id: eb_1397788326208_0003_000009 [INTERMEDIATE] ======================================================= [Incoming] [q_1397788326208_0003] 7 => 9 (type=HASH_SHUFFLE, key=tpcds100.cs.cs_sold_date_sk (INT4), num=232) [q_1397788326208_0003] 8 => 9 (type=HASH_SHUFFLE, key=tpcds100.d.d_date_sk (INT4), num=232) [Outgoing] [q_1397788326208_0003] 9 => 10 (type=HASH_SHUFFLE, key=tpcds100.cs1.cs_item_sk (INT4), num=20) Join Block ======================================================= Block Id: eb_1397788326208_0003_000010 [INTERMEDIATE] ======================================================= [Incoming] [q_1397788326208_0003] 6 => 10 (type=HASH_SHUFFLE, key=tpcds100.i.i_item_sk (INT4), num=232) [q_1397788326208_0003] 9 => 10 (type=HASH_SHUFFLE, key=tpcds100.cs1.cs_item_sk (INT4), num=20) [Outgoing] [q_1397788326208_0003] 10 => 11 (type=HASH_SHUFFLE, key=, num=1) As above, left block and right block calculated different partition numbers. Thus, we can't find expected result.
          Hide
          blrunner Jaehwa Jung added a comment -

          I found the cause for this bug. Only if tajo.dist-query.join.broadcast.auto is true, GlobalPlanner set broad cast tables. But although the property is false, we need to set broad cast tables. Because if we don't set it, some SubQueries will be omitted.

          Show
          blrunner Jaehwa Jung added a comment - I found the cause for this bug. Only if tajo.dist-query.join.broadcast.auto is true, GlobalPlanner set broad cast tables. But although the property is false, we need to set broad cast tables. Because if we don't set it, some SubQueries will be omitted.
          Hide
          blrunner Jaehwa Jung added a comment -

          I verified this patch with TPC-H Q10 query, TAJO-747 test cases, TAJO-750 test cases.

          For reference, I can't create new reviewboard because RB service made 502 Proxy Error.

          Show
          blrunner Jaehwa Jung added a comment - I verified this patch with TPC-H Q10 query, TAJO-747 test cases, TAJO-750 test cases. For reference, I can't create new reviewboard because RB service made 502 Proxy Error.
          Hide
          blrunner Jaehwa Jung added a comment -

          I uploaded second patch and I created a review request against branch master in reviewboard.
          https://reviews.apache.org/r/20478/

          Show
          blrunner Jaehwa Jung added a comment - I uploaded second patch and I created a review request against branch master in reviewboard. https://reviews.apache.org/r/20478/
          Hide
          tajoqa Tajo QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12640795/TAJO-748_2.patch
          against master revision 6594ac1.

          -1 patch. The patch command could not apply the patch.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 5 new or modified test files.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/355//console

          This message is automatically generated.

          Show
          tajoqa Tajo QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12640795/TAJO-748_2.patch against master revision 6594ac1. -1 patch. The patch command could not apply the patch. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 5 new or modified test files. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/355//console This message is automatically generated.
          Hide
          tajoqa Tajo QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12640795/TAJO-748_2.patch
          against master revision 6594ac1.

          -1 patch. The patch command could not apply the patch.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 5 new or modified test files.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/356//console

          This message is automatically generated.

          Show
          tajoqa Tajo QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12640795/TAJO-748_2.patch against master revision 6594ac1. -1 patch. The patch command could not apply the patch. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 5 new or modified test files. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/356//console This message is automatically generated.
          Hide
          blrunner Jaehwa Jung added a comment -

          I've rebased on master branch.

          Show
          blrunner Jaehwa Jung added a comment - I've rebased on master branch.
          Hide
          tajoqa Tajo QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12640827/TAJO-748_3.patch
          against master revision f8ba4db.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 5 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The applied patch does not increase the total number of javadoc warnings.

          +1 checkstyle. The patch generated 0 code style errors.

          -1 findbugs. The patch appears to introduce 196 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in tajo-core.

          Test results: https://builds.apache.org/job/PreCommit-TAJO-Build/358//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/358//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-core.html
          Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/358//console

          This message is automatically generated.

          Show
          tajoqa Tajo QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12640827/TAJO-748_3.patch against master revision f8ba4db. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 5 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The applied patch does not increase the total number of javadoc warnings. +1 checkstyle. The patch generated 0 code style errors. -1 findbugs. The patch appears to introduce 196 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in tajo-core. Test results: https://builds.apache.org/job/PreCommit-TAJO-Build/358//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/358//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-core.html Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/358//console This message is automatically generated.
          Hide
          blrunner Jaehwa Jung added a comment -

          I uploaded fourth patch.

          Show
          blrunner Jaehwa Jung added a comment - I uploaded fourth patch.
          Hide
          tajoqa Tajo QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12640937/TAJO-748_4.patch
          against master revision 6a005ce.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 5 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The applied patch does not increase the total number of javadoc warnings.

          +1 checkstyle. The patch generated 0 code style errors.

          -1 findbugs. The patch appears to introduce 196 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in tajo-core.

          Test results: https://builds.apache.org/job/PreCommit-TAJO-Build/360//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/360//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-core.html
          Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/360//console

          This message is automatically generated.

          Show
          tajoqa Tajo QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12640937/TAJO-748_4.patch against master revision 6a005ce. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 5 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The applied patch does not increase the total number of javadoc warnings. +1 checkstyle. The patch generated 0 code style errors. -1 findbugs. The patch appears to introduce 196 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in tajo-core. Test results: https://builds.apache.org/job/PreCommit-TAJO-Build/360//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/360//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-core.html Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/360//console This message is automatically generated.
          Hide
          blrunner Jaehwa Jung added a comment -

          I uploaded fifth patch.

          Show
          blrunner Jaehwa Jung added a comment - I uploaded fifth patch.
          Hide
          tajoqa Tajo QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12641022/TAJO-748_5.patch
          against master revision f1f36ec.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 5 new or modified test files.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 javadoc. The applied patch does not increase the total number of javadoc warnings.

          +1 checkstyle. The patch generated 0 code style errors.

          -1 findbugs. The patch appears to introduce 196 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in tajo-core.

          Test results: https://builds.apache.org/job/PreCommit-TAJO-Build/362//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/362//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-core.html
          Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/362//console

          This message is automatically generated.

          Show
          tajoqa Tajo QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12641022/TAJO-748_5.patch against master revision f1f36ec. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 5 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The applied patch does not increase the total number of javadoc warnings. +1 checkstyle. The patch generated 0 code style errors. -1 findbugs. The patch appears to introduce 196 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in tajo-core. Test results: https://builds.apache.org/job/PreCommit-TAJO-Build/362//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/362//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-core.html Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/362//console This message is automatically generated.
          Hide
          blrunner Jaehwa Jung added a comment -

          This patch got +1 at RB, and I just committed to master and branch-0.8.0.

          Show
          blrunner Jaehwa Jung added a comment - This patch got +1 at RB, and I just committed to master and branch-0.8.0.
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Tajo-master-build #190 (See https://builds.apache.org/job/Tajo-master-build/190/)
          TAJO-748: Shuffle output numbers of join may be inconsistent. (jaehwa) (jhjung: rev 2b27f7de70904552d38801f57aa12396a9df75ac)

          • tajo-core/src/test/java/org/apache/tajo/master/querymaster/TestQueryUnitStatusUpdate.java
          • tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java
          • tajo-core/src/main/java/org/apache/tajo/master/querymaster/SubQuery.java
          • tajo-core/src/test/resources/results/TestJoinBroadcast/testBroadcastSubquery2.result
          • CHANGES.txt
          • tajo-core/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java
          • tajo-core/src/test/resources/queries/TestNetTypes/testJoin.sql
          • tajo-core/src/test/java/org/apache/tajo/master/TestExecutionBlockCursor.java
          • tajo-core/src/test/java/org/apache/tajo/engine/query/TestJoinBroadcast.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Tajo-master-build #190 (See https://builds.apache.org/job/Tajo-master-build/190/ ) TAJO-748 : Shuffle output numbers of join may be inconsistent. (jaehwa) (jhjung: rev 2b27f7de70904552d38801f57aa12396a9df75ac) tajo-core/src/test/java/org/apache/tajo/master/querymaster/TestQueryUnitStatusUpdate.java tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java tajo-core/src/main/java/org/apache/tajo/master/querymaster/SubQuery.java tajo-core/src/test/resources/results/TestJoinBroadcast/testBroadcastSubquery2.result CHANGES.txt tajo-core/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java tajo-core/src/test/resources/queries/TestNetTypes/testJoin.sql tajo-core/src/test/java/org/apache/tajo/master/TestExecutionBlockCursor.java tajo-core/src/test/java/org/apache/tajo/engine/query/TestJoinBroadcast.java
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Tajo-0.8.0-build #82 (See https://builds.apache.org/job/Tajo-0.8.0-build/82/)
          TAJO-748: Shuffle output numbers of join may be inconsistent. (jaehwa) (jhjung: rev 0426e5e42fc9bd1ac420af224101745f6e1ae0a8)

          • CHANGES.txt
          • tajo-core/src/test/resources/queries/TestNetTypes/testJoin.sql
          • tajo-core/src/test/resources/results/TestJoinBroadcast/testBroadcastSubquery2.result
          • tajo-core/src/test/java/org/apache/tajo/master/TestExecutionBlockCursor.java
          • tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java
          • tajo-core/src/test/java/org/apache/tajo/engine/query/TestJoinBroadcast.java
          • tajo-core/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java
          • tajo-core/src/main/java/org/apache/tajo/master/querymaster/SubQuery.java
          • tajo-core/src/test/java/org/apache/tajo/master/querymaster/TestQueryUnitStatusUpdate.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Tajo-0.8.0-build #82 (See https://builds.apache.org/job/Tajo-0.8.0-build/82/ ) TAJO-748 : Shuffle output numbers of join may be inconsistent. (jaehwa) (jhjung: rev 0426e5e42fc9bd1ac420af224101745f6e1ae0a8) CHANGES.txt tajo-core/src/test/resources/queries/TestNetTypes/testJoin.sql tajo-core/src/test/resources/results/TestJoinBroadcast/testBroadcastSubquery2.result tajo-core/src/test/java/org/apache/tajo/master/TestExecutionBlockCursor.java tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java tajo-core/src/test/java/org/apache/tajo/engine/query/TestJoinBroadcast.java tajo-core/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java tajo-core/src/main/java/org/apache/tajo/master/querymaster/SubQuery.java tajo-core/src/test/java/org/apache/tajo/master/querymaster/TestQueryUnitStatusUpdate.java

            People

            • Assignee:
              blrunner Jaehwa Jung
              Reporter:
              blrunner Jaehwa Jung
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development