Tajo / TAJO-925

Child ExecutionBlock of JOIN node has different number of shuffle keys.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0
    • Component/s: None
    • Labels:
      None

      Description

      If both children of a join node are subqueries (SUBQUERY) rather than table scans (SCAN), the two child ExecutionBlocks can have different numbers of shuffle keys.
      In that case, the JOIN query returns a wrong result. I tested this with the test code below.

      @Test
      public void testJoinWithDifferentShuffleKey() throws Exception {
        // CSV table options: the default field delimiter (the '|' used by the rows below) and a NULL marker
        KeyValueSet tableOptions = new KeyValueSet();
        tableOptions.put(StorageConstants.CSVFILE_DELIMITER, StorageConstants.DEFAULT_FIELD_DELIMITER);
        tableOptions.put(StorageConstants.CSVFILE_NULL, "\\\\N");

        Schema schema = new Schema();
        schema.addColumn("id", Type.INT4);
        schema.addColumn("name", Type.TEXT);

        // Generate "id|name" rows until the data grows past 2 MB
        List<String> data = new ArrayList<String>();
        int bytes = 0;
        for (int i = 0; i < 1000000; i++) {
          String row = i + "|" + i + "name012345678901234567890123456789012345678901234567890";
          bytes += row.getBytes().length;
          data.add(row);
          if (bytes > 2 * 1024 * 1024) {
            break;
          }
        }
        TajoTestingCluster.createTable("large_table", schema, tableOptions, data.toArray(new String[]{}));

        // Use a very small join partition volume so the join input is split into several partitions;
        // the original value is restored in the finally block
        int originConfValue = conf.getIntVar(ConfVars.DIST_QUERY_JOIN_PARTITION_VOLUME);
        testingCluster.setAllTajoDaemonConfValue(ConfVars.DIST_QUERY_JOIN_PARTITION_VOLUME.varname, "1");

        ResultSet res = null;
        try {
          // Both join inputs are GROUP BY subqueries, not plain table scans
          res = executeString(
              "select count(b.id) " +
              "from (select id, count(*) as cnt from large_table group by id) a " +
              "left outer join (select id, count(*) as cnt from large_table where id < 200 group by id) b " +
              "on a.id = b.id"
          );

          // Only ids 0..199 exist in b, so count(b.id) must be 200
          String expected =
              "?count\n" +
              "-------------------------------\n" +
              "200\n";

          assertEquals(expected, resultSetToString(res));
        } finally {
          // Restore the configuration and drop the table even if the query itself failed
          testingCluster.setAllTajoDaemonConfValue(ConfVars.DIST_QUERY_JOIN_PARTITION_VOLUME.varname, "" + originConfValue);
          if (res != null) {
            cleanupQuery(res);
          }
          executeString("DROP TABLE large_table PURGE").close();
        }
      }
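The report does not spell out the mechanism behind the wrong result. Assuming that "number of shuffle keys" refers to the number of hash partitions each child ExecutionBlock writes, the following self-contained sketch shows why a mismatch breaks a repartition join. The class and method names (`ShuffleMismatchDemo`, `partition`, `countMatches`) are hypothetical and are not Tajo APIs. Each side routes a key to partition `hash(key) mod N`; when the two sides use different `N`, matching keys can land in different partition numbers, and a partition-wise join never compares them.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration, not Tajo code: hash-partition both join inputs
// and join them partition by partition, as a repartition (shuffle) join does.
public class ShuffleMismatchDemo {

  // Route keys 0..maxKeyExclusive-1 into numPartitions buckets by hash(key) mod numPartitions.
  static List<Set<Integer>> partition(int maxKeyExclusive, int numPartitions) {
    List<Set<Integer>> parts = new ArrayList<>();
    for (int p = 0; p < numPartitions; p++) {
      parts.add(new HashSet<>());
    }
    for (int key = 0; key < maxKeyExclusive; key++) {
      parts.get(Math.floorMod(Integer.hashCode(key), numPartitions)).add(key);
    }
    return parts;
  }

  // Partition-wise join: left partition p is only ever compared with right partition p.
  // Returns the number of right-side keys that found a matching left-side key.
  static int countMatches(List<Set<Integer>> left, List<Set<Integer>> right) {
    int matches = 0;
    int pairedPartitions = Math.min(left.size(), right.size());
    for (int p = 0; p < pairedPartitions; p++) {
      for (int key : right.get(p)) {
        if (left.get(p).contains(key)) {
          matches++;
        }
      }
    }
    return matches;
  }

  public static void main(String[] args) {
    // Left side: ids 0..999; right side: ids 0..199 (like "where id < 200").
    int same = countMatches(partition(1000, 4), partition(200, 4));
    int different = countMatches(partition(1000, 4), partition(200, 2));
    System.out.println("same partition count:       " + same);
    System.out.println("different partition counts: " + different);
  }
}
```

With 4 partitions on both sides, all 200 right-side keys find their match. With 4 partitions on the left and 2 on the right, only keys with `key mod 4` in {0, 1} meet their partner, so the sketch counts 100 instead of 200. The general point is that both inputs of a shuffle join must agree on the partitioning; how the actual patch enforces this is not described here.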
      

        Activity

        ASF GitHub Bot added a comment:

        GitHub user babokim opened a pull request:

        https://github.com/apache/tajo/pull/61

        TAJO-925: Child ExecutionBlock of JOIN node has different number of shuffle keys.

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/babokim/tajo TAJO-925

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/tajo/pull/61.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #61


        commit c54d9f5d0e2891c5ea7ad6161f1409ee2718083d
        Author: 김형준 <babokim@babokim-macbook-pro.local>
        Date: 2014-07-09T07:56:27Z

        TAJO-925: Child ExecutionBlock of JOIN node has different number of shuffle keys.

        commit 15317af05e24e9346e2035188dfac1476d5f1d20
        Author: 김형준 <babokim@babokim-macbook-pro.local>
        Date: 2014-07-09T07:57:11Z

        Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo


        ASF GitHub Bot added a comment:

        GitHub user asfgit closed the pull request at:

        https://github.com/apache/tajo/pull/61

        Hyunsik Choi added a comment:

        committed.

        Hudson added a comment:

        FAILURE: Integrated in Tajo-master-build #285 (See https://builds.apache.org/job/Tajo-master-build/285/)
        TAJO-925: Child ExecutionBlock of JOIN node has different number of shuffle keys. (Hyoungjun Kim via hyunsik) (hyunsik: rev 438010f92bdbde50447d9fbc3438e57ddaff776f)

        • tajo-core/src/test/java/org/apache/tajo/engine/query/TestJoinQuery.java
        • tajo-core/src/main/java/org/apache/tajo/engine/planner/global/MasterPlan.java
        • CHANGES
        • tajo-core/src/main/java/org/apache/tajo/master/querymaster/SubQuery.java
        • tajo-core/src/main/java/org/apache/tajo/master/querymaster/Query.java
        • tajo-core/src/main/java/org/apache/tajo/engine/planner/global/ExecutionBlockCursor.java

          People

          • Assignee:
            Hyoungjun Kim
            Reporter:
            Hyoungjun Kim
          • Votes:
            0
            Watchers:
            3
