Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Component/s: distributed query plan
    • Labels:
      None

      Description

      As discussed in TAJO-283, the column-partitioned store needs a hash shuffle phase. The main objective of this issue is to modify GlobalPlanner to add that phase.
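
      To illustrate what a hash shuffle on the partition column does, here is a minimal sketch. It is not Tajo's GlobalPlanner code: the class HashShuffleSketch, its methods, and the sample rows are all hypothetical. It only shows the general idea that rows with the same partition-column value are routed to the same bucket, so that a single downstream task can write each partition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical illustration only; none of these names exist in Tajo.
public class HashShuffleSketch {

    // Picks the shuffle bucket for a partition key.
    // Math.floorMod keeps the result in [0, numBuckets) even for negative hash codes.
    public static int bucketFor(String partitionKey, int numBuckets) {
        return Math.floorMod(Objects.hashCode(partitionKey), numBuckets);
    }

    // Distributes rows into buckets by the value found at partitionColumnIndex.
    public static List<List<String[]>> shuffle(List<String[]> rows,
                                               int partitionColumnIndex,
                                               int numBuckets) {
        List<List<String[]>> buckets = new ArrayList<>();
        for (int i = 0; i < numBuckets; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String[] row : rows) {
            int bucket = bucketFor(row[partitionColumnIndex], numBuckets);
            buckets.get(bucket).add(row);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Sample rows of (id, country); the second column is the assumed partition column.
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"1", "KR"});
        rows.add(new String[] {"2", "US"});
        rows.add(new String[] {"3", "KR"});

        List<List<String[]>> buckets = shuffle(rows, 1, 4);
        for (int i = 0; i < buckets.size(); i++) {
            System.out.println("bucket " + i + ": " + buckets.get(i).size() + " row(s)");
        }
    }
}
```

      Because the bucket depends only on the partition-column value, both "KR" rows in the sample always land in the same bucket, whichever task produced them.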

        Activity

        hyunsik Hyunsik Choi added a comment -

        I meant that the namenode and datanode are launched with localhost as the hostname.

        coderplay Min Zhou added a comment -

        Actually, I deployed a distributed Tajo cluster on 6 physical machines. Regarding the local HDFS cluster, did you mean just launching a namenode daemon and a datanode daemon on the local node?

        hyunsik Hyunsik Choi added a comment - - edited

        I don't think it's your mistake. We should have covered a wider variety of cases in the unit tests.

        The local cluster that I mentioned means a single Hadoop cluster with a Tajo cluster running on it. It can be set up with a very simple tajo-site.xml, as follows. I expect that you have already set up a local cluster.

        <configuration>
          <property>
            <name>tajo.rootdir</name>
            <value>hdfs://localhost:8020/tajo</value>
          </property>
        
          <property>
            <name>tajo.master.umbilical-rpc.address</name>
            <value>localhost:26001</value>
          </property>
        
          <property>
            <name>tajo.catalog.client-rpc.address</name>
            <value>localhost:26005</value>
          </property>
        </configuration>
        

        In addition, if you create a launch configuration in your IDE with two classpath entries, ${TAJO_HOME}/conf and ${HADOOP_HOME}/etc/hadoop, you can run TajoMaster or TajoWorker directly in your IDE. That makes debugging much easier.

        coderplay Min Zhou added a comment -

        Hyunsik Choi

        Sorry, I don't know how to test it. Can you show me how to test it on a local cluster?

        Min

        hudson Hudson added a comment -

        SUCCESS: Integrated in Tajo-trunk-postcommit #643 (See https://builds.apache.org/job/Tajo-trunk-postcommit/643/)
        TAJO-432: Add shuffle phase for column-partitioned table store. (Min Zhou via jihoon) (jihoonson: https://git-wip-us.apache.org/repos/asf?p=incubator-tajo.git&a=commit&h=eac2507afaecea32ce406a32d7632016ebbc593d)

        • CHANGES.txt
        • tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java
        jihoonson Jihoon Son added a comment -

        I committed the patch.
        Min, thanks for your contribution!

        jihoonson Jihoon Son added a comment -

        Thanks, Hyunsik.
        I missed that bug.
        I'll commit the patch.

        hyunsik Hyunsik Choi added a comment -

        +1

        I just tested it on a local cluster. Even though this patch is implemented correctly, I experienced a query failure from an INSERT OVERWRITE statement due to a schema mismatch at line 74 of PartitionedStoreExec.

        In more detail, the names of the partition columns are not qualified (i.e., they have no table name), while the columns of the input schema include table names. Depending on the query, line 74 may cause an NPE.
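
        The kind of mismatch described above can be sketched as follows. This is not the code of PartitionedStoreExec or of Tajo's Schema class: ColumnLookupSketch, SimpleSchema, resolve, and the lineitem column names are hypothetical stand-ins. The sketch assumes a schema that stores qualified names and returns null for an unknown name, so that a lookup by unqualified name yields null and a later unboxing or dereference would throw an NPE.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; these are not Tajo classes.
public class ColumnLookupSketch {

    // Minimal stand-in for a schema: maps a column name to its position.
    public static final class SimpleSchema {
        private final Map<String, Integer> indexByName = new LinkedHashMap<>();

        public void addColumn(String name) {
            indexByName.put(name, indexByName.size());
        }

        // Exact-match lookup; returns null when the name is absent.
        public Integer getColumnId(String name) {
            return indexByName.get(name);
        }
    }

    // Strips a leading "table." qualifier, if there is one.
    public static String simpleName(String name) {
        int dot = name.lastIndexOf('.');
        return dot < 0 ? name : name.substring(dot + 1);
    }

    // Tries the exact name first, then falls back to comparing unqualified names.
    public static Integer resolve(SimpleSchema schema, String name) {
        Integer id = schema.getColumnId(name);
        if (id != null) {
            return id;
        }
        String wanted = simpleName(name);
        for (Map.Entry<String, Integer> entry : schema.indexByName.entrySet()) {
            if (simpleName(entry.getKey()).equals(wanted)) {
                return entry.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        SimpleSchema input = new SimpleSchema();
        input.addColumn("lineitem.l_orderkey");   // input schema uses qualified names
        input.addColumn("lineitem.l_returnflag");

        // The partition column arrives unqualified, so the exact lookup misses.
        System.out.println(input.getColumnId("l_returnflag")); // prints "null"
        // Comparing unqualified names finds the column at position 1.
        System.out.println(resolve(input, "l_returnflag"));    // prints "1"
    }
}
```

        Assigning the first lookup result to an int would trigger the NPE through auto-unboxing, which is one way such a mismatch can surface only for some queries.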

        However, that is a separate problem unrelated to this issue, so I'll create another JIRA issue for that bug.

        Thank you for your contribution.

        hyunsik Hyunsik Choi added a comment -

        I'm reviewing this patch.

        jihoonson Jihoon Son added a comment -

        I'll commit it if there are no objections for a while.

        hyunsik Hyunsik Choi added a comment -

        I'm sorry for the late reply. It's OK; I just assigned this issue to you.

        jihoonson Jihoon Son added a comment -

        Thanks, Min!
        This patch looks good to me. +1

        coderplay Min Zhou added a comment -

        My apologies for uploading a patch on this issue; it is blocking me.


          People

          • Assignee:
            coderplay Min Zhou
            Reporter:
            hyunsik Hyunsik Choi
          • Votes:
            1
            Watchers:
            3
