Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0
    • Component/s: None
    • Labels:
      None

      Description

      We should support the 'INSERT INTO ... SELECT' statement.


          Activity

          hudson Hudson added a comment -

          SUCCESS: Integrated in Tajo-master-build #300 (See https://builds.apache.org/job/Tajo-master-build/300/)
          TAJO-20: INSERT INTO ... SELECT. (Hyoungjun Kim via hyunsik) (hyunsik: rev 499d1081b66e0a785dce82430e90bce9d39caa7e)

          • CHANGES
          • tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/HashBasedColPartitionStoreExec.java
          • tajo-core/src/test/java/org/apache/tajo/engine/query/TestTablePartitions.java
          • tajo-storage/src/main/java/org/apache/tajo/storage/StorageUtil.java
          • tajo-core/src/test/resources/queries/TestInsertQuery/testInsertInto.sql
          • tajo-core/src/main/java/org/apache/tajo/master/querymaster/QueryMasterTask.java
          • tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/SortBasedColPartitionStoreExec.java
          • tajo-core/src/main/java/org/apache/tajo/master/GlobalEngine.java
          • tajo-core/src/main/java/org/apache/tajo/engine/planner/PreLogicalPlanVerifier.java
          • tajo-core/src/main/java/org/apache/tajo/worker/Task.java
          • tajo-core/src/test/java/org/apache/tajo/QueryTestCaseBase.java
          • tajo-core/src/test/java/org/apache/tajo/engine/query/TestInsertQuery.java
          • tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/ColPartitionStoreExec.java
          • tajo-core/src/main/java/org/apache/tajo/master/querymaster/Query.java
          githubbot ASF GitHub Bot added a comment -

          Github user asfgit closed the pull request at:

          https://github.com/apache/tajo/pull/72

          hyunsik Hyunsik Choi added a comment -

          Committed it to the master branch.

          githubbot ASF GitHub Bot added a comment -

          Github user hyunsik commented on the pull request:

          https://github.com/apache/tajo/pull/72#issuecomment-49004222

          +1

          Overall, the patch looks good to me, and it includes good unit tests. I also verified it with 'mvn clean install'.

          Before committing it, I'll do the following:

          • add more comments to the newly implemented methods
          • rename the unit test method that I pointed out
          • remove the unused static variable
          githubbot ASF GitHub Bot added a comment -

          Github user hyunsik commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/72#discussion_r14921641

          --- Diff: tajo-core/src/test/java/org/apache/tajo/engine/query/TestTablePartitions.java ---
          @@ -279,6 +280,131 @@ public final void testColumnPartitionedTableByThreeColumns() throws Exception {
          }

          @Test
          + public final void to() throws Exception {
          --- End diff --

          I think you intended the name 'testInsertIntoColumnPartitionedTableByThreeColumns', because 'testInsertIntoColumnPartitionedTableByThreeColumns' appears in the unit test code.

          githubbot ASF GitHub Bot added a comment -

          Github user hyunsik commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/72#discussion_r14919794

          --- Diff: tajo-core/src/main/java/org/apache/tajo/master/querymaster/QueryMasterTask.java ---
          @@ -378,20 +378,7 @@ private void initStagingDir() throws IOException {

          // Create a subdirectories
          LOG.info("The staging dir '" + stagingDir + "' is created.");
          -
          queryContext.setStagingDir(stagingDir);
          -
          - /////////////////////////////////////////////////
          - // Check and Create Output Directory If Necessary
          - /////////////////////////////////////////////////
          - if (queryContext.hasOutputPath()) {
          - outputDir = queryContext.getOutputPath();
          --- End diff --

          After this removal, outputDir is no longer used.

          githubbot ASF GitHub Bot added a comment -

          Github user hyunsik commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/72#discussion_r14919779

          --- Diff: tajo-core/src/main/java/org/apache/tajo/engine/planner/physical/ColPartitionStoreExec.java ---
          @@ -43,6 +51,17 @@
          protected final int [] keyIds;
          protected final String [] keyNames;

          + static final ThreadLocal<NumberFormat> OUTPUT_FILE_FORMAT_SEQ =
          --- End diff --

          It does not seem to be used.

          githubbot ASF GitHub Bot added a comment -

          GitHub user babokim opened a pull request:

          https://github.com/apache/tajo/pull/72

          TAJO-20: INSERT INTO ... SELECT

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/babokim/tajo TAJO-20

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/tajo/pull/72.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #72


          commit 859811075866addc65a46e368701f0f59e71bf74
          Author: 김형준 <babokim@babokim-macbook-pro.local>
          Date: 2014-06-04T09:22:07Z

          TAJO-20: INSERT INTO ... SELECT

          commit fd9397c6879603974f8e5dbe253161d14fa644aa
          Author: 김형준 <babokim@babokim-mbp.server.gruter.com>
          Date: 2014-07-14T05:45:46Z

          TAJO-20: INSERT INTO ... SELECT

          commit 5ac57239f7dc75924a10be676edf1ec938beba47
          Author: 김형준 <babokim@babokim-mbp.server.gruter.com>
          Date: 2014-07-14T07:28:03Z

          Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into TAJO-20

          Conflicts:
          tajo-core/src/main/java/org/apache/tajo/engine/planner/PreLogicalPlanVerifier.java
          tajo-core/src/test/java/org/apache/tajo/QueryTestCaseBase.java
          tajo-core/src/test/java/org/apache/tajo/engine/query/TestCreateTable.java


          prafulla Prafulla T added a comment -

          Thanks, that works for me.
          I am a little busy these days.

          hjkim Hyoungjun Kim added a comment -

          Prafulla T, if you don't mind, I am going to handle this issue because I need this feature soon.

          hjkim Hyoungjun Kim added a comment -

          In my opinion, modifying the commitOutputData() method in the Query class is more efficient.
          For INSERT INTO, find the largest file sequence number in the target directory, then move each staging file to a target file whose name continues from that largest sequence number. Partitioned tables need to be handled carefully.
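The commit-time idea above can be sketched as a pure renaming plan. This is only an illustration under stated assumptions, not the code in Tajo's Query class: the class and method names are hypothetical, file names are assumed to look like part-<digits>-<digits>, and the sketch only computes the source-to-target mapping without touching any file system.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of Tajo.
final class InsertIntoCommitPlan {
    // Assumed file name shape: part-<execution block id>-<sequence number>.
    private static final Pattern PART_FILE = Pattern.compile("part-(\\d+)-(\\d+)");

    private InsertIntoCommitPlan() {}

    // Returns the largest sequence number among the existing files, or -1 if none match.
    static int largestSequence(List<String> existingNames) {
        int max = -1;
        for (String name : existingNames) {
            Matcher m = PART_FILE.matcher(name);
            if (m.matches()) {
                max = Math.max(max, Integer.parseInt(m.group(2)));
            }
        }
        return max;
    }

    // Maps each staging file name to a target name that continues after the
    // largest existing sequence number. Staging names that do not match the
    // assumed pattern are skipped.
    static Map<String, String> planMoves(List<String> existingNames, List<String> stagingNames) {
        int next = largestSequence(existingNames) + 1;
        Map<String, String> moves = new LinkedHashMap<>();
        for (String staged : stagingNames) {
            Matcher m = PART_FILE.matcher(staged);
            if (!m.matches()) {
                continue;
            }
            moves.put(staged, String.format(Locale.ROOT, "part-%s-%06d", m.group(1), next++));
        }
        return moves;
    }

    public static void main(String[] args) {
        List<String> existing = Arrays.asList("part-02-000001", "part-02-000002");
        List<String> staging = Arrays.asList("part-03-000001", "part-03-000002");
        // Prints {part-03-000001=part-03-000003, part-03-000002=part-03-000004}
        System.out.println(planMoves(existing, staging));
    }
}
```

For a partitioned table, a plan like this would presumably have to be computed separately for each partition directory, which is one reason such tables need extra care.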

          prafulla Prafulla T added a comment -

          Hi Jae,

          I have not yet started working on this. I am currently working on TAJO-858.
          After I am done with it, I will start working on this issue.

          goodljy Jae Young Lee added a comment -

          Hello, Prafulla.

          This feature would be very helpful to me once it is available.
          When can I use it?

          Thanks.

          prafulla Prafulla T added a comment -

          Thanks for the detailed explanation, Hyunsik. I will read more of the code and, if possible, try to implement this functionality.

          hyunsik Hyunsik Choi added a comment -

          INSERT OVERWRITE INTO, which removes all table data and inserts new data, is already implemented in the current Tajo, so the grammar and many other parts are in place. However, the INSERT INTO statement, which preserves existing data and adds new data, is not implemented. This feature is necessary. It would be very nice if someone took this issue.

          As you asked, here is a more detailed description.

          Many parts are already implemented in the current Tajo. The key to this issue is to determine the file name pattern used for newly written data files and to enable each task to write its output under the determined file names. Currently, each worker writes files named part-<execution block id>-<queryunit id>, where a query unit corresponds to a Task in MR.

          Example:

          part-02-000001
          part-02-000002
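The naming pattern above could be produced by a formatter along these lines. This is a minimal sketch, not Tajo's actual code: the class and method names are hypothetical, and the zero-padding widths (2 and 6 digits) are inferred only from the two example names.

```java
import java.util.Locale;

// Hypothetical helper; not part of Tajo.
final class PartFileNames {
    private PartFileNames() {}

    // Builds "part-<execution block id>-<query unit id>", zero-padded to the
    // widths seen in the example names (2 and 6 digits; an assumption).
    static String partFileName(int executionBlockId, int queryUnitId) {
        return String.format(Locale.ROOT, "part-%02d-%06d", executionBlockId, queryUnitId);
    }

    public static void main(String[] args) {
        System.out.println(partFileName(2, 1)); // prints part-02-000001
        System.out.println(partFileName(2, 2)); // prints part-02-000002
    }
}
```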
          

          If possible, it would be nice if newly written file names continued from the last written file name. However, this approach may require non-trivial changes.

          We can get the last file name in GlobalEngine in TajoMaster, and we can convey the filename prefix and the last number via the QueryContext object, which is propagated throughout all paths of a query. As I mentioned above, each query unit generates its output filename according to the query unit id (i.e., task id). To continue from the last number of the most recently written file, we need to modify the file name only if the filename prefix and last number are given.
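A task-side version of this proposal might look like the sketch below. Everything here is an assumption made for illustration: OutputNameHint is a hypothetical stand-in for values that would travel in QueryContext (it is not a Tajo class), query unit ids are assumed to start at 1, and the offset rule (last number plus query unit id) is one possible reading of the proposal.

```java
import java.util.Locale;

// Hypothetical helper; not part of Tajo.
final class InsertIntoNaming {
    private InsertIntoNaming() {}

    // Hypothetical carrier for the filename prefix and last number that the
    // master would look up and pass along with the query.
    static final class OutputNameHint {
        final String prefix;   // for example "part-02-"
        final int lastNumber;  // last sequence number already present in the table directory

        OutputNameHint(String prefix, int lastNumber) {
            this.prefix = prefix;
            this.lastNumber = lastNumber;
        }
    }

    // Without a hint, keep the default naming; with a hint, continue numbering
    // after the last existing file.
    static String outputFileName(int executionBlockId, int queryUnitId, OutputNameHint hint) {
        if (hint == null) {
            return String.format(Locale.ROOT, "part-%02d-%06d", executionBlockId, queryUnitId);
        }
        return String.format(Locale.ROOT, "%s%06d", hint.prefix, hint.lastNumber + queryUnitId);
    }

    public static void main(String[] args) {
        System.out.println(outputFileName(2, 1, null)); // prints part-02-000001
        OutputNameHint hint = new OutputNameHint("part-02-", 2);
        System.out.println(outputFileName(3, 1, hint)); // prints part-02-000003
    }
}
```

Changing the name only when a hint is present leaves the existing naming path untouched for queries that are not INSERT INTO.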

          This description is just my idea. Feel free to suggest your own.

          Best regards,
          Hyunsik

          prafulla Prafulla T added a comment -

          I briefly looked at the grammar and the LogicalPlanner.java code. It seems that this feature is already implemented.
          If not, can someone please describe the issue/idea in detail?


            People

            • Assignee:
              hjkim Hyoungjun Kim
              Reporter:
              hyunsik Hyunsik Choi
            • Votes:
              0
              Watchers:
              5
