Uploaded image for project: 'Tajo'
  1. Tajo
  2. TAJO-1067

INSERT OVERWRITE INTO should not remove all partitions.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0
    • Component/s: QueryMaster
    • Labels:
      None

      Description

      Currently, INSERT OVERWRITE INTO always moves the result data into the original table location. As a result, all existing partitions have been removed. The query should not remove all partitions because existing partitions may be a dataset for a production cluster.

        Issue Links

          Activity

          Hide
          blrunner Jaehwa Jung added a comment -

          I found this issue that is depended upon by ALTER TABLE ADD/DROP PARTITION statement. I implemented the patch for this issue by checking existing partition directories. As a result, I could insert overwrite the result data without removing existing another partition directories. But the patch couldn't merge existing row numbers. I expect that I can resolve by ALTER TABLE ADD/DROP PARTITION statement.

          Show
          blrunner Jaehwa Jung added a comment - I found this issue that is depended upon by ALTER TABLE ADD/DROP PARTITION statement. I implemented the patch for this issue by checking existing partition directories. As a result, I could insert overwrite the result data without removing existing another partition directories. But the patch couldn't merge existing row numbers. I expect that I can resolve by ALTER TABLE ADD/DROP PARTITION statement.
          Hide
          githubbot ASF GitHub Bot added a comment -

          GitHub user blrunner opened a pull request:

          https://github.com/apache/tajo/pull/161

          TAJO-1067: INSERT OVERWRITE INTO should not remove all partitions.

          I added new session variable for avoiding remove existing partition directories. Its name is COLUMN_PARITION_REMOVE_ALL_PARTITIONS. Its default value is false. Thus, if you run INSERT OVERWRITE statement with column partitioned table, tajo doesn't remove existing partition directories. Tajo just remove partition directories which is equals to staging directories.

          And this patch has a lack that is related to TAJO-744. You left the comment for the lack at TAJO-1067.

          You can merge this pull request into a Git repository by running:

          $ git pull https://github.com/blrunner/tajo TAJO-1067

          Alternatively you can review and apply these changes as the patch at:

          https://github.com/apache/tajo/pull/161.patch

          To close this pull request, make a commit to your master/trunk branch
          with (at least) the following in the commit message:

          This closes #161


          commit 7c98709f0fcb06dfb675acae3d6489a6126f55b5
          Author: jinossy <jinossy@gmail.com>
          Date: 2014-08-06T08:43:35Z

          TAJO-995: HiveMetaStoreClient wrapper should retry the connection

          commit 415d0867ae4a4543f47360294bead1fc7f41e292
          Author: Jaehwa Jung <blrunner@apache.org>
          Date: 2014-08-10T06:07:24Z

          Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo

          commit 7a7b4fd26f61df89cacdb4fc41faf9c2abe456b2
          Author: Jaehwa Jung <blrunner@apache.org>
          Date: 2014-08-11T02:28:48Z

          Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo

          commit 45f5ed3adba931f4706f26dda1d3c03240ee11d3
          Author: Jaehwa Jung <blrunner@apache.org>
          Date: 2014-08-11T05:40:25Z

          Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo

          commit aa01e83859ef553ac4eb90c1678e3bc6be20c6c9
          Author: Jaehwa Jung <blrunner@apache.org>
          Date: 2014-08-18T09:56:24Z

          Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo

          commit b33a94509c1a007b56785435c8e16640ffde91b7
          Author: Jaehwa Jung <blrunner@apache.org>
          Date: 2014-09-04T02:14:19Z

          Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo

          commit 05c892448113db40daef54d2e06dad463dbae9c8
          Author: Jaehwa Jung <blrunner@apache.org>
          Date: 2014-09-11T02:33:54Z

          Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo

          commit 42a6c4ebeebe36aad6f7dc5f92c83baee398c85e
          Author: Jaehwa Jung <blrunner@apache.org>
          Date: 2014-09-11T03:25:05Z

          Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo

          commit 52a942136c549f197d6d1c3d1a13717e6f14a83f
          Author: Jaehwa Jung <blrunner@apache.org>
          Date: 2014-09-24T01:26:36Z

          Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo

          commit 97910caa09353eb395205b35b9417b2ae51d076a
          Author: Jaehwa Jung <blrunner@apache.org>
          Date: 2014-09-25T17:51:13Z

          TAJO-1067: INSERT OVERWRITE INTO should not remove all partitions.

          commit 0a7611d6aea1a98477960be38a5045abfdc78ae0
          Author: Jaehwa Jung <blrunner@apache.org>
          Date: 2014-09-25T17:52:07Z

          Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into TAJO-1067


          Show
          githubbot ASF GitHub Bot added a comment - GitHub user blrunner opened a pull request: https://github.com/apache/tajo/pull/161 TAJO-1067 : INSERT OVERWRITE INTO should not remove all partitions. I added new session variable for avoiding remove existing partition directories. Its name is COLUMN_PARITION_REMOVE_ALL_PARTITIONS. Its default value is false. Thus, if you run INSERT OVERWRITE statement with column partitioned table, tajo doesn't remove existing partition directories. Tajo just remove partition directories which is equals to staging directories. And this patch has a lack that is related to TAJO-744 . You left the comment for the lack at TAJO-1067 . You can merge this pull request into a Git repository by running: $ git pull https://github.com/blrunner/tajo TAJO-1067 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/tajo/pull/161.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #161 commit 7c98709f0fcb06dfb675acae3d6489a6126f55b5 Author: jinossy <jinossy@gmail.com> Date: 2014-08-06T08:43:35Z TAJO-995 : HiveMetaStoreClient wrapper should retry the connection commit 415d0867ae4a4543f47360294bead1fc7f41e292 Author: Jaehwa Jung <blrunner@apache.org> Date: 2014-08-10T06:07:24Z Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo commit 7a7b4fd26f61df89cacdb4fc41faf9c2abe456b2 Author: Jaehwa Jung <blrunner@apache.org> Date: 2014-08-11T02:28:48Z Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo commit 45f5ed3adba931f4706f26dda1d3c03240ee11d3 Author: Jaehwa Jung <blrunner@apache.org> Date: 2014-08-11T05:40:25Z Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo commit aa01e83859ef553ac4eb90c1678e3bc6be20c6c9 Author: Jaehwa Jung <blrunner@apache.org> Date: 2014-08-18T09:56:24Z Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo commit b33a94509c1a007b56785435c8e16640ffde91b7 Author: Jaehwa Jung <blrunner@apache.org> Date: 2014-09-04T02:14:19Z Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo commit 05c892448113db40daef54d2e06dad463dbae9c8 Author: Jaehwa Jung <blrunner@apache.org> Date: 2014-09-11T02:33:54Z Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo commit 42a6c4ebeebe36aad6f7dc5f92c83baee398c85e Author: Jaehwa Jung <blrunner@apache.org> Date: 2014-09-11T03:25:05Z Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo commit 52a942136c549f197d6d1c3d1a13717e6f14a83f Author: Jaehwa Jung <blrunner@apache.org> Date: 2014-09-24T01:26:36Z Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo commit 97910caa09353eb395205b35b9417b2ae51d076a Author: Jaehwa Jung <blrunner@apache.org> Date: 2014-09-25T17:51:13Z TAJO-1067 : INSERT OVERWRITE INTO should not remove all partitions. commit 0a7611d6aea1a98477960be38a5045abfdc78ae0 Author: Jaehwa Jung <blrunner@apache.org> Date: 2014-09-25T17:52:07Z Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into TAJO-1067
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user hyunsik commented on the pull request:

          https://github.com/apache/tajo/pull/161#issuecomment-57007572

          I think that this feature should not depend on session variable. Session variable works implicitly. Users can miss to check if the session variable is enabled.

          As a result, inadvertently, a user can lost his or her entire table by ```INSERT OVERWRITE``` clause.

          I think that we should force users to manually use ```TRUNCATE``` statement instead of using session variable.

          Show
          githubbot ASF GitHub Bot added a comment - Github user hyunsik commented on the pull request: https://github.com/apache/tajo/pull/161#issuecomment-57007572 I think that this feature should not depend on session variable. Session variable works implicitly. Users can miss to check if the session variable is enabled. As a result, inadvertently, a user can lost his or her entire table by ```INSERT OVERWRITE``` clause. I think that we should force users to manually use ```TRUNCATE``` statement instead of using session variable.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user blrunner commented on the pull request:

          https://github.com/apache/tajo/pull/161#issuecomment-57045168

          Hi @hyunsik

          I agree with your opinion. I've just removed the variable.
          If possible, could you check the patch again?

          Show
          githubbot ASF GitHub Bot added a comment - Github user blrunner commented on the pull request: https://github.com/apache/tajo/pull/161#issuecomment-57045168 Hi @hyunsik I agree with your opinion. I've just removed the variable. If possible, could you check the patch again?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user hyunsik commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/161#discussion_r18317268

          — Diff: tajo-core/src/test/java/org/apache/tajo/engine/query/TestTablePartitions.java —
          @@ -485,6 +483,45 @@ public final void testInsertIntoColumnPartitionedTableByThreeColumns() throws Ex
          "R,3,3,49.0\n" +
          "R,3,3,49.0\n";
          assertEquals(expected, resultSetData);
          +
          + // insert overwrite into already exists partitioned table with COLUMN_PARITION_REMOVE_ALL_PARTITIONS is false
          — End diff –

          COLUMN_PARITION_REMOVE_ALL_PARTITIONS was already removed. So, you need to revise this comment line.

          Show
          githubbot ASF GitHub Bot added a comment - Github user hyunsik commented on a diff in the pull request: https://github.com/apache/tajo/pull/161#discussion_r18317268 — Diff: tajo-core/src/test/java/org/apache/tajo/engine/query/TestTablePartitions.java — @@ -485,6 +483,45 @@ public final void testInsertIntoColumnPartitionedTableByThreeColumns() throws Ex "R,3,3,49.0\n" + "R,3,3,49.0\n"; assertEquals(expected, resultSetData); + + // insert overwrite into already exists partitioned table with COLUMN_PARITION_REMOVE_ALL_PARTITIONS is false — End diff – COLUMN_PARITION_REMOVE_ALL_PARTITIONS was already removed. So, you need to revise this comment line.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user hyunsik commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/161#discussion_r18317818

          — Diff: tajo-core/src/main/java/org/apache/tajo/master/querymaster/Query.java —
          @@ -432,19 +432,68 @@ public Path commitOutputData(Query query) {
          boolean movedToOldTable = false;
          boolean committed = false;
          Path oldTableDir = new Path(queryContext.getStagingDir(), TajoConstants.INSERT_OVERWIRTE_OLD_TABLE_NAME);

          • try {
          • if (fs.exists(finalOutputDir)) { - fs.rename(finalOutputDir, oldTableDir); - movedToOldTable = fs.exists(oldTableDir); - }

            else { // if the parent does not exist, make its parent directory.

          • fs.mkdirs(finalOutputDir.getParent());
            +
            + // INSERT OVERWRITE INTO always moves the result data into the original table location.
            + // As a result, all existing partitions have been removed. The query should not remove all partitions
            + // because existing partitions may be a data-set for a production cluster.
            + if (queryContext.hasPartition()) {
            + Map<Path, Path> renameDirs = TUtil.newHashMap();
            + Map<Path, Path> recoveryDirs = TUtil.newHashMap();
            +
            + try {
            + if (!fs.exists(finalOutputDir)) { + fs.mkdirs(finalOutputDir); + }

            +
            + visitPartitionedDirectory(fs, stagingResultDir, finalOutputDir, stagingResultDir.toString(),
            + renameDirs, oldTableDir);
            +
            + // Rename target partition directories
            + for(Map.Entry<Path, Path> entry : renameDirs.entrySet()) {
            + // Backup existing data files for recovering
            + if (fs.exists(entry.getValue()))

            { + String recoveryPathString = entry.getValue().toString().replaceAll(finalOutputDir.toString(), + oldTableDir.toString()); + Path recoveryPath = new Path(recoveryPathString); + fs.rename(entry.getValue(), recoveryPath); + fs.exists(recoveryPath); + recoveryDirs.put(entry.getValue(), recoveryPath); + }

            + // Delete existing directory
            + fs.deleteOnExit(entry.getValue());

              • End diff –

          Could you check the use of FileSystem::deleteOnExit? You seem to misuse it.

          Show
          githubbot ASF GitHub Bot added a comment - Github user hyunsik commented on a diff in the pull request: https://github.com/apache/tajo/pull/161#discussion_r18317818 — Diff: tajo-core/src/main/java/org/apache/tajo/master/querymaster/Query.java — @@ -432,19 +432,68 @@ public Path commitOutputData(Query query) { boolean movedToOldTable = false; boolean committed = false; Path oldTableDir = new Path(queryContext.getStagingDir(), TajoConstants.INSERT_OVERWIRTE_OLD_TABLE_NAME); try { if (fs.exists(finalOutputDir)) { - fs.rename(finalOutputDir, oldTableDir); - movedToOldTable = fs.exists(oldTableDir); - } else { // if the parent does not exist, make its parent directory. fs.mkdirs(finalOutputDir.getParent()); + + // INSERT OVERWRITE INTO always moves the result data into the original table location. + // As a result, all existing partitions have been removed. The query should not remove all partitions + // because existing partitions may be a data-set for a production cluster. + if (queryContext.hasPartition()) { + Map<Path, Path> renameDirs = TUtil.newHashMap(); + Map<Path, Path> recoveryDirs = TUtil.newHashMap(); + + try { + if (!fs.exists(finalOutputDir)) { + fs.mkdirs(finalOutputDir); + } + + visitPartitionedDirectory(fs, stagingResultDir, finalOutputDir, stagingResultDir.toString(), + renameDirs, oldTableDir); + + // Rename target partition directories + for(Map.Entry<Path, Path> entry : renameDirs.entrySet()) { + // Backup existing data files for recovering + if (fs.exists(entry.getValue())) { + String recoveryPathString = entry.getValue().toString().replaceAll(finalOutputDir.toString(), + oldTableDir.toString()); + Path recoveryPath = new Path(recoveryPathString); + fs.rename(entry.getValue(), recoveryPath); + fs.exists(recoveryPath); + recoveryDirs.put(entry.getValue(), recoveryPath); + } + // Delete existing directory + fs.deleteOnExit(entry.getValue()); End diff – Could you check the use of FileSystem::deleteOnExit? You seem to misuse it.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user hyunsik commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/161#discussion_r18317824

          — Diff: tajo-core/src/main/java/org/apache/tajo/master/querymaster/Query.java —
          @@ -432,19 +432,68 @@ public Path commitOutputData(Query query) {
          boolean movedToOldTable = false;
          boolean committed = false;
          Path oldTableDir = new Path(queryContext.getStagingDir(), TajoConstants.INSERT_OVERWIRTE_OLD_TABLE_NAME);

          • try {
          • if (fs.exists(finalOutputDir)) { - fs.rename(finalOutputDir, oldTableDir); - movedToOldTable = fs.exists(oldTableDir); - }

            else { // if the parent does not exist, make its parent directory.

          • fs.mkdirs(finalOutputDir.getParent());
            +
            + // INSERT OVERWRITE INTO always moves the result data into the original table location.
            + // As a result, all existing partitions have been removed. The query should not remove all partitions
            + // because existing partitions may be a data-set for a production cluster.
            + if (queryContext.hasPartition()) {
            + Map<Path, Path> renameDirs = TUtil.newHashMap();
            + Map<Path, Path> recoveryDirs = TUtil.newHashMap();
            +
            + try {
            + if (!fs.exists(finalOutputDir)) { + fs.mkdirs(finalOutputDir); + }

            +
            + visitPartitionedDirectory(fs, stagingResultDir, finalOutputDir, stagingResultDir.toString(),
            + renameDirs, oldTableDir);
            +
            + // Rename target partition directories
            + for(Map.Entry<Path, Path> entry : renameDirs.entrySet())

            Unknown macro: { + // Backup existing data files for recovering + if (fs.exists(entry.getValue())) { + String recoveryPathString = entry.getValue().toString().replaceAll(finalOutputDir.toString(), + oldTableDir.toString()); + Path recoveryPath = new Path(recoveryPathString); + fs.rename(entry.getValue(), recoveryPath); + fs.exists(recoveryPath); + recoveryDirs.put(entry.getValue(), recoveryPath); + } + // Delete existing directory + fs.deleteOnExit(entry.getValue()); + // Rename staging directory to final output directory + fs.rename(entry.getKey(), entry.getValue()); + }

            +
            + } catch (IOException ioe) {
            + // Remove created dirs
            + for(Map.Entry<Path, Path> entry : renameDirs.entrySet()) {
            + fs.deleteOnExit(entry.getValue());

              • End diff –

          Could you check the use of FileSystem::deleteOnExit? You seem to misuse it.

          Show
          githubbot ASF GitHub Bot added a comment - Github user hyunsik commented on a diff in the pull request: https://github.com/apache/tajo/pull/161#discussion_r18317824 — Diff: tajo-core/src/main/java/org/apache/tajo/master/querymaster/Query.java — @@ -432,19 +432,68 @@ public Path commitOutputData(Query query) { boolean movedToOldTable = false; boolean committed = false; Path oldTableDir = new Path(queryContext.getStagingDir(), TajoConstants.INSERT_OVERWIRTE_OLD_TABLE_NAME); try { if (fs.exists(finalOutputDir)) { - fs.rename(finalOutputDir, oldTableDir); - movedToOldTable = fs.exists(oldTableDir); - } else { // if the parent does not exist, make its parent directory. fs.mkdirs(finalOutputDir.getParent()); + + // INSERT OVERWRITE INTO always moves the result data into the original table location. + // As a result, all existing partitions have been removed. The query should not remove all partitions + // because existing partitions may be a data-set for a production cluster. + if (queryContext.hasPartition()) { + Map<Path, Path> renameDirs = TUtil.newHashMap(); + Map<Path, Path> recoveryDirs = TUtil.newHashMap(); + + try { + if (!fs.exists(finalOutputDir)) { + fs.mkdirs(finalOutputDir); + } + + visitPartitionedDirectory(fs, stagingResultDir, finalOutputDir, stagingResultDir.toString(), + renameDirs, oldTableDir); + + // Rename target partition directories + for(Map.Entry<Path, Path> entry : renameDirs.entrySet()) Unknown macro: { + // Backup existing data files for recovering + if (fs.exists(entry.getValue())) { + String recoveryPathString = entry.getValue().toString().replaceAll(finalOutputDir.toString(), + oldTableDir.toString()); + Path recoveryPath = new Path(recoveryPathString); + fs.rename(entry.getValue(), recoveryPath); + fs.exists(recoveryPath); + recoveryDirs.put(entry.getValue(), recoveryPath); + } + // Delete existing directory + fs.deleteOnExit(entry.getValue()); + // Rename staging directory to final output directory + fs.rename(entry.getKey(), entry.getValue()); + } + + } catch (IOException ioe) { + // Remove created dirs + for(Map.Entry<Path, Path> entry : renameDirs.entrySet()) { + fs.deleteOnExit(entry.getValue()); End diff – Could you check the use of FileSystem::deleteOnExit? You seem to misuse it.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user hyunsik commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/161#discussion_r18317839

          — Diff: tajo-core/src/main/java/org/apache/tajo/master/querymaster/Query.java —
          @@ -432,19 +432,68 @@ public Path commitOutputData(Query query) {
          boolean movedToOldTable = false;
          boolean committed = false;
          Path oldTableDir = new Path(queryContext.getStagingDir(), TajoConstants.INSERT_OVERWIRTE_OLD_TABLE_NAME);

          • try {
          • if (fs.exists(finalOutputDir)) { - fs.rename(finalOutputDir, oldTableDir); - movedToOldTable = fs.exists(oldTableDir); - }

            else { // if the parent does not exist, make its parent directory.

          • fs.mkdirs(finalOutputDir.getParent());
            +
            + // INSERT OVERWRITE INTO always moves the result data into the original table location.
            + // As a result, all existing partitions have been removed. The query should not remove all partitions
            + // because existing partitions may be a data-set for a production cluster.
            + if (queryContext.hasPartition()) {
            + Map<Path, Path> renameDirs = TUtil.newHashMap();
            + Map<Path, Path> recoveryDirs = TUtil.newHashMap();
            +
            + try {
            + if (!fs.exists(finalOutputDir)) { + fs.mkdirs(finalOutputDir); + }

            +
            + visitPartitionedDirectory(fs, stagingResultDir, finalOutputDir, stagingResultDir.toString(),
            + renameDirs, oldTableDir);
            +
            + // Rename target partition directories
            + for(Map.Entry<Path, Path> entry : renameDirs.entrySet())

            Unknown macro: { + // Backup existing data files for recovering + if (fs.exists(entry.getValue())) { + String recoveryPathString = entry.getValue().toString().replaceAll(finalOutputDir.toString(), + oldTableDir.toString()); + Path recoveryPath = new Path(recoveryPathString); + fs.rename(entry.getValue(), recoveryPath); + fs.exists(recoveryPath); + recoveryDirs.put(entry.getValue(), recoveryPath); + } + // Delete existing directory + fs.deleteOnExit(entry.getValue()); + // Rename staging directory to final output directory + fs.rename(entry.getKey(), entry.getValue()); + }

            +
            + } catch (IOException ioe) {
            + // Remove created dirs
            + for(Map.Entry<Path, Path> entry : renameDirs.entrySet())

            { + fs.deleteOnExit(entry.getValue()); + }

            +
            + // Recovery renamed dirs
            + for(Map.Entry<Path, Path> entry : recoveryDirs.entrySet()) {
            + fs.deleteOnExit(entry.getValue());

              • End diff –

          Could you check the use of FileSystem::deleteOnExit? You seem to misuse it.

          Show
          githubbot ASF GitHub Bot added a comment - Github user hyunsik commented on a diff in the pull request: https://github.com/apache/tajo/pull/161#discussion_r18317839 — Diff: tajo-core/src/main/java/org/apache/tajo/master/querymaster/Query.java — @@ -432,19 +432,68 @@ public Path commitOutputData(Query query) { boolean movedToOldTable = false; boolean committed = false; Path oldTableDir = new Path(queryContext.getStagingDir(), TajoConstants.INSERT_OVERWIRTE_OLD_TABLE_NAME); try { if (fs.exists(finalOutputDir)) { - fs.rename(finalOutputDir, oldTableDir); - movedToOldTable = fs.exists(oldTableDir); - } else { // if the parent does not exist, make its parent directory. fs.mkdirs(finalOutputDir.getParent()); + + // INSERT OVERWRITE INTO always moves the result data into the original table location. + // As a result, all existing partitions have been removed. The query should not remove all partitions + // because existing partitions may be a data-set for a production cluster. + if (queryContext.hasPartition()) { + Map<Path, Path> renameDirs = TUtil.newHashMap(); + Map<Path, Path> recoveryDirs = TUtil.newHashMap(); + + try { + if (!fs.exists(finalOutputDir)) { + fs.mkdirs(finalOutputDir); + } + + visitPartitionedDirectory(fs, stagingResultDir, finalOutputDir, stagingResultDir.toString(), + renameDirs, oldTableDir); + + // Rename target partition directories + for(Map.Entry<Path, Path> entry : renameDirs.entrySet()) Unknown macro: { + // Backup existing data files for recovering + if (fs.exists(entry.getValue())) { + String recoveryPathString = entry.getValue().toString().replaceAll(finalOutputDir.toString(), + oldTableDir.toString()); + Path recoveryPath = new Path(recoveryPathString); + fs.rename(entry.getValue(), recoveryPath); + fs.exists(recoveryPath); + recoveryDirs.put(entry.getValue(), recoveryPath); + } + // Delete existing directory + fs.deleteOnExit(entry.getValue()); + // Rename staging directory to final output directory + fs.rename(entry.getKey(), entry.getValue()); + } + + } catch (IOException ioe) { + // Remove created dirs + for(Map.Entry<Path, Path> entry : renameDirs.entrySet()) { + fs.deleteOnExit(entry.getValue()); + } + + // Recovery renamed dirs + for(Map.Entry<Path, Path> entry : recoveryDirs.entrySet()) { + fs.deleteOnExit(entry.getValue()); End diff – Could you check the use of FileSystem::deleteOnExit? You seem to misuse it.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user hyunsik commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/161#discussion_r18317858

          — Diff: tajo-core/src/main/java/org/apache/tajo/master/querymaster/Query.java —
          @@ -494,6 +543,65 @@ public Path commitOutputData(Query query)

          { return finalOutputDir; }

          + /**
          + * This method rename staging directory to final output directory recursively.
          — End diff –

          s/rename/renames/

          Show
          githubbot ASF GitHub Bot added a comment - Github user hyunsik commented on a diff in the pull request: https://github.com/apache/tajo/pull/161#discussion_r18317858 — Diff: tajo-core/src/main/java/org/apache/tajo/master/querymaster/Query.java — @@ -494,6 +543,65 @@ public Path commitOutputData(Query query) { return finalOutputDir; } + /** + * This method rename staging directory to final output directory recursively. — End diff – s/rename/renames/
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user hyunsik commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/161#discussion_r18318014

          — Diff: tajo-core/src/main/java/org/apache/tajo/master/querymaster/Query.java —
          @@ -432,19 +432,68 @@ public Path commitOutputData(Query query) {
          boolean movedToOldTable = false;
          boolean committed = false;
          Path oldTableDir = new Path(queryContext.getStagingDir(), TajoConstants.INSERT_OVERWIRTE_OLD_TABLE_NAME);

          • try {
          • if (fs.exists(finalOutputDir)) { - fs.rename(finalOutputDir, oldTableDir); - movedToOldTable = fs.exists(oldTableDir); - }

            else { // if the parent does not exist, make its parent directory.

          • fs.mkdirs(finalOutputDir.getParent());
            +
            + // INSERT OVERWRITE INTO always moves the result data into the original table location.
            + // As a result, all existing partitions have been removed. The query should not remove all partitions
              • End diff –

          I think that you don't need to the story about the old implementation. It would be better if you explain what the below code does.

          Show
          githubbot ASF GitHub Bot added a comment - Github user hyunsik commented on a diff in the pull request: https://github.com/apache/tajo/pull/161#discussion_r18318014 — Diff: tajo-core/src/main/java/org/apache/tajo/master/querymaster/Query.java — @@ -432,19 +432,68 @@ public Path commitOutputData(Query query) { boolean movedToOldTable = false; boolean committed = false; Path oldTableDir = new Path(queryContext.getStagingDir(), TajoConstants.INSERT_OVERWIRTE_OLD_TABLE_NAME); try { if (fs.exists(finalOutputDir)) { - fs.rename(finalOutputDir, oldTableDir); - movedToOldTable = fs.exists(oldTableDir); - } else { // if the parent does not exist, make its parent directory. fs.mkdirs(finalOutputDir.getParent()); + + // INSERT OVERWRITE INTO always moves the result data into the original table location. + // As a result, all existing partitions have been removed. The query should not remove all partitions End diff – I think that you don't need to the story about the old implementation. It would be better if you explain what the below code does.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user blrunner commented on the pull request:

          https://github.com/apache/tajo/pull/161#issuecomment-57599751

          Hi @hyunsik

          Thank you for your detailed review.
          I agree with your opinions and I've just updated it.
          Could you check it again?

          Show
          githubbot ASF GitHub Bot added a comment - Github user blrunner commented on the pull request: https://github.com/apache/tajo/pull/161#issuecomment-57599751 Hi @hyunsik Thank you for your detailed review. I agree with your opinions and I've just updated it. Could you check it again?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user hyunsik commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/161#discussion_r18433083

          — Diff: tajo-core/src/main/java/org/apache/tajo/master/querymaster/Query.java —
          @@ -486,6 +532,65 @@ public Path commitOutputData(Query query)

          { return finalOutputDir; }

          + /**
          + * This method sets a a rename map which includes renamed staging directory to final output directory recursively.
          — End diff –

          'a' is duplicated.

          Show
          githubbot ASF GitHub Bot added a comment - Github user hyunsik commented on a diff in the pull request: https://github.com/apache/tajo/pull/161#discussion_r18433083 — Diff: tajo-core/src/main/java/org/apache/tajo/master/querymaster/Query.java — @@ -486,6 +532,65 @@ public Path commitOutputData(Query query) { return finalOutputDir; } + /** + * This method sets a a rename map which includes renamed staging directory to final output directory recursively. — End diff – 'a' is duplicated.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user hyunsik commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/161#discussion_r18433100

          — Diff: tajo-core/src/main/java/org/apache/tajo/master/querymaster/Query.java —
          @@ -424,19 +424,65 @@ public Path commitOutputData(Query query) {
          boolean movedToOldTable = false;
          boolean committed = false;
          Path oldTableDir = new Path(queryContext.getStagingDir(), TajoConstants.INSERT_OVERWIRTE_OLD_TABLE_NAME);

          • try {
          • if (fs.exists(finalOutputDir)) { - fs.rename(finalOutputDir, oldTableDir); - movedToOldTable = fs.exists(oldTableDir); - }

            else { // if the parent does not exist, make its parent directory.

          • fs.mkdirs(finalOutputDir.getParent());
            +
            + if (queryContext.hasPartition()) {
            + Map<Path, Path> renameDirs = TUtil.newHashMap();
              • End diff –

          Could you add the description about key and value of Map?

          Show
          githubbot ASF GitHub Bot added a comment - Github user hyunsik commented on a diff in the pull request: https://github.com/apache/tajo/pull/161#discussion_r18433100 — Diff: tajo-core/src/main/java/org/apache/tajo/master/querymaster/Query.java — @@ -424,19 +424,65 @@ public Path commitOutputData(Query query) { boolean movedToOldTable = false; boolean committed = false; Path oldTableDir = new Path(queryContext.getStagingDir(), TajoConstants.INSERT_OVERWIRTE_OLD_TABLE_NAME); try { if (fs.exists(finalOutputDir)) { - fs.rename(finalOutputDir, oldTableDir); - movedToOldTable = fs.exists(oldTableDir); - } else { // if the parent does not exist, make its parent directory. fs.mkdirs(finalOutputDir.getParent()); + + if (queryContext.hasPartition()) { + Map<Path, Path> renameDirs = TUtil.newHashMap(); End diff – Could you add the description about key and value of Map?
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user hyunsik commented on a diff in the pull request:

          https://github.com/apache/tajo/pull/161#discussion_r18433154

          — Diff: tajo-core/src/main/java/org/apache/tajo/master/querymaster/Query.java —
          @@ -486,6 +532,65 @@ public Path commitOutputData(Query query)

          { return finalOutputDir; }

          + /**
          + * This method sets a a rename map which includes renamed staging directory to final output directory recursively.
          + * If there exists some data files, this delete it for duplicate data.
          + *
          + *
          + * @param fs
          + * @param stagingPath
          + * @param outputPath
          + * @param stagingParentPathString
          + * @throws IOException
          + */
          + private void visitPartitionedDirectory(FileSystem fs, Path stagingPath, Path outputPath,
          + String stagingParentPathString,
          + Map<Path, Path> renameDirs, Path oldTableDir) throws IOException {
          + FileStatus[] files = fs.listStatus(stagingPath);
          +
          + for(FileStatus eachFile : files) {
          + if (eachFile.isDirectory()) {
          + Path oldPath = eachFile.getPath();
          +
          + // Make recover directory.
          + String recoverPathString = oldPath.toString().replaceAll(stagingParentPathString,
          + oldTableDir.toString());
          + Path recoveryPath = new Path(recoverPathString);
          + if (!fs.exists(recoveryPath))

          { + fs.mkdirs(recoveryPath); + }

          +
          + visitPartitionedDirectory(fs, eachFile.getPath(), outputPath, stagingParentPathString,
          + renameDirs, oldTableDir);
          + // Find last order partition for renaming
          + String newPathString = oldPath.toString().replaceAll(stagingParentPathString,
          + outputPath.toString());
          + Path newPath = new Path(newPathString);
          + if (!hasDirectory(fs, eachFile.getPath())) {
          — End diff –

          In my view, ```isLeafDirectory()``` would be more intuitive name than ```!hasDirectory()```.

          Show
          githubbot ASF GitHub Bot added a comment - Github user hyunsik commented on a diff in the pull request: https://github.com/apache/tajo/pull/161#discussion_r18433154 — Diff: tajo-core/src/main/java/org/apache/tajo/master/querymaster/Query.java — @@ -486,6 +532,65 @@ public Path commitOutputData(Query query) { return finalOutputDir; } + /** + * This method sets a a rename map which includes renamed staging directory to final output directory recursively. + * If there exists some data files, this delete it for duplicate data. + * + * + * @param fs + * @param stagingPath + * @param outputPath + * @param stagingParentPathString + * @throws IOException + */ + private void visitPartitionedDirectory(FileSystem fs, Path stagingPath, Path outputPath, + String stagingParentPathString, + Map<Path, Path> renameDirs, Path oldTableDir) throws IOException { + FileStatus[] files = fs.listStatus(stagingPath); + + for(FileStatus eachFile : files) { + if (eachFile.isDirectory()) { + Path oldPath = eachFile.getPath(); + + // Make recover directory. + String recoverPathString = oldPath.toString().replaceAll(stagingParentPathString, + oldTableDir.toString()); + Path recoveryPath = new Path(recoverPathString); + if (!fs.exists(recoveryPath)) { + fs.mkdirs(recoveryPath); + } + + visitPartitionedDirectory(fs, eachFile.getPath(), outputPath, stagingParentPathString, + renameDirs, oldTableDir); + // Find last order partition for renaming + String newPathString = oldPath.toString().replaceAll(stagingParentPathString, + outputPath.toString()); + Path newPath = new Path(newPathString); + if (!hasDirectory(fs, eachFile.getPath())) { — End diff – In my view, ```isLeafDirectory()``` would be more intuitive name than ```!hasDirectory()```.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user hyunsik commented on the pull request:

          https://github.com/apache/tajo/pull/161#issuecomment-57925837

          Thank you for your work. The patch looks good to me. After I hear your through about my final comments, I'll finish this review.

          • Hyunsik
          Show
          githubbot ASF GitHub Bot added a comment - Github user hyunsik commented on the pull request: https://github.com/apache/tajo/pull/161#issuecomment-57925837 Thank you for your work. The patch looks good to me. After I hear your through about my final comments, I'll finish this review. Hyunsik
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user blrunner commented on the pull request:

          https://github.com/apache/tajo/pull/161#issuecomment-57935420

          Hi @hyunsik

          Thank you for your detailed review. I've just updated the patch.

          Show
          githubbot ASF GitHub Bot added a comment - Github user blrunner commented on the pull request: https://github.com/apache/tajo/pull/161#issuecomment-57935420 Hi @hyunsik Thank you for your detailed review. I've just updated the patch.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user hyunsik commented on the pull request:

          https://github.com/apache/tajo/pull/161#issuecomment-57945491

          +1 Ship it.

          Show
          githubbot ASF GitHub Bot added a comment - Github user hyunsik commented on the pull request: https://github.com/apache/tajo/pull/161#issuecomment-57945491 +1 Ship it.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user blrunner commented on the pull request:

          https://github.com/apache/tajo/pull/161#issuecomment-57975615

          Hi @hyunsik

          Thank you for your detailed review.
          I've just committed the patch to the master branch.

          Show
          githubbot ASF GitHub Bot added a comment - Github user blrunner commented on the pull request: https://github.com/apache/tajo/pull/161#issuecomment-57975615 Hi @hyunsik Thank you for your detailed review. I've just committed the patch to the master branch.
          Hide
          githubbot ASF GitHub Bot added a comment -

          Github user blrunner closed the pull request at:

          https://github.com/apache/tajo/pull/161

          Show
          githubbot ASF GitHub Bot added a comment - Github user blrunner closed the pull request at: https://github.com/apache/tajo/pull/161
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Tajo-master-build #391 (See https://builds.apache.org/job/Tajo-master-build/391/)
          TAJO-1067: INSERT OVERWRITE INTO should not remove all partitions. (jaehwa) (blrunner: rev ca5fb301bff4b38d80a523d5bece9eaf74f64ec3)

          • tajo-core/src/test/java/org/apache/tajo/engine/query/TestTablePartitions.java
          • tajo-core/src/main/java/org/apache/tajo/master/querymaster/Query.java
          • CHANGES
          • tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Tajo-master-build #391 (See https://builds.apache.org/job/Tajo-master-build/391/ ) TAJO-1067 : INSERT OVERWRITE INTO should not remove all partitions. (jaehwa) (blrunner: rev ca5fb301bff4b38d80a523d5bece9eaf74f64ec3) tajo-core/src/test/java/org/apache/tajo/engine/query/TestTablePartitions.java tajo-core/src/main/java/org/apache/tajo/master/querymaster/Query.java CHANGES tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Tajo-master-CODEGEN-build #33 (See https://builds.apache.org/job/Tajo-master-CODEGEN-build/33/)
          TAJO-1067: INSERT OVERWRITE INTO should not remove all partitions. (jaehwa) (blrunner: rev ca5fb301bff4b38d80a523d5bece9eaf74f64ec3)

          • tajo-core/src/test/java/org/apache/tajo/engine/query/TestTablePartitions.java
          • CHANGES
          • tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java
          • tajo-core/src/main/java/org/apache/tajo/master/querymaster/Query.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Tajo-master-CODEGEN-build #33 (See https://builds.apache.org/job/Tajo-master-CODEGEN-build/33/ ) TAJO-1067 : INSERT OVERWRITE INTO should not remove all partitions. (jaehwa) (blrunner: rev ca5fb301bff4b38d80a523d5bece9eaf74f64ec3) tajo-core/src/test/java/org/apache/tajo/engine/query/TestTablePartitions.java CHANGES tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java tajo-core/src/main/java/org/apache/tajo/master/querymaster/Query.java
          Hide
          hudson Hudson added a comment -

          SUCCESS: Integrated in Tajo-block_iteration-branch-build #15 (See https://builds.apache.org/job/Tajo-block_iteration-branch-build/15/)
          TAJO-1067: INSERT OVERWRITE INTO should not remove all partitions. (jaehwa) (blrunner: rev ca5fb301bff4b38d80a523d5bece9eaf74f64ec3)

          • CHANGES
          • tajo-core/src/test/java/org/apache/tajo/engine/query/TestTablePartitions.java
          • tajo-core/src/main/java/org/apache/tajo/master/querymaster/Query.java
          • tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java
          Show
          hudson Hudson added a comment - SUCCESS: Integrated in Tajo-block_iteration-branch-build #15 (See https://builds.apache.org/job/Tajo-block_iteration-branch-build/15/ ) TAJO-1067 : INSERT OVERWRITE INTO should not remove all partitions. (jaehwa) (blrunner: rev ca5fb301bff4b38d80a523d5bece9eaf74f64ec3) CHANGES tajo-core/src/test/java/org/apache/tajo/engine/query/TestTablePartitions.java tajo-core/src/main/java/org/apache/tajo/master/querymaster/Query.java tajo-common/src/main/java/org/apache/tajo/conf/TajoConf.java

            People

            • Assignee:
              blrunner Jaehwa Jung
              Reporter:
              blrunner Jaehwa Jung
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development