Sqoop / SQOOP-1273

Multiple append jobs can easily end up sharing directories

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.4
    • Fix Version/s: 1.4.5
    • Component/s: None
    • Labels: None

      Description

      I've noticed at multiple user deployments that when Sqoop runs in append mode (--append), two separate jobs can end up using the same temporary directory. This is a disaster: those jobs then interfere with each other and can even cause data loss. Currently we use the following code to generate the temporary directory (AppendUtils.java):

        public static Path getTempAppendDir(String tableName) {
          String timeId = DATE_FORM.format(new Date(System.currentTimeMillis()));
          String tempDir = TEMP_IMPORT_ROOT + Path.SEPARATOR + timeId + tableName;
          return new Path(tempDir);
        }
      

      The temporary directory name is currently built from three parts:

      • TEMP_IMPORT_ROOT - A constant. Users can change it if needed, but because this is not documented, most users keep the default value.
      • timeId - The current time with millisecond precision.
      • tableName - The name of the transferred table, or null for a query-based (--query) import.

      The problem surfaces mainly in --query based imports, where two of the three parts are constant, so two Sqoop jobs started at the same time (within the same millisecond) get identical directory names.
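      To make the failure mode concrete, here is a simplified, hypothetical reproduction of the naming scheme quoted above. It replaces Hadoop's Path with plain strings and takes the timestamp as a parameter so two simultaneous job starts can be simulated. The root /tmp/sqoop-append and the yyyyMMddHHmmssSSS pattern are assumptions; the issue does not give the actual values of TEMP_IMPORT_ROOT or DATE_FORM.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

/**
 * Hypothetical, simplified reproduction of the quoted getTempAppendDir logic.
 * Plain strings stand in for Hadoop's Path; root and pattern are assumptions.
 */
public class TempDirCollision {
  // Assumed root; the real TEMP_IMPORT_ROOT value is not stated in the issue.
  static final String TEMP_IMPORT_ROOT = "/tmp/sqoop-append";

  // Assumed millisecond-precision pattern, matching the description of timeId.
  static final String DATE_PATTERN = "yyyyMMddHHmmssSSS";

  // Same structure as the quoted method: root + separator + timeId + tableName.
  // The timestamp is a parameter so identical start times can be simulated.
  static String tempAppendDir(long millis, String tableName) {
    // SimpleDateFormat is not thread-safe, so a fresh instance is used per call.
    String timeId = new SimpleDateFormat(DATE_PATTERN).format(new Date(millis));
    // String concatenation turns a null tableName into the text "null".
    return TEMP_IMPORT_ROOT + "/" + timeId + tableName;
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis();
    // Two --query imports carry no table name, so tableName is null for both.
    String jobA = tempAppendDir(now, null);
    String jobB = tempAppendDir(now, null);
    System.out.println(jobA);
    System.out.println(jobB);
    System.out.println("collision: " + jobA.equals(jobB));
  }
}
```

      Because the null table name is stringified to the same literal every time, any two --query imports that format to the same timeId map to the same directory; only the table name separates table-based imports started in the same millisecond.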

      Attachments: SQOOP-1273.patch (5 kB, Jarek Jarcec Cecho)


          Activity

          Venkat Ranganathan added a comment - edited

          I thought I had already reported this and submitted a patch; maybe I missed it. Sorry about that.
          My fix was to add the current process ID.
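          A minimal sketch of the process-ID idea, not the code from SQOOP-1273.patch. It assumes a plain-string stand-in for Hadoop's Path, a hypothetical root /tmp/sqoop-append, a yyyyMMddHHmmssSSS date pattern and "_" separators; ProcessHandle.current().pid() requires Java 9 or later (older JVMs commonly parse RuntimeMXBean.getName(), whose "pid@host" form is a HotSpot convention rather than a guarantee).

```java
import java.text.SimpleDateFormat;
import java.util.Date;

/**
 * Hypothetical sketch: mix the current process ID into the temporary
 * directory name. Root, pattern and separators are assumptions.
 */
public class PidTempDir {
  static final String TEMP_IMPORT_ROOT = "/tmp/sqoop-append";
  static final String DATE_PATTERN = "yyyyMMddHHmmssSSS";

  // Builds root/timeId_pid_tableName; a null tableName still becomes "null".
  static String tempAppendDir(long millis, long pid, String tableName) {
    String timeId = new SimpleDateFormat(DATE_PATTERN).format(new Date(millis));
    return TEMP_IMPORT_ROOT + "/" + timeId + "_" + pid + "_" + tableName;
  }

  // Convenience overload using the real clock and this JVM's PID (Java 9+).
  static String tempAppendDir(String tableName) {
    return tempAppendDir(System.currentTimeMillis(),
        ProcessHandle.current().pid(), tableName);
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis();
    // Two client JVMs on one host have different PIDs, so the names differ
    // even when both --query jobs start in the same millisecond.
    String jobA = tempAppendDir(now, 4242L, null);
    String jobB = tempAppendDir(now, 4243L, null);
    System.out.println(jobA);
    System.out.println(jobB);
    System.out.println("collision: " + jobA.equals(jobB));
    System.out.println(tempAppendDir("employees"));
  }
}
```

          A PID is unique only among processes running on one host at a given moment, so this mainly separates jobs launched from the same machine; jobs launched from different hosts in the same millisecond could in principle still share a PID and therefore a name.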

          Jarek Jarcec Cecho added a comment -

          Thank you for your input, Venkat Ranganathan; greatly appreciated. I've incorporated your idea into the patch.

          ASF subversion and git services added a comment -

          Commit ad12695b59e7f0af09e27da8dca08e9a2be9b6a2 in branch refs/heads/trunk from Venkat Ranganathan
          [ https://git-wip-us.apache.org/repos/asf?p=sqoop.git;h=ad12695 ]

          SQOOP-1273: Multiple append jobs can easily end up sharing directories
          (Jarek Jarcec Cecho via Venkat Ranganathan)

          Venkat Ranganathan added a comment -

          Thanks for your contribution, Jarek Jarcec Cecho.

          Hudson added a comment -

          SUCCESS: Integrated in Sqoop-ant-jdk-1.6-hadoop100 #838 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop100/838/)
          SQOOP-1273: Multiple append jobs can easily end up sharing directories (venkat: https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=ad12695b59e7f0af09e27da8dca08e9a2be9b6a2)

          • src/java/org/apache/sqoop/tool/ImportTool.java
          • src/test/com/cloudera/sqoop/TestAppendUtils.java
          • src/java/org/apache/sqoop/util/AppendUtils.java
          Hudson added a comment -

          SUCCESS: Integrated in Sqoop-ant-jdk-1.6-hadoop200 #879 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop200/879/)
          SQOOP-1273: Multiple append jobs can easily end up sharing directories (venkat: https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=ad12695b59e7f0af09e27da8dca08e9a2be9b6a2)

          • src/java/org/apache/sqoop/util/AppendUtils.java
          • src/java/org/apache/sqoop/tool/ImportTool.java
          • src/test/com/cloudera/sqoop/TestAppendUtils.java
          Hudson added a comment -

          SUCCESS: Integrated in Sqoop-ant-jdk-1.6-hadoop20 #873 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop20/873/)
          SQOOP-1273: Multiple append jobs can easily end up sharing directories (venkat: https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=ad12695b59e7f0af09e27da8dca08e9a2be9b6a2)

          • src/test/com/cloudera/sqoop/TestAppendUtils.java
          • src/java/org/apache/sqoop/tool/ImportTool.java
          • src/java/org/apache/sqoop/util/AppendUtils.java
          Hudson added a comment -

          SUCCESS: Integrated in Sqoop-ant-jdk-1.6-hadoop23 #1075 (See https://builds.apache.org/job/Sqoop-ant-jdk-1.6-hadoop23/1075/)
          SQOOP-1273: Multiple append jobs can easily end up sharing directories (venkat: https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=ad12695b59e7f0af09e27da8dca08e9a2be9b6a2)

          • src/java/org/apache/sqoop/tool/ImportTool.java
          • src/java/org/apache/sqoop/util/AppendUtils.java
          • src/test/com/cloudera/sqoop/TestAppendUtils.java

            People

            • Assignee: Jarek Jarcec Cecho
            • Reporter: Jarek Jarcec Cecho
            • Votes: 0
            • Watchers: 4
