  Sqoop / SQOOP-1273

Multiple append jobs can easily end up sharing directories


    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.4
    • Fix Version/s: 1.4.5
    • Component/s: None
    • Labels: None


      I've noticed at multiple user deployments that two separate Sqoop jobs running in append mode (--append) can end up using the same temporary directory. This is a serious problem: the jobs then interfere with each other and can even cause data loss. We currently use the following code to generate the temporary directory (AppendUtils.java):

        public static Path getTempAppendDir(String tableName) {
          // Current time, formatted by DATE_FORM
          String timeId = DATE_FORM.format(new Date(System.currentTimeMillis()));
          // tableName is null for --query based imports
          String tempDir = TEMP_IMPORT_ROOT + Path.SEPARATOR + timeId + tableName;
          return new Path(tempDir);
        }

      The temporary directory name is currently built from three parts:

      • TEMP_IMPORT_ROOT - A constant. Users can change it if needed, but since this is not documented, most users run with the default value.
      • timeId - The current time with millisecond precision.
      • tableName - The name of the transferred table, or null for a query-based (--query) import.
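
      To make the collision concrete, here is a minimal standalone sketch of the same naming scheme. It avoids the Hadoop Path class and takes the clock value as a parameter so the result is reproducible. The class name, the "_sqoop" root, the "/" separator, and the "ddHHmmssSSS" date pattern are assumptions chosen for illustration, not values confirmed by this issue.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class TempDirCollisionDemo {
    // Assumed placeholder values; the real constants live in AppendUtils.java.
    static final String TEMP_IMPORT_ROOT = "_sqoop";
    static final String SEPARATOR = "/";

    // Same three-part scheme as in the issue, with the clock passed in
    // so that two "jobs" can be given an identical start time.
    static String tempAppendDir(long nowMillis, String tableName) {
        // Assumed pattern with millisecond precision (SSS).
        SimpleDateFormat dateForm = new SimpleDateFormat("ddHHmmssSSS");
        String timeId = dateForm.format(new Date(nowMillis));
        // A null tableName is concatenated as the literal text "null".
        return TEMP_IMPORT_ROOT + SEPARATOR + timeId + tableName;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // Two --query imports (tableName == null) started in the same millisecond.
        String jobA = tempAppendDir(now, null);
        String jobB = tempAppendDir(now, null);
        System.out.println(jobA);
        System.out.println(jobA.equals(jobB)); // prints true
    }
}
```

      With a null table name and the default root, the timestamp is the only part that can differ between jobs, so any two jobs that read the same clock value get the same path.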

      The problem mainly surfaces in --query based imports, where two of the three parts are constant (TEMP_IMPORT_ROOT and the null tableName). Two Sqoop jobs that start at the same time then generate the same temporary directory.
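
      One possible way to avoid the collision is to replace the timestamp with a random UUID, so that the path no longer depends on when the job starts. The sketch below is only an illustration of that idea and is not necessarily what the attached patch does. The class name, method name, "_sqoop" root, and "/" separator are hypothetical.

```java
import java.util.UUID;

public class UniqueTempDirSketch {
    // Assumed placeholder values, not taken from the Sqoop source.
    static final String TEMP_IMPORT_ROOT = "_sqoop";
    static final String SEPARATOR = "/";

    // Hypothetical replacement for the timestamp-based name: a random UUID
    // makes the directory unique even when jobs start simultaneously.
    static String uniqueTempAppendDir(String tableName) {
        String uuid = UUID.randomUUID().toString().replace("-", "");
        // Skip the suffix for --query imports instead of appending "null".
        String suffix = (tableName == null) ? "" : "_" + tableName;
        return TEMP_IMPORT_ROOT + SEPARATOR + uuid + suffix;
    }

    public static void main(String[] args) {
        // Two query-based jobs now get different directories.
        System.out.println(uniqueTempAppendDir(null));
        System.out.println(uniqueTempAppendDir(null));
    }
}
```

      A random component trades human-readable directory names for uniqueness. Keeping the table name as a suffix preserves some readability for table-based imports.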


        1. SQOOP-1273.patch
          5 kB
          Jarek Jarcec Cecho

          Issue Links



              • Assignee:
                jarcec Jarek Jarcec Cecho
              • Votes:
                0
              • Watchers:
                4


                • Created: