Sqoop (Retired) / SQOOP-1273

Multiple append jobs can easily end up sharing directories

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.4
    • Fix Version/s: 1.4.5
    • Component/s: None
    • Labels: None

    Description

      I've noticed at multiple user deployments that two separate Sqoop jobs running in append mode (--append) can end up using the same temporary directory. This is a serious problem: the jobs then interfere with each other and can even cause data loss. We currently use the following code to generate the temporary directory (AppendUtils.java):

        public static Path getTempAppendDir(String tableName) {
          String timeId = DATE_FORM.format(new Date(System.currentTimeMillis()));
          String tempDir = TEMP_IMPORT_ROOT + Path.SEPARATOR + timeId + tableName;
          return new Path(tempDir);
        }
      

      The temporary directory name is currently built from three parts:

      • TEMP_IMPORT_ROOT: A constant. Users can change it if needed, but because this is not documented, most users run with the default value.
      • timeId: The current time with millisecond precision.
      • tableName: The name of the transferred table, or null for a query-based (--query) import.
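
      To make the three parts concrete, here is a minimal standalone sketch of the same composition. It is not the Sqoop source: the root value "_sqoop", the date pattern "ddHHmmssSSS", and the class and method names are assumptions made for illustration, and the clock is passed in as a parameter so the result is reproducible.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

// Standalone re-creation of the naming scheme; no Hadoop classes required.
class TempDirNaming {
  // Assumed value, for illustration only.
  static final String TEMP_IMPORT_ROOT = "_sqoop";
  // Stands in for Hadoop's Path.SEPARATOR, which is "/".
  static final String SEPARATOR = "/";

  // Same string composition as getTempAppendDir, with the clock injected.
  static String tempAppendDir(String tableName, long nowMillis) {
    // SimpleDateFormat is not thread-safe, so a fresh instance is used per call.
    SimpleDateFormat dateForm = new SimpleDateFormat("ddHHmmssSSS"); // assumed pattern
    String timeId = dateForm.format(new Date(nowMillis));
    // A null tableName is concatenated as the literal text "null".
    return TEMP_IMPORT_ROOT + SEPARATOR + timeId + tableName;
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis();
    // Table-based import: the table name makes the path more specific.
    System.out.println(tempAppendDir("orders", now));
    // Two query-based imports (tableName == null) in the same millisecond.
    String jobA = tempAppendDir(null, now);
    String jobB = tempAppendDir(null, now);
    System.out.println(jobA);
    System.out.println("collision: " + jobA.equals(jobB)); // prints "collision: true"
  }
}
```

      With a null table name, the only part that varies between two jobs is the timestamp, so identical timestamps give identical paths.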

      The problem mainly surfaces in --query based imports, where two of the three parts are constant. The directory name then depends only on the timestamp, so two Sqoop jobs started at the same time can end up with the same directory.
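
      One possible way to avoid the collision is to replace the timestamp with a random identifier. The sketch below is only an illustration of that idea and is not necessarily what the attached patch does; the class name, the method name, the "_sqoop" root, and the "query" placeholder for a null table name are all hypothetical.

```java
import java.util.UUID;

// Sketch of a collision-resistant temporary directory name.
class UniqueTempDir {
  // Assumed value, for illustration only.
  static final String TEMP_IMPORT_ROOT = "_sqoop";
  // Stands in for Hadoop's Path.SEPARATOR, which is "/".
  static final String SEPARATOR = "/";

  static String uniqueTempAppendDir(String tableName) {
    // A random UUID does not depend on the start time of the job.
    String uuid = UUID.randomUUID().toString().replace("-", "");
    // Hypothetical placeholder so query imports do not end in "null".
    String suffix = (tableName == null) ? "query" : tableName;
    return TEMP_IMPORT_ROOT + SEPARATOR + uuid + "_" + suffix;
  }

  public static void main(String[] args) {
    // Two query-based imports requested back to back get different paths.
    String jobA = uniqueTempAppendDir(null);
    String jobB = uniqueTempAppendDir(null);
    System.out.println(jobA);
    System.out.println(jobB);
    System.out.println("collision: " + jobA.equals(jobB));
  }
}
```

      In Sqoop itself the resulting string would presumably still be wrapped in a Path, as in the existing getTempAppendDir.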

      Attachments

        1. SQOOP-1273.patch (5 kB, Jarek Jarcec Cecho)

            People

              Assignee: Jarek Jarcec Cecho (jarcec)
              Reporter: Jarek Jarcec Cecho (jarcec)