  1. Sqoop (Retired)
  2. SQOOP-1086

Running multiple incremental sqoop jobs in parallel resets the first sqoop job's --last-value

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Won't Fix
    • Affects Version/s: 1.4.0-incubating
    • None
    • None
    • Environment: Ubuntu 12.04.2

    Description

      I've created two jobs (with different names) that pull from the same database (MSSQL) but from two different tables.
      Both use incremental append.

      If I run the jobs in sequence, there is no issue: the metastore remembers the --last-value for each job.

      If I run the jobs in parallel, the metastore is updated with the correct --last-value when the first job finishes, but once the second job finishes, the first job's stored --last-value is reset.

      First Job

      # create the import job into the incremental table
        $ENV_SQOOP_HOME/bin/sqoop job -D mapred.job.name="Job 1" --create "import-events" -- import --connect "$ENV_TRACKING_CONNECTION" --table "$TABLE1" --split-by "dtmDBDateTime" --target-dir "$OUTPUT1" --incremental append --check-column "dtmDBDateTime" --last-value "2012-01-01 00:00:00.000" --fields-terminated-by '\t' --null-string '' --null-non-string '';

      Second Job

      # create the import job into the table
        $ENV_SQOOP_HOME/bin/sqoop job -D mapred.job.name="Job 2" --create "import-impressions" -- import --connect "$ENV_TRACKING_CONNECTION" --table "$TABLE2" --split-by "dtmDBDateTime" --target-dir "$OUTPUT2" --incremental append --check-column "dtmDBDateTime" --last-value "2012-01-01 00:00:00.000" --fields-terminated-by '\t' --null-string '' --null-non-string '';
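
The report does not show how the two saved jobs were launched. A minimal sketch of the two scenarios, assuming the jobs are run with `sqoop job --exec` and inspected with `sqoop job --show`; the function names and the `SQOOP` variable are hypothetical and only serve the illustration:

```shell
# Path to the sqoop launcher. ENV_SQOOP_HOME is taken from the report;
# the SQOOP override variable is an assumption made for this sketch.
SQOOP="${SQOOP:-$ENV_SQOOP_HOME/bin/sqoop}"

# Run both saved jobs one after the other
# (reported to keep each job's --last-value intact).
run_sequential() {
  "$SQOOP" job --exec "import-events" &&
  "$SQOOP" job --exec "import-impressions"
}

# Run both saved jobs at the same time
# (reported to reset the first job's --last-value).
run_parallel() {
  "$SQOOP" job --exec "import-events" &
  "$SQOOP" job --exec "import-impressions" &
  wait  # block until both background jobs have finished
}

# Print the stored definition of each job so the saved state
# can be compared before and after a run.
show_saved_state() {
  "$SQOOP" job --show "import-events"
  "$SQOOP" job --show "import-impressions"
}
```

Calling `show_saved_state` before and after `run_parallel` should make the reported reset visible, provided the stored incremental value appears in the `--show` output.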

          People

            BoglarkaEgyed Boglarka Egyed
            byron Byron Foster
            Votes: 0
            Watchers: 3