- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 1.4.4
- Fix Version/s: 1.4.5
- Component/s: None
- Labels: None
I've noticed at multiple user deployments that when Sqoop runs in append mode (--append), two separate jobs can end up using the same temporary directory. This is a disaster, as those jobs will then interfere with each other and may even cause data loss. We currently use the following code to generate the temporary directory (AppendUtils.java):
    public static Path getTempAppendDir(String tableName) {
      String timeId = DATE_FORM.format(new Date(System.currentTimeMillis()));
      String tempDir = TEMP_IMPORT_ROOT + Path.SEPARATOR + timeId + tableName;
      return new Path(tempDir);
    }
The temporary directory name is currently built from three parts:
- TEMP_IMPORT_ROOT: A constant. Users can change it if needed, but because this is not documented, most use the default value.
- timeId: The current time with millisecond precision.
- tableName: The name of the transferred table, or null for a query-based (--query) import.
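The problem mainly surfaces in --query based imports, where two of the three parts are constant (TEMP_IMPORT_ROOT keeps its default and tableName is null), leaving timeId as the only part that varies. Two Sqoop jobs that happen to start at the same time therefore get the same temporary directory.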
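To make the collision concrete, here is a minimal, self-contained sketch that mimics the naming scheme above with plain strings instead of Hadoop's Path. The class name TempDirSketch, the root value "_sqoop", the date pattern, and the uniqueDir helper are all assumptions made for illustration; they are not taken from AppendUtils.java. Appending a random UUID is shown only as one possible way to avoid the clash, not necessarily the fix that was committed.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.UUID;

public class TempDirSketch {
    // Assumed values for illustration; the real constants live in AppendUtils.
    static final String TEMP_IMPORT_ROOT = "_sqoop";
    static final String SEPARATOR = "/";
    static final String DATE_PATTERN = "yyyyMMddHHmmssSSS";

    // Mirrors the current scheme: root + timestamp + table name.
    // A null tableName (query-based import) is concatenated as "null".
    static String timeBasedDir(Date now, String tableName) {
        // SimpleDateFormat is not thread-safe, so create one per call.
        String timeId = new SimpleDateFormat(DATE_PATTERN).format(now);
        return TEMP_IMPORT_ROOT + SEPARATOR + timeId + tableName;
    }

    // Hypothetical mitigation: add a random component so that two jobs
    // started in the same millisecond still get different directories.
    static String uniqueDir(Date now, String tableName) {
        String uuid = UUID.randomUUID().toString().replace("-", "");
        return timeBasedDir(now, tableName) + "_" + uuid;
    }

    public static void main(String[] args) {
        Date now = new Date(); // both "jobs" start at the same instant

        String jobA = timeBasedDir(now, null);
        String jobB = timeBasedDir(now, null);
        System.out.println("time-based names collide: " + jobA.equals(jobB));

        String jobC = uniqueDir(now, null);
        String jobD = uniqueDir(now, null);
        System.out.println("UUID-based names collide: " + jobC.equals(jobD));
    }
}
```

With the time-based scheme, both simulated jobs compute the same path whenever they share a timestamp and a table name. The random suffix removes that dependency on start time, at the cost of less readable directory names.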