Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- 1.4.4
- None
- None
Description
I've noticed at multiple user deployments that when running Sqoop in append mode (--append), two separate jobs can end up using the same temporary directory. This is a disaster, as those jobs will then start interfering with each other and may even cause data loss. Currently we are using the following code to generate the temporary directory (AppendUtils.java):
    public static Path getTempAppendDir(String tableName) {
      String timeId = DATE_FORM.format(new Date(System.currentTimeMillis()));
      String tempDir = TEMP_IMPORT_ROOT + Path.SEPARATOR + timeId + tableName;
      return new Path(tempDir);
    }
The temporary directory name is currently built from three parts:
- TEMP_IMPORT_ROOT: Constant. It can be changed by the user if needed, but as this is not documented, most users are using the default value.
- timeId: Current time with millisecond precision.
- tableName: Name of the transferred table, or null for a query (--query) based import.
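To make the composition of the three parts concrete, here is a minimal standalone sketch that mirrors the concatenation above without depending on Hadoop. The root value "_sqoop", the separator "/", and the date pattern are assumptions chosen for illustration; the real values of TEMP_IMPORT_ROOT and DATE_FORM are not shown in this report. The clock reading is passed in as a parameter so that two jobs starting in the same millisecond can be simulated.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class TempDirNameDemo {
    // Assumed values for illustration only; not taken from the Sqoop source.
    static final String TEMP_IMPORT_ROOT = "_sqoop";
    static final String SEPARATOR = "/";
    static final String DATE_PATTERN = "yyyyMMddHHmmssSSS";

    // Same concatenation as getTempAppendDir, but returns a String and
    // takes the current time as an argument instead of reading the clock.
    static String tempAppendDir(String tableName, long nowMillis) {
        String timeId = new SimpleDateFormat(DATE_PATTERN).format(new Date(nowMillis));
        // A null tableName is concatenated as the literal text "null".
        return TEMP_IMPORT_ROOT + SEPARATOR + timeId + tableName;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // Two query-based jobs (tableName == null) started in the same millisecond.
        String jobA = tempAppendDir(null, now);
        String jobB = tempAppendDir(null, now);
        System.out.println(jobA);
        System.out.println("same directory: " + jobA.equals(jobB));
    }
}
```

With a null table name, the only varying input is the timestamp, so both simulated jobs receive an identical path.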
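The problem mainly surfaces in --query based imports, where two of the three parts are constant (the root and the null table name) and the timestamp is the only part that distinguishes one job from another. Two Sqoop jobs that start within the same millisecond will therefore get the same temporary directory.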
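One possible way to avoid the collision is to add a component that is unique per invocation. The sketch below appends a random UUID to the name. It is only an illustration of the idea, not necessarily the change that was committed for this issue; the method name uniqueTempAppendDir, the "_sqoop" root, and the underscore separators are hypothetical.

```java
import java.util.UUID;

public class UniqueTempDirSketch {
    // Assumed root for illustration only; not taken from the Sqoop source.
    static final String TEMP_IMPORT_ROOT = "_sqoop";

    // Hypothetical variant of getTempAppendDir that stays unique even when
    // timeId and tableName are identical across concurrently started jobs.
    static String uniqueTempAppendDir(String tableName, String timeId) {
        // Omit the table part for query-based imports instead of emitting "null".
        String tablePart = (tableName == null) ? "" : "_" + tableName;
        // Random 128-bit identifier, written without dashes.
        String unique = UUID.randomUUID().toString().replace("-", "");
        return TEMP_IMPORT_ROOT + "/" + timeId + "_" + unique + tablePart;
    }

    public static void main(String[] args) {
        // Same timestamp and same (null) table name for both jobs.
        String jobA = uniqueTempAppendDir(null, "20130403120000123");
        String jobB = uniqueTempAppendDir(null, "20130403120000123");
        System.out.println(jobA);
        System.out.println(jobB);
        System.out.println("same directory: " + jobA.equals(jobB));
    }
}
```

The timestamp is kept in the name so directories remain easy to sort and identify by hand, while the random suffix carries the uniqueness.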