Affects Version/s: 1.4.4
Fix Version/s: 1.4.5
I've noticed at multiple user deployments that when Sqoop runs in append mode (--append), two separate jobs can end up using the same temporary directory. This is a disaster, as those jobs will then start interfering with each other and may even cause data loss. Currently we are using the following code to generate the temporary directory (AppendUtils.java):
There are three different parts that we are currently using to generate the temporary directory:
- TEMP_IMPORT_ROOT: Constant. Users can change it if needed, but since this is not documented, most of them use the default value.
- timeId: Current time with millisecond precision.
- tableName: Name of the transferred table, or null for a query (--query) based import.
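The problem mainly surfaces in --query based imports, where two of the three parts (TEMP_IMPORT_ROOT and the null tableName) are constant. The only varying part is then timeId, so two Sqoop jobs that happen to start at the same millisecond will generate the same temporary directory.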
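To make the collision concrete, here is a minimal sketch of the naming scheme described above. It is not the actual AppendUtils.java code: the class name, the "_sqoop" root value, the timestamp pattern, and the "/" separator are all assumptions chosen for illustration. It only shows that when the root and tableName are fixed, two calls with the same millisecond timestamp produce the same path.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical illustration only; not the real Sqoop implementation.
class TempDirCollisionSketch {
    // Assumed stand-in for the TEMP_IMPORT_ROOT constant (value is made up).
    static final String TEMP_IMPORT_ROOT = "_sqoop";

    // Builds a directory name from the three parts: root, timeId, tableName.
    static String tempDir(long nowMillis, String tableName) {
        // Millisecond-precision timestamp; the pattern is an assumption.
        String timeId = new SimpleDateFormat("yyyyMMddHHmmssSSS")
                .format(new Date(nowMillis));
        // In this sketch a null tableName is simply concatenated,
        // which Java renders as the literal text "null".
        return TEMP_IMPORT_ROOT + "/" + timeId + tableName;
    }

    public static void main(String[] args) {
        // Two query-based jobs (tableName == null) starting in the same millisecond.
        long now = System.currentTimeMillis();
        String jobA = tempDir(now, null);
        String jobB = tempDir(now, null);
        System.out.println("jobA = " + jobA);
        System.out.println("jobB = " + jobB);
        System.out.println("collision: " + jobA.equals(jobB));
    }
}
```

With table-based imports the tableName part usually differs between jobs, which is why the collision is much more likely to show up with --query.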