Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 0.4, 0.5
Fix Version/s: None
Component/s: None
Description
When an M/R job creates partitions in multiple Hive tables, all partitions are committed in the same cleanup task via multiple instances of FileOutputCommitterContainer.
Currently, when one of the FileOutputCommitterContainer instances fails, the cleanup task exits with failure and is retried. However, the retry is blocked by a "partition exists" error caused by the partial commits left behind by the failed attempt.
Instead, on failure the cleanup task should roll back all of its previous commits to the other tables so that the next retry can proceed.
Also, if all retries of the cleanup task fail, no partial commit should be left in the Hive metastore.
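
The proposed behavior could be sketched roughly as below. This is only an illustration of the rollback-on-failure idea, not the HCatalog code: `TableCommitter`, `MultiTableCommit` and `commitAll` are hypothetical names standing in for the per-table FileOutputCommitterContainer instances and the cleanup task, and the sketch assumes that a committer whose own commit() fails leaves nothing behind for that table.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class MultiTableCommit {

    /** Hypothetical stand-in for one per-table committer; not the HCatalog API. */
    public interface TableCommitter {
        void commit() throws Exception;    // publish this table's partitions
        void rollback() throws Exception;  // undo a previously successful commit
    }

    /**
     * Commits every table in order. If any commit fails, rolls back the
     * already-committed tables in reverse order, then rethrows the failure
     * so the task still exits as failed and can be retried cleanly.
     */
    public static void commitAll(List<? extends TableCommitter> committers) throws Exception {
        Deque<TableCommitter> committed = new ArrayDeque<>();
        try {
            for (TableCommitter c : committers) {
                c.commit();
                committed.push(c); // most recent successful commit sits at the head
            }
        } catch (Exception failure) {
            for (TableCommitter done : committed) { // iterates most-recent-first
                try {
                    done.rollback();
                } catch (Exception rollbackError) {
                    // Keep rolling back the remaining tables; attach this error
                    // to the original failure instead of hiding either one.
                    failure.addSuppressed(rollbackError);
                }
            }
            throw failure;
        }
    }
}
```

Because every failed attempt undoes its own commits before exiting, a retry starts from a clean state, and the same property covers the case where the last retry fails: nothing partial remains. A rollback that itself fails is only recorded here; a real fix would need to decide how to handle that case.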