HCatalog / HCATALOG-545

Improve failure recovery for FileOutputCommitterContainer


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.4, 0.5
    • Fix Version/s: None
    • Component/s: mapreduce
    • Labels: None

    Description

      When an M/R job creates partitions in multiple Hive tables, all of the partitions are committed in the same cleanup task through multiple instances of FileOutputCommitterContainer.

      Currently, when one of the FileOutputCommitterContainer instances fails, the cleanup task exits with a failure and is retried. However, the retry is blocked by a "partition exists" error, because the partitions committed before the failure are still in the metastore.

      Instead, when a failure occurs, the cleanup task should roll back every commit it has already made to the other tables, so that the next retry can proceed.
      Also, if all retries of the cleanup task fail, no partial commit should be left in the Hive metastore.
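      The all-or-nothing behaviour proposed above can be sketched as follows. This is a minimal, language-neutral sketch, not HCatalog code: `FakeMetastore`, `TableCommitter`, `commit_all` and `PartitionExistsError` are hypothetical stand-ins for the metastore, a per-table FileOutputCommitterContainer and the "partition exists" failure. The cleanup step commits each table in turn and, if any commit fails, drops the partitions already added before re-raising, so a retry starts from a clean metastore.

```python
from typing import Dict, List, Set


class PartitionExistsError(Exception):
    """Hypothetical stand-in for the metastore's "partition exists" error."""


class FakeMetastore:
    """Toy in-memory metastore: table name -> set of committed partitions."""

    def __init__(self) -> None:
        self.partitions: Dict[str, Set[str]] = {}

    def add_partition(self, table: str, partition: str) -> None:
        existing = self.partitions.setdefault(table, set())
        if partition in existing:
            # This is what blocks a retry after a partial commit
            raise PartitionExistsError(f"{table}/{partition}")
        existing.add(partition)

    def drop_partition(self, table: str, partition: str) -> None:
        self.partitions.get(table, set()).discard(partition)


class TableCommitter:
    """Hypothetical per-table committer (one per output table)."""

    def __init__(self, store: FakeMetastore, table: str, partition: str,
                 fail: bool = False) -> None:
        self.store = store
        self.table = table
        self.partition = partition
        self.fail = fail  # simulate a commit failure for this table

    def commit(self) -> None:
        if self.fail:
            raise RuntimeError(f"commit failed for table {self.table}")
        self.store.add_partition(self.table, self.partition)

    def rollback(self) -> None:
        self.store.drop_partition(self.table, self.partition)


def commit_all(committers: List[TableCommitter]) -> None:
    """Commit every table, or undo the commits already made and re-raise."""
    done: List[TableCommitter] = []
    try:
        for c in committers:
            c.commit()
            done.append(c)
    except Exception:
        # Undo in reverse order so no partial commit remains in the metastore
        for c in reversed(done):
            try:
                c.rollback()
            except Exception:
                pass  # keep rolling back the rest; a real implementation would log this
        raise
```

      With this sketch, a first attempt in which table "b" fails leaves table "a" without the new partition, so a second attempt over the same tables commits both without hitting `PartitionExistsError`.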

          People

            Assignee: Unassigned
            Reporter: Feng Peng (pengfeng)
            Votes: 0
            Watchers: 1
