  1. Pig
  2. PIG-5273

_SUCCESS file should be created at the end of the job

    Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      A user ran into problems because FileOutputCommitter.commitJob() created the _SUCCESS file, and storeCleanup(), which PigOutputCommitter calls after that, then failed to store the schema due to a network outage. abortJob was then called, and the StoreFunc.cleanupOnFailure method it invokes deleted the output directory. Downstream jobs that had started because of the _SUCCESS file ran with empty data.
      Possible solutions:
      1) Move storeCleanup before the commit. The order was reversed in https://issues.apache.org/jira/browse/PIG-2642, probably because of FileOutputCommitter version 1, and it might not be a problem with FileOutputCommitter version 2. This would still not help when there are multiple outputs, as the main problem is cleanupOnFailure in abortJob deleting directories.
      2) Change cleanupOnFailure to not delete output directories. This still does not help: the Oozie action retry might kick in and delete the directory while the downstream job has already started running because of the _SUCCESS file.
      3) The fix cannot be done in the OutputCommitter at all, as multiple output committers are called in parallel in Tez. Instead, Pig can suppress _SUCCESS creation and try creating all the files at the end in TezLauncher, if the job has succeeded, before calling cleanupOnSuccess. This can probably be added as a configurable setting and turned on by default in our clusters. This is probably the most workable solution.
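
The idea behind option 3 can be sketched in plain Java, without any Hadoop or Pig dependencies. This is only an illustration under stated assumptions, not the attached patch: the class DeferredSuccessMarker, the OutputFinalizer interface, and the finishJob method are hypothetical names, local directories stand in for HDFS output directories, and the finalizer stands in for per-output work such as storing the schema. In a real job, the committer's own marker would presumably have to be suppressed first, for example through the Hadoop property mapreduce.fileoutputcommitter.marksuccessfuljobs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical sketch: write _SUCCESS markers only after every output is finalized. */
public class DeferredSuccessMarker {

    /** Stand-in for per-output finalization (for example, storing the schema); may fail. */
    public interface OutputFinalizer {
        void finalizeOutput(Path outputDir) throws IOException;
    }

    /**
     * Finalizes every output first, then creates _SUCCESS in each directory.
     * If any finalization throws, no marker has been written yet, so a
     * downstream job that polls for _SUCCESS is never triggered.
     */
    public static void finishJob(List<Path> outputDirs, OutputFinalizer finalizer)
            throws IOException {
        // Phase 1: all work that can still fail.
        for (Path dir : outputDirs) {
            finalizer.finalizeOutput(dir);
        }
        // Phase 2: publish the markers once nothing else can fail the job.
        for (Path dir : outputDirs) {
            Path marker = dir.resolve("_SUCCESS");
            if (!Files.exists(marker)) {
                Files.createFile(marker);
            }
        }
    }
}
```

With two outputs, a failure while finalizing the second one leaves neither directory with a _SUCCESS file, which is the property the single-committer approach cannot guarantee when committers run in parallel.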

      Thanks to Rohini Palaniswamy for finding the issue and proposing the solution.

        Attachments

        1. PIG-5273-2.patch (26 kB, Satish Saley)
        2. PIG-5273-1.patch (15 kB, Satish Saley)

              People

              • Assignee: Satish Saley (satishsaley)
              • Reporter: Satish Saley (satishsaley)