Pig / PIG-4412

Race condition in writing multiple outputs from STREAM op


Details

    • Bug
    • Status: Reopened
    • Major
    • Resolution: Unresolved
    • None
    • 0.18.0
    • impl
    • None
    • Reviewed

    Description

      Basically copying the issue described here:

      http://stackoverflow.com/questions/28327044/pig-streaming-some-output-files-are-missing

      Roughly, I believe the issue is a race between two things: the code in HadoopExecutableManager that moves a script's multiple output files into HDFS, and the MapReduce task, which shuts down after it writes the last bits of the "main" output of the STREAM task. If the task shuts down first, some of the extra output files never reach HDFS. Pig needs to make sure that the ExecutableManager is closed (and thus the files are moved from the local dir to HDFS) before it returns the end-of-stream tuple to signal that the stream is finished.
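
The ordering requirement described above can be sketched in a small, self-contained Java model. This is not Pig's code and not the content of the attached patch: `SideOutputManager`, `StreamOperator`, and `END_OF_STREAM` are hypothetical stand-ins, and the "move to HDFS" step is reduced to recording an event. The only point it illustrates is that the manager is closed before the end-of-stream marker is handed back to the caller.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class StreamCloseOrdering {

    // Hypothetical marker returned once the main output is exhausted.
    static final String END_OF_STREAM = "EOS";

    // Hypothetical stand-in for an executable manager that owns the side outputs.
    static class SideOutputManager {
        private final List<String> events;
        private boolean closed = false;

        SideOutputManager(List<String> events) {
            this.events = events;
        }

        // Models "move the extra output files from the local dir to HDFS".
        // Idempotent, so calling it again later is harmless.
        void close() {
            if (!closed) {
                events.add("side-outputs-moved");
                closed = true;
            }
        }
    }

    // Hypothetical stand-in for the operator that feeds tuples downstream.
    static class StreamOperator {
        private final Iterator<String> mainOutput;
        private final SideOutputManager manager;
        private final List<String> events;

        StreamOperator(List<String> tuples, SideOutputManager manager, List<String> events) {
            this.mainOutput = tuples.iterator();
            this.manager = manager;
            this.events = events;
        }

        String next() {
            if (mainOutput.hasNext()) {
                String tuple = mainOutput.next();
                events.add("tuple:" + tuple);
                return tuple;
            }
            // Close the manager first, then signal end-of-stream, so the
            // caller cannot begin shutting down while files are still local.
            manager.close();
            events.add("end-of-stream-returned");
            return END_OF_STREAM;
        }
    }

    // Drains the operator and returns the recorded order of events.
    static List<String> run() {
        List<String> events = new ArrayList<>();
        SideOutputManager manager = new SideOutputManager(events);
        StreamOperator op = new StreamOperator(Arrays.asList("t1", "t2"), manager, events);
        while (!END_OF_STREAM.equals(op.next())) {
            // A real task would write each tuple to the main output here.
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In this model the caller only learns that the stream is finished after `close()` has returned, which is the ordering the description asks for. How the real fix achieves it depends on the Pig internals.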

      Attachments

        1. PIG-4412.patch
          3 kB
          Josh Wills

        Activity


          People

            jwills Josh Wills
            jwills Josh Wills

            Dates

              Created:
              Updated:
