Hadoop Common / HADOOP-3089

streaming should accept stderr from task before first key arrives

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.14.0, 0.14.1, 0.14.2, 0.14.3, 0.14.4, 0.15.0, 0.15.1, 0.15.2, 0.15.3, 0.16.0, 0.16.1
    • Fix Version/s: 0.17.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      Stderr output from a streaming task is not collected until the MRErrorThread is started by PipeMapRed.startOutputThreads(), which is done on the first call to map() or reduce().

      This makes it difficult to debug failures in starting up the task process. It can also lead to deadlock when a task receives no input keys but produces significant stderr output: the process will block on writing to stderr, while streaming will block waiting for the process to exit.

      We should start the MRErrorThread when the process is forked, and then attach the reporter later so that stderr output can serve as a keep-alive.
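
The proposal can be illustrated with a minimal Java sketch. This is not the attached patch: `EarlyStderrDrainer`, `KeepAlive`, and `setReporter` are hypothetical names chosen for illustration, and `KeepAlive` merely stands in for the framework's reporter. The sketch assumes the drain thread is started as soon as the child's stderr stream exists, and that the reporter is supplied later, once the first map() or reduce() call provides one.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical stand-in for the framework's progress reporter. */
interface KeepAlive {
    void progress();
}

/** Drains a child process's stderr from the moment the process is forked. */
class EarlyStderrDrainer extends Thread {
    private final BufferedReader err;
    private final List<String> lines = new CopyOnWriteArrayList<>();
    // Stays null until a reporter is attached; volatile so the drain
    // thread sees the reference as soon as another thread sets it.
    private volatile KeepAlive reporter;

    EarlyStderrDrainer(InputStream stderr) {
        this.err = new BufferedReader(
                new InputStreamReader(stderr, StandardCharsets.UTF_8));
        setDaemon(true);
    }

    /** Attach the reporter later, e.g. on the first map() or reduce() call. */
    void setReporter(KeepAlive reporter) {
        this.reporter = reporter;
    }

    /** Lines collected so far, including those read before a reporter existed. */
    List<String> lines() {
        return lines;
    }

    @Override
    public void run() {
        try {
            String line;
            while ((line = err.readLine()) != null) {
                lines.add(line);
                KeepAlive r = reporter; // read the volatile field once
                if (r != null) {
                    r.progress();       // stderr output counts as a keep-alive
                }
            }
        } catch (IOException e) {
            lines.add("stderr read failed: " + e);
        }
    }
}
```

With a real child process, the stream passed to the constructor would be `process.getErrorStream()`, and `start()` would be called immediately after `ProcessBuilder.start()` returns. Because the pipe is being read from the outset, a child that writes heavily to stderr before receiving any input is not left blocked on a full pipe.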

      Attachments

        1. patch-stderr-3089-2.txt
          9 kB
          Rick Cox
        2. patch-stderr-3089.txt
          9 kB
          Rick Cox

          People

            Assignee: Rick Cox (rickcox)
            Reporter: Rick Cox (rickcox)
            Votes: 0
            Watchers: 0
