  1. Hadoop Map/Reduce
  2. MAPREDUCE-4890

Invalid TaskImpl state transitions when task fails while speculating


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.0.2-alpha, 0.23.5
    • Fix Version/s: 2.0.3-alpha, 0.23.6
    • Component/s: mr-am
    • Labels: None

    Description

      There are a couple of issues when a task fails while speculating (i.e., while multiple attempts are active):

      1. The other active attempts are not killed.
      2. TaskImpl's FAILED state does not handle the T_ATTEMPT_* set of events. These all need to be handled, since the other active task attempts can send them asynchronously after the task has already failed.
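
The two points above can be illustrated with a small, self-contained sketch. This is not Hadoop's TaskImpl: the class name, the event names listed in the enum, the `maxAttempts` parameter, and the kill-request list are all hypothetical stand-ins. It only shows the intended behaviour, namely that when the task fails, the remaining active attempts are asked to die, and that late T_ATTEMPT_* events arriving in a terminal state are absorbed instead of triggering an invalid transition.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; NOT Hadoop's TaskImpl. All names are hypothetical.
class SpeculatingTaskSketch {
    enum State { RUNNING, SUCCEEDED, FAILED }

    // Stand-ins for the T_ATTEMPT_* family of events mentioned in the report.
    enum Event { T_ATTEMPT_LAUNCHED, T_ATTEMPT_SUCCEEDED, T_ATTEMPT_FAILED, T_ATTEMPT_KILLED }

    private final int maxAttempts;            // assumed retry limit for the task
    private int failedAttempts = 0;
    private State state = State.RUNNING;
    private final Set<String> activeAttempts = new LinkedHashSet<>();
    private final List<String> killRequests = new ArrayList<>();

    SpeculatingTaskSketch(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    State getState() { return state; }

    List<String> getKillRequests() { return killRequests; }

    void handle(Event event, String attemptId) {
        if (state != State.RUNNING) {
            // Terminal task state: late events from other attempts are absorbed
            // rather than rejected as invalid transitions.
            if (event != Event.T_ATTEMPT_LAUNCHED) {
                activeAttempts.remove(attemptId);
            }
            return;
        }
        switch (event) {
            case T_ATTEMPT_LAUNCHED:
                activeAttempts.add(attemptId);
                break;
            case T_ATTEMPT_SUCCEEDED:
                activeAttempts.remove(attemptId);
                killRemainingAttempts();
                state = State.SUCCEEDED;
                break;
            case T_ATTEMPT_FAILED:
                activeAttempts.remove(attemptId);
                failedAttempts++;
                if (failedAttempts >= maxAttempts) {
                    // The task has failed: the other active attempts must be killed.
                    killRemainingAttempts();
                    state = State.FAILED;
                }
                break;
            case T_ATTEMPT_KILLED:
                activeAttempts.remove(attemptId);
                break;
        }
    }

    // Records a kill request for every attempt that is still active.
    private void killRemainingAttempts() {
        killRequests.addAll(activeAttempts);
    }

    public static void main(String[] args) {
        SpeculatingTaskSketch task = new SpeculatingTaskSketch(1);
        task.handle(Event.T_ATTEMPT_LAUNCHED, "attempt_0");
        task.handle(Event.T_ATTEMPT_LAUNCHED, "attempt_1"); // speculative attempt
        task.handle(Event.T_ATTEMPT_FAILED, "attempt_0");   // task fails here
        task.handle(Event.T_ATTEMPT_KILLED, "attempt_1");   // late event, absorbed
        System.out.println(task.getState() + " " + task.getKillRequests());
    }
}
```

In the `main` scenario, the speculative attempt reports back after the task has already failed; the sketch stays in FAILED rather than raising an error.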

      Failure to handle this properly means that jobs which both speculate and are configured to tolerate task failures via mapreduce.map.failures.maxpercent or mapreduce.reduce.failures.maxpercent can easily end up failing due to invalid state transitions, rather than completing successfully with a few explicitly allowed task failures.
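
As a rough illustration of what "tolerating failures" means here, the sketch below checks a failed-task count against a maximum percentage. The class and method names are hypothetical, and the comparison (failed tasks strictly exceeding the allowed percentage of all tasks) is an assumption about how such a threshold would be applied, not a copy of the ApplicationMaster's code.

```java
// Illustrative sketch only; the exact check used by the MapReduce
// ApplicationMaster may differ. All names are hypothetical.
class FailureToleranceSketch {

    // Returns true when failed tasks exceed maxPercent of all tasks, i.e. when
    // a job with this tolerance setting would be expected to fail.
    static boolean exceedsAllowedFailures(int failedTasks, int totalTasks, int maxPercent) {
        return (long) failedTasks * 100 > (long) maxPercent * totalTasks;
    }

    public static void main(String[] args) {
        // Assume a maxpercent setting of 5 and a job with 100 map tasks.
        System.out.println(exceedsAllowedFailures(3, 100, 5)); // within tolerance
        System.out.println(exceedsAllowedFailures(6, 100, 5)); // over tolerance
    }
}
```

Under this assumption, a job with 3 failed maps out of 100 and a 5 percent tolerance should still succeed; the bug described above makes such a job fail anyway if a failed task receives a late event from a speculative attempt.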

          People

            Assignee: Jason Darrell Lowe (jlowe)
            Reporter: Jason Darrell Lowe (jlowe)
            Votes: 0
            Watchers: 3
