Details

    • Type: Sub-task Sub-task
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.204.0
    • Fix Version/s: 1.1.0
    • Component/s: tasktracker
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Target Version/s:

      Description

      The TT will currently try to re-init itself on disk failure even if it has no good local dirs. It should shutdown instead.

      1. mapreduce-2957.patch
        1 kB
        Eli Collins
      2. mapreduce-2957-1.patch
        2 kB
        Eli Collins
      3. mapreduce-2957-2.patch
        2 kB
        Eli Collins

        Activity

        Eli Collins created issue -
        Hide
        Arun C Murthy added a comment -

        Won't MAPREDUCE-2924 handle this?

        Show
        Arun C Murthy added a comment - Won't MAPREDUCE-2924 handle this?
        Hide
        Ravi Gummadi added a comment -

        The original implementation in MR-2413 already handles this case of all-disks-failed by throwing Exception from checkLocalDirs(). No ?

        Show
        Ravi Gummadi added a comment - The original implementation in MR-2413 already handles this case of all-disks-failed by throwing Exception from checkLocalDirs(). No ?
        Hide
        Eli Collins added a comment - - edited

        The TT trys to shut down due to the DiskErrorException thrown from checkLocalDirs, however because offerService returns STALE in this case run tries to reinitialize instead of shutdown.

        Show
        Eli Collins added a comment - - edited The TT trys to shut down due to the DiskErrorException thrown from checkLocalDirs, however because offerService returns STALE in this case run tries to reinitialize instead of shutdown.
        Hide
        Eli Collins added a comment -

        Patch attached.

        Show
        Eli Collins added a comment - Patch attached.
        Eli Collins made changes -
        Field Original Value New Value
        Attachment mapreduce-2957.patch [ 12494552 ]
        Eli Collins made changes -
        Assignee Eli Collins [ eli ]
        Hide
        Ravi Gummadi added a comment -

        +1 to avoid calling TaskTracker.initialize() if there are no good local dirs.

        Even with this patch, i.e. though State.DENIED is returned by offerService() when DiskCheckerException is seen, initialize() is getting called once more and is getting DiskCheckerException from TT.initialize() -> DiskChecker.checkLocalDirs(). This behavior is similar to what was happening without this patch. This needs to be improved somehow and we need to avoid the call to TT.initialize() in this case.

        Show
        Ravi Gummadi added a comment - +1 to avoid calling TaskTracker.initialize() if there are no good local dirs. Even with this patch, i.e. though State.DENIED is returned by offerService() when DiskCheckerException is seen, initialize() is getting called once more and is getting DiskCheckerException from TT.initialize() -> DiskChecker.checkLocalDirs(). This behavior is similar to what was happening without this patch. This needs to be improved somehow and we need to avoid the call to TT.initialize() in this case.
        Matt Foley made changes -
        Fix Version/s 0.20.205.0 [ 12316391 ]
        Target Version/s 0.20.206.0 [ 12317960 ]
        Hide
        Eli Collins added a comment -

        Good point, the TT shouldn't re-initialize if offerService returns DENIED, both for this new use of DENIED and the current one. I'll update the patch.

        Show
        Eli Collins added a comment - Good point, the TT shouldn't re-initialize if offerService returns DENIED, both for this new use of DENIED and the current one. I'll update the patch.
        Hide
        Eli Collins added a comment -

        Updated patch attached. Previous patch plus the TT bails from the run loop before re-initializing if we were denied since we're going to shutdown anyway.

        Show
        Eli Collins added a comment - Updated patch attached. Previous patch plus the TT bails from the run loop before re-initializing if we were denied since we're going to shutdown anyway.
        Eli Collins made changes -
        Attachment mapreduce-2957-1.patch [ 12499980 ]
        Hide
        Ravi Gummadi added a comment -

        Patch looks fine to me.
        1 minor nit: If DEE is seen, then close() is called twice (once in finally block and 2nd time in shutDown()). If possible, we should avoid this (though calling close() twice may not be harmful except some more cpu-cycles-usage) ?

        Show
        Ravi Gummadi added a comment - Patch looks fine to me. 1 minor nit: If DEE is seen, then close() is called twice (once in finally block and 2nd time in shutDown()). If possible, we should avoid this (though calling close() twice may not be harmful except some more cpu-cycles-usage) ?
        Hide
        Eli Collins added a comment -

        Yea the double close isn't just for a DEE, we always execute the close in the finally block. In the case where shuttingDown is true this isn't a duplicate call as shuttingDown is set (by heckJettyPort) w/o calling shutdown. This is a duplicate call for all the denied paths though so we don't need to close here if denied is true.

        Upated patch attached.

        Show
        Eli Collins added a comment - Yea the double close isn't just for a DEE, we always execute the close in the finally block. In the case where shuttingDown is true this isn't a duplicate call as shuttingDown is set (by heckJettyPort) w/o calling shutdown. This is a duplicate call for all the denied paths though so we don't need to close here if denied is true. Upated patch attached.
        Eli Collins made changes -
        Attachment mapreduce-2957-2.patch [ 12500211 ]
        Hide
        Ravi Gummadi added a comment -

        Looks good to me.
        +1

        Show
        Ravi Gummadi added a comment - Looks good to me. +1
        Hide
        Eli Collins added a comment -

        Thanks for the review Ravi! I've committed this.

        Show
        Eli Collins added a comment - Thanks for the review Ravi! I've committed this.
        Eli Collins made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Hadoop Flags Reviewed [ 10343 ]
        Resolution Fixed [ 1 ]
        Matt Foley made changes -
        Fix Version/s 1.1.0 [ 12317960 ]
        Hide
        Matt Foley added a comment -

        Closed upon release of Hadoop-1.1.0.

        Show
        Matt Foley added a comment - Closed upon release of Hadoop-1.1.0.
        Matt Foley made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            Eli Collins
            Reporter:
            Eli Collins
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development