Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-682

Reserved tasktrackers should be removed when a node is globally blacklisted

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: 0.21.0
    • Component/s: jobtracker
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Jobtracker was modified to cleanup reservations created on tasktracker nodes to support high RAM jobs, when the nodes are blacklisted.

      Description

      When support was added to reserve tasktrackers for high RAM jobs per MAPREDUCE-516, we missed removing reservations on tasktrackers that are globally blacklisted. This is not a major concern, just that the reservation might cause the job to finish a little later than if the reservation is removed when the blacklisting happens.

      1. mapreduce-682-ydist.patch
        2 kB
        Sreekanth Ramakrishnan
      2. mapreduce-682-2.patch
        8 kB
        Sreekanth Ramakrishnan
      3. mapreduce-682-1.patch
        4 kB
        Sreekanth Ramakrishnan

        Activity

        Hide
        Arun C Murthy added a comment -

        While we are at it, we should ensure that nodes declared unhealthy (MAPREDUCE-211) also are removed from the list of reserved tasktrackers for jobs...

        Show
        Arun C Murthy added a comment - While we are at it, we should ensure that nodes declared unhealthy ( MAPREDUCE-211 ) also are removed from the list of reserved tasktrackers for jobs...
        Hide
        Sreekanth Ramakrishnan added a comment -

        Attaching patch which fixes this issue.

        Show
        Sreekanth Ramakrishnan added a comment - Attaching patch which fixes this issue.
        Hide
        Hemanth Yamijala added a comment -

        Looks fine. A couple of minor points:

        • In FaultyTrackersInfo.blacklistTracker, the trackers we get for a hostname are in a synchronized set and we are iterating over them. The javadoc of the Collections.synchronizedSet says that when iterating we should guard it in a synchronized block.
        • As far as I can see, cancelAllReservations need not be synchronized now, because all code paths seem to be coming after locking the jobtracker. Can you please verify this once ? Possibly add a comment stating the assumption on cancelAllReservations ?
        • In the test case, can we also blacklist a different tracker (with reservations) due to node health check reasons and make sure that both trackers' reservations are removed. It just makes the test case cover more scenarios with little work.
        • Please add a comment for the cleanup steps in the test cases so its clear why we need to do those steps.
        Show
        Hemanth Yamijala added a comment - Looks fine. A couple of minor points: In FaultyTrackersInfo.blacklistTracker, the trackers we get for a hostname are in a synchronized set and we are iterating over them. The javadoc of the Collections.synchronizedSet says that when iterating we should guard it in a synchronized block. As far as I can see, cancelAllReservations need not be synchronized now, because all code paths seem to be coming after locking the jobtracker. Can you please verify this once ? Possibly add a comment stating the assumption on cancelAllReservations ? In the test case, can we also blacklist a different tracker (with reservations) due to node health check reasons and make sure that both trackers' reservations are removed. It just makes the test case cover more scenarios with little work. Please add a comment for the cleanup steps in the test cases so its clear why we need to do those steps.
        Hide
        Sreekanth Ramakrishnan added a comment -

        Attaching patch incorporating Hemanths comments.

        • Synchronizing on the set before iterating over it.
        • modified the test case to test with Node health reasons comment.
        • Added comment in TaskTracker to state the assumption.
        Show
        Sreekanth Ramakrishnan added a comment - Attaching patch incorporating Hemanths comments. Synchronizing on the set before iterating over it. modified the test case to test with Node health reasons comment. Added comment in TaskTracker to state the assumption.
        Hide
        Hemanth Yamijala added a comment -

        Changes look fine to me.

        +1.

        Show
        Hemanth Yamijala added a comment - Changes look fine to me. +1.
        Hide
        Sreekanth Ramakrishnan added a comment -

        Since hudson patch queue seems to be stuck. Output from ant test-patch

             [exec] +1 overall.
             [exec]
             [exec]     +1 @author.  The patch does not contain any @author tags.
             [exec]
             [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
             [exec]
             [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
             [exec]
             [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
             [exec]
             [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
             [exec]
             [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
             [exec]
        

        Running tests.

        Show
        Sreekanth Ramakrishnan added a comment - Since hudson patch queue seems to be stuck. Output from ant test-patch [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] Running tests.
        Hide
        Sreekanth Ramakrishnan added a comment -

        All tests passed locally, both contrib and core.

        Show
        Sreekanth Ramakrishnan added a comment - All tests passed locally, both contrib and core.
        Hide
        Hemanth Yamijala added a comment -

        I just committed this. Thanks, Sreekanth !

        Show
        Hemanth Yamijala added a comment - I just committed this. Thanks, Sreekanth !
        Hide
        Sreekanth Ramakrishnan added a comment -

        Attaching patch for Yahoo! distribution.

        Show
        Sreekanth Ramakrishnan added a comment - Attaching patch for Yahoo! distribution.

          People

          • Assignee:
            Sreekanth Ramakrishnan
            Reporter:
            Hemanth Yamijala
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development