Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.21.0
    • Component/s: test
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      TestNodeRefresh sometimes timed out. The test started an MR cluster with 2 trackers and ran a half-waiting-mapper job, in which tasks with id > total-maps/2 wait for a signal. With 2 trackers, tasks could be scheduled out of order (because of locality), and the job then got stuck. The fix is to start only one tracker and add a new tracker later.
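      The failure mode described in the release note can be sketched with a toy model. This is not Hadoop code: the class name, the `completes` method, and the slot-based scheduling loop are hypothetical, and the model assumes that the upper half of the map ids are the waiting tasks and that the signal is sent only once half of the maps have completed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of a half-waiting-mapper job; illustrative only, not Hadoop code. */
class HalfWaitingJobSketch {

    /**
     * Returns true if the job finishes and false if it gets stuck.
     * Assumption: maps with id >= totalMaps / 2 block until a signal arrives,
     * and the signal is sent only after totalMaps / 2 maps have completed.
     */
    static boolean completes(int[] scheduleOrder, int slots) {
        int totalMaps = scheduleOrder.length;
        int half = totalMaps / 2;
        Deque<Integer> pending = new ArrayDeque<>();
        for (int id : scheduleOrder) {
            pending.add(id);
        }
        List<Integer> running = new ArrayList<>();
        int completed = 0;
        boolean signalled = false;

        while (completed < totalMaps) {
            // Fill free slots in the order the scheduler hands tasks out.
            while (running.size() < slots && !pending.isEmpty()) {
                running.add(pending.poll());
            }
            // A running map finishes if it is a non-waiting map,
            // or if the signal has already been sent.
            final boolean sig = signalled;
            int before = running.size();
            running.removeIf(id -> id < half || sig);
            int finished = before - running.size();
            completed += finished;
            if (completed >= half) {
                signalled = true;
            }
            // Nothing finished and no signal: every slot holds a waiter.
            if (finished == 0 && !signalled) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // In-order scheduling: the non-waiting maps run first.
        System.out.println(completes(new int[] {0, 1, 2, 3}, 2));
        // Out-of-order scheduling: waiting maps take both slots first.
        System.out.println(completes(new int[] {2, 3, 0, 1}, 2));
    }
}
```

      In this model the in-order schedule completes, while the schedule that hands out the waiting maps first never reaches the 50% mark, so the signal is never sent. An out-of-order schedule does not always get stuck in the model, which is consistent with the test timing out only sometimes.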
    1. TEST-org.apache.hadoop.mapred.TestNodeRefresh.txt
      335 kB
      Jothi Padmanabhan
    2. MAPREDUCE-677-v1.1-branch-0.20.patch
      1.0 kB
      Amar Kamat
    3. MAPREDUCE-677-v1.1.patch
      0.9 kB
      Amar Kamat
    4. MAPREDUCE-677-v1.0.patch
      0.8 kB
      Amar Kamat

      Activity

      Tom White made changes -
      Status Resolved [ 5 ] Closed [ 6 ]
      Ravi Gummadi added a comment -

      The patch for branch 0.20 uploaded on 10th July needs to be committed to the Y! 20 distribution.

      Amar Kamat made changes -
      Status Reopened [ 4 ] Resolved [ 5 ]
      Resolution Fixed [ 1 ]
      Amar Kamat added a comment -

      MAPREDUCE-760 should fix this.

      Amar Kamat added a comment -

      This looks like a timing issue to me. I think we should start the new tracker after all the asserts are done.

      Jothi Padmanabhan made changes -
      Jothi Padmanabhan added a comment -

      Test log with a timeout.

      Jothi Padmanabhan made changes -
      Resolution Fixed [ 1 ]
      Status Resolved [ 5 ] Reopened [ 4 ]
      Jothi Padmanabhan added a comment -

      I am still seeing a timeout with this test.

      Hudson added a comment -

      Integrated in Hadoop-Mapreduce-trunk #20 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/20/)

      Amar Kamat made changes -
      Release Note TestNodeRefresh sometimes timed out. This happened because the test started a MR cluster with 2 trackers and ran a half-waiting-mapper job. Tasks that have id > total-maps/2 wait for a signal. Because of 2 trackers, the tasks got scheduled out of order (locality) and hence the job got stuck. The fix is to start only one tracker and then add a new tracker later.
      Amar Kamat made changes -
      Attachment MAPREDUCE-677-v1.1-branch-0.20.patch [ 12413086 ]
      Amar Kamat added a comment -

      Attaching an example patch for the 0.20 branch; it is not to be committed.

      Sharad Agarwal made changes -
      Status Open [ 1 ] Resolved [ 5 ]
      Hadoop Flags [Reviewed]
      Fix Version/s 0.21.0 [ 12314045 ]
      Resolution Fixed [ 1 ]
      Sharad Agarwal added a comment -

      I committed this. Thanks Amar!

      Amar Kamat added a comment -

      Result of test-patch
      [exec] +1 overall.
      [exec]
      [exec] +1 @author. The patch does not contain any @author tags.
      [exec]
      [exec] +1 tests included. The patch appears to include 3 new or modified tests.
      [exec]
      [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
      [exec]
      [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
      [exec]
      [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
      [exec]
      [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

      Amar Kamat made changes -
      Attachment MAPREDUCE-677-v1.1.patch [ 12412995 ]
      Amar Kamat added a comment -

      Changed the test case to start the new tracker with an explicitly chosen new hostname instead of relying on the hostname generated by MiniMRCluster. Running test-patch now.

      Sharad Agarwal added a comment -

      In the modified test, I think we should ensure that the newly launched tasktracker always has a different name. Otherwise the new tracker would be excluded and the cluster would have no trackers, leaving job.waitForCompletion() waiting until the test times out.

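      The concern in the comment above can be illustrated with a small toy model. It assumes that the test has already excluded the original tracker's host before the new tracker starts; the class and method names are hypothetical and do not correspond to Hadoop or MiniMRCluster APIs.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of a tracker exclude list; names are illustrative, not Hadoop APIs. */
class ExcludeListSketch {
    private final Set<String> excludedHosts = new HashSet<>();
    private final Set<String> activeTrackers = new HashSet<>();

    /** Decommission a host: drop its tracker and refuse it from now on. */
    void exclude(String host) {
        excludedHosts.add(host);
        activeTrackers.remove(host);
    }

    /** A tracker may join only if its hostname is not excluded. */
    boolean register(String host) {
        if (excludedHosts.contains(host)) {
            return false;
        }
        activeTrackers.add(host);
        return true;
    }

    int activeCount() {
        return activeTrackers.size();
    }

    public static void main(String[] args) {
        ExcludeListSketch cluster = new ExcludeListSketch();
        cluster.register("host0");
        cluster.exclude("host0");
        // A new tracker that reuses the excluded name is refused,
        // so the cluster is left with no tracker to run the job.
        System.out.println(cluster.register("host0"));
        System.out.println(cluster.activeCount());
        // A tracker with a different name is accepted.
        System.out.println(cluster.register("host1"));
    }
}
```

      In this model a job submitted after the refused registration would have nowhere to run, which matches the timeout scenario described above; giving the new tracker a distinct hostname avoids it.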
      Amar Kamat added a comment -

      I ran TestNodeRefresh 100 times. Without the patch it failed (timed out) 9 times, while with the patch it never timed out.

      Amar Kamat added a comment -

      Test patch result
      [exec] +1 overall.
      [exec]
      [exec] +1 @author. The patch does not contain any @author tags.
      [exec]
      [exec] +1 tests included. The patch appears to include 3 new or modified tests.
      [exec]
      [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
      [exec]
      [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
      [exec]
      [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
      [exec]
      [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

      Testing/Running TestNodeRefresh now.

      Amar Kamat made changes -
      Field Original Value New Value
      Attachment MAPREDUCE-677-v1.0.patch [ 12412836 ]
      Amar Kamat added a comment -

      Attaching a patch that starts only one tracker so that the ordering is maintained. Testing in progress.

      Amar Kamat added a comment -

      I finally got one run where this test case timed out. It looks like the test case starts 2 trackers and then waits for 50% of the mappers to finish. When tasks are scheduled out of order, the whole test case gets stuck, because the task that was scheduled out of order waits forever and blocks the cluster. I will attach a patch ASAP.

      Amar Kamat added a comment -

      I am trying to reproduce this failure but have not been able to. Can someone please attach failure logs (nohup logs etc.) or comment on how to reproduce it?

      Amar Kamat created issue -

        People

        • Assignee:
          Amar Kamat
        • Reporter:
          Amar Kamat
        • Votes:
          0
        • Watchers:
          1
