Hadoop Map/Reduce / MAPREDUCE-709

node health check script does not display the correct message on timeout

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.21.0
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      When the node health check script takes longer than the interval set in "mapred.healthChecker.script.timeout" to return, a timeout message should be displayed. Instead, the full stack trace is displayed, as shown below:

      java.io.IOException: Stream closed
      at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:145)
      at java.io.BufferedInputStream.read(BufferedInputStream.java:308) 
      at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:264) 
      at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:306) 
      at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:158) 
      at java.io.InputStreamReader.read(InputStreamReader.java:167) 
      at java.io.BufferedReader.fill(BufferedReader.java:136) 
      at java.io.BufferedReader.readLine(BufferedReader.java:299) 
      at java.io.BufferedReader.readLine(BufferedReader.java:362) 
      at org.apache.hadoop.util.Shell.runCommand(Shell.java:202) 
      at org.apache.hadoop.util.Shell.run(Shell.java:145) 
      at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:338) 
      at org.apache.hadoop.mapred.NodeHealthCheckerService$NodeHealthMonitorExecutor.run(NodeHealthCheckerService.java:119) 
      at java.util.TimerThread.mainLoop(Timer.java:512) 
      at java.util.TimerThread.run(Timer.java:462) 
      

      Also, the value of "mapred.healthChecker.script.timeout" is not reflected in job.xml; the default value is always shown. This is only a UI issue.
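
The expected behaviour described above (a short timeout message rather than a raw stack trace) can be illustrated with a minimal, standalone Java sketch. It does not use Hadoop's Shell or NodeHealthCheckerService classes; the class name, method name, and message text are all hypothetical, and it assumes a Unix-like system where the command to run is available.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

class HealthScriptTimeoutSketch {

    // Hypothetical wording; the message Hadoop actually reports may differ.
    static final String TIMEOUT_MSG = "Node health script timed out";

    // Runs the command and returns a one-line report instead of throwing on timeout.
    static String runWithTimeout(long timeoutMillis, String... command)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Share the parent's stdout/stderr so an unread pipe cannot block the child.
        builder.inheritIO();
        Process process = builder.start();

        // waitFor returns false if the process is still running when the timeout elapses.
        boolean finished = process.waitFor(timeoutMillis, TimeUnit.MILLISECONDS);
        if (!finished) {
            // Kill the script and report the timeout explicitly.
            process.destroyForcibly();
            process.waitFor();
            return TIMEOUT_MSG;
        }
        return "Node health script exited with code " + process.exitValue();
    }
}
```

For example, calling runWithTimeout(200, "sleep", "5") should return the timeout message, because the command is still running when the 200 ms limit expires.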

      1. mapred-709-1.patch
        3 kB
        Sreekanth Ramakrishnan
      2. mapred-709-ydist.patch
        2 kB
        Sreekanth Ramakrishnan

        Activity

        Tom White made changes -
        Status: Resolved [ 5 ] -> Closed [ 6 ]
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #15 (see http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/15/). Fixes the message displayed for a blacklisted node when the reason for blacklisting is the health check script timing out. Contributed by Sreekanth Ramakrishnan.

        Hemanth Yamijala made changes -
        Status: Open [ 1 ] -> Resolved [ 5 ]
        Hadoop Flags: [Reviewed]
        Resolution: Fixed [ 1 ]
        Hemanth Yamijala added a comment -

        I just committed this. Thanks, Sreekanth!

        Sreekanth Ramakrishnan made changes -
        Attachment mapred-709-ydist.patch [ 12412711 ]
        Sreekanth Ramakrishnan added a comment -

        Y! distribution patch

        Sreekanth Ramakrishnan added a comment -

        All tests passed locally.

        Sreekanth Ramakrishnan added a comment -

        Output from ant test-patch:

             [exec] +1 overall.
             [exec]
             [exec]     +1 @author.  The patch does not contain any @author tags.
             [exec]
             [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
             [exec]
             [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
             [exec]
             [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
             [exec]
             [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
             [exec]
             [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
             [exec]
        
        Hemanth Yamijala added a comment -

        Changes look good to me. +1. Please upload results of test-patch and relevant tests.

        Ramya Sunil added a comment -

        On the second point in the description (the "mapred.healthChecker.script.timeout" value not being reflected in job.xml): this was caused by a wrong conf dir and is no longer observed.

        Sreekanth Ramakrishnan made changes -
        Attachment mapred-709-1.patch [ 12412597 ]
        Sreekanth Ramakrishnan added a comment -

        Attaching a patch that fixes this issue:

        • Sets the proper exit status; previously we were not using the TIMEOUT enum.
        • Changes the test case to check for the proper timeout message.
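
The idea behind the fix listed above, recording a distinct timeout status and deriving the displayed message from it, might look roughly like the following sketch. Only the TIMEOUT name comes from the comment; the class, the other enum constants, and the message strings are hypothetical and are not taken from the attached patch.

```java
class HealthCheckStatusSketch {

    // Hypothetical outcome categories; only the TIMEOUT name appears in the comment above.
    enum ExitStatus { SUCCESS, TIMEOUT, FAILED }

    // Maps the recorded status to the text shown for the node.
    static String messageFor(ExitStatus status, String detail) {
        switch (status) {
            case SUCCESS:
                return "";
            case TIMEOUT:
                // A fixed message: the exception detail is deliberately not shown.
                return "Node health script timed out";
            default:
                return "Node health script failed: " + detail;
        }
    }
}
```

Keeping the timeout as its own status means the reporting code never has to inspect an exception to decide what to display, which is also what makes the message straightforward to check in a test case.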
        Sreekanth Ramakrishnan made changes -
        Assignee: set to Sreekanth Ramakrishnan [ sreekanth ]
        Ramya Sunil created issue -

          People

          • Assignee: Sreekanth Ramakrishnan
          • Reporter: Ramya Sunil
          • Votes: 0
          • Watchers: 1
