Hadoop Common
  1. Hadoop Common
  2. HADOOP-236

job tracker should refuse connection from a task tracker with a different version number

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.2.1
    • Fix Version/s: 0.18.0
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      Changed connection protocol job tracker and task tracker so that task tracker will not connect to a job tracker with a different build version.

      Description

      After one mapred system upgrade, we noticed that all tasks assigned to one task tracker failed. It turned out that for some reason the task tracker was not upgraded.

      To avoid this, a task tracker should reports its version # when it registers itsself with a job tracker. If the job tracker receives an inconsistent version #, it should refuse the connection.

      1. 236_v1.patch
        3 kB
        Sharad Agarwal
      2. 236_v2.patch
        4 kB
        Sharad Agarwal

        Issue Links

          Activity

          Hide
          Devaraj Das added a comment -

          I just committed this. Thanks, Sharad!

          Show
          Devaraj Das added a comment - I just committed this. Thanks, Sharad!
          Hide
          Sharad Agarwal added a comment -

          the test case failures are unrelated to this patch.

          Show
          Sharad Agarwal added a comment - the test case failures are unrelated to this patch.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12383449/236_v2.patch
          against trunk revision 663487.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 core tests. The patch failed core unit tests.

          -1 contrib tests. The patch failed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2588/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2588/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2588/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2588/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12383449/236_v2.patch against trunk revision 663487. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2588/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2588/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2588/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2588/console This message is automatically generated.
          Hide
          Sharad Agarwal added a comment -

          Attaching the new version of patch. Incorporated Owen's recommendation.

          Show
          Sharad Agarwal added a comment - Attaching the new version of patch. Incorporated Owen's recommendation.
          Hide
          Owen O'Malley added a comment -

          The revision isn't strong enough, if you make a change and restart the cluster, an old task tracker would still agree. It really should be:

          VersionInfo.getVersion() + " from " + VersionInfo.getRevision() + " by " + 
          VersionInfo.getUser() + " on " + VersionInfo.getDate()
          
          Show
          Owen O'Malley added a comment - The revision isn't strong enough, if you make a change and restart the cluster, an old task tracker would still agree. It really should be: VersionInfo.getVersion() + " from " + VersionInfo.getRevision() + " by " + VersionInfo.getUser() + " on " + VersionInfo.getDate()
          Hide
          Sharad Agarwal added a comment -

          Not included any test case as it is difficult to write for this functionality.

          Show
          Sharad Agarwal added a comment - Not included any test case as it is difficult to write for this functionality.
          Hide
          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12383007/236_v1.patch
          against trunk revision 662813.

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no tests are needed for this patch.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2547/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2547/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2547/artifact/trunk/build/test/checkstyle-errors.html
          Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2547/console

          This message is automatically generated.

          Show
          Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12383007/236_v1.patch against trunk revision 662813. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2547/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2547/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2547/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2547/console This message is automatically generated.
          Hide
          Amareshwari Sriramadasu added a comment -

          +1 Patch looks good.

          Show
          Amareshwari Sriramadasu added a comment - +1 Patch looks good.
          Hide
          Sharad Agarwal added a comment -

          On relook, seems like buildVersion check at the initialContact is sufficient (similar to the Namenode/Datanode model).
          Attaching the patch for review.

          Show
          Sharad Agarwal added a comment - On relook, seems like buildVersion check at the initialContact is sufficient (similar to the Namenode/Datanode model). Attaching the patch for review.
          Hide
          Sharad Agarwal added a comment -

          Here is an approach:
          1. Pass additional argument -> String buildVersion in InterTrackerProtocol.heartbeat method
          2. VersionInfo.getRevision() can be used to figure out the buildVersion for jobtracker and tasktracker
          3. In the JobTracker's heartbeat method, verify the buildVersion with the Jobtracker's. If buildVersion does not match, throw an Exception which would result in TaskTracker to shutdown.

          One other approach could be to include buildVersion in TaskTrackerStatus class. That would avoid changing the InterTrackerProtocol.heartbeat method signature. But the drawback would be that it would increase the JobTracker memory footprint as TaskTrackerStatus objects are stored in Jobtracker's memory.

          Show
          Sharad Agarwal added a comment - Here is an approach: 1. Pass additional argument -> String buildVersion in InterTrackerProtocol.heartbeat method 2. VersionInfo.getRevision() can be used to figure out the buildVersion for jobtracker and tasktracker 3. In the JobTracker's heartbeat method, verify the buildVersion with the Jobtracker's. If buildVersion does not match, throw an Exception which would result in TaskTracker to shutdown. One other approach could be to include buildVersion in TaskTrackerStatus class. That would avoid changing the InterTrackerProtocol.heartbeat method signature. But the drawback would be that it would increase the JobTracker memory footprint as TaskTrackerStatus objects are stored in Jobtracker's memory.
          Hide
          Sameer Paranjpye added a comment -

          We need a Tasktracker registration mechanism that validates a tasktrackers version when it attempts to first contact the jobtracker. A related issue is that the out of date tasktracker should then gracefully shut down and not try to keep pinging the jobtracker.

          The jobtracker may optionally report mimatched tasktracker that tried to connect to it.

          Show
          Sameer Paranjpye added a comment - We need a Tasktracker registration mechanism that validates a tasktrackers version when it attempts to first contact the jobtracker. A related issue is that the out of date tasktracker should then gracefully shut down and not try to keep pinging the jobtracker. The jobtracker may optionally report mimatched tasktracker that tried to connect to it.

            People

            • Assignee:
              Sharad Agarwal
              Reporter:
              Hairong Kuang
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development