
Key: HADOOP-236
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Sharad Agarwal
Reporter: Hairong Kuang
Votes: 0
Watchers: 0
Hadoop Common

job tracker should refuse connection from a task tracker with a different version number

Created: 20/May/06 02:17 AM   Updated: 08/Jul/09 04:51 PM
Component/s: None
Affects Version/s: 0.2.1
Fix Version/s: 0.18.0

Time Tracking:
Not Specified

File Attachments:
236_v1.patch (text file, 3 kB), 2008-05-29 10:17 AM, Sharad Agarwal. Licensed for inclusion in ASF works.
236_v2.patch (text file, 4 kB), 2008-06-05 10:01 AM, Sharad Agarwal. Licensed for inclusion in ASF works.

Hadoop Flags: Reviewed
Release Note: Changed the connection protocol between the job tracker and the task tracker so that a task tracker will not connect to a job tracker with a different build version.
Resolution Date: 06/Jun/08 05:40 AM


Description
After one mapred system upgrade, we noticed that all tasks assigned to one task tracker failed. It turned out that for some reason the task tracker was not upgraded.

To avoid this, a task tracker should report its version number when it registers itself with a job tracker. If the job tracker receives an inconsistent version number, it should refuse the connection.



Comments
Sameer Paranjpye added a comment - 15/Aug/06 09:28 PM
We need a tasktracker registration mechanism that validates a tasktracker's version when it first attempts to contact the jobtracker. A related issue is that an out-of-date tasktracker should then shut down gracefully rather than keep pinging the jobtracker.

The jobtracker may optionally report mismatched tasktrackers that tried to connect to it.


Sharad Agarwal added a comment - 26/May/08 10:38 AM
Here is an approach:
1. Pass an additional argument, String buildVersion, to the InterTrackerProtocol.heartbeat method.
2. VersionInfo.getRevision() can be used to determine the buildVersion for both the jobtracker and the tasktracker.
3. In the JobTracker's heartbeat method, compare the received buildVersion with the JobTracker's own. If they do not match, throw an exception, which causes the TaskTracker to shut down.

Another approach would be to include buildVersion in the TaskTrackerStatus class. That would avoid changing the InterTrackerProtocol.heartbeat method signature, but it would increase the JobTracker's memory footprint, because TaskTrackerStatus objects are kept in the JobTracker's memory.
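
The heartbeat-based approach in the numbered steps above could be sketched roughly as follows. This is an illustration only, not the attached patch: SketchJobTracker and its two-argument heartbeat are hypothetical stand-ins (the real InterTrackerProtocol.heartbeat takes different arguments), and the build version is passed in as a plain string instead of being read from VersionInfo.

```java
import java.io.IOException;

// Hypothetical stand-in for the job tracker; not the real Hadoop class.
class SketchJobTracker {
    // In the proposal this value would come from VersionInfo; here it is injected.
    private final String buildVersion;

    SketchJobTracker(String buildVersion) {
        this.buildVersion = buildVersion;
    }

    // Hypothetical, simplified heartbeat: accepts the tracker only if its
    // reported build version equals the job tracker's own.
    String heartbeat(String trackerName, String trackerBuildVersion) throws IOException {
        if (!buildVersion.equals(trackerBuildVersion)) {
            // The caller (the task tracker) is expected to shut down on this exception.
            throw new IOException("Rejecting " + trackerName + ": build version '"
                + trackerBuildVersion + "' does not match job tracker build version '"
                + buildVersion + "'");
        }
        return "ok";
    }
}
```

With this shape the comparison runs on every heartbeat, which is the cost that a check at first contact only would avoid.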


Sharad Agarwal added a comment - 29/May/08 10:17 AM
On a second look, a buildVersion check at initialContact seems sufficient (similar to the Namenode/Datanode model).
Attaching the patch for review.
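
A check performed only at initial contact might look like the sketch below. It assumes the comparison happens on the task tracker side, which is consistent with the wording of the release note but is not confirmed by this thread. SketchTrackerProtocol, its getBuildVersion method, and SketchTaskTracker are hypothetical names introduced for illustration.

```java
// Hypothetical, simplified view of the job tracker as seen by a task tracker.
interface SketchTrackerProtocol {
    String getBuildVersion(); // assumed method, for illustration only
}

// Hypothetical stand-in for the task tracker; not the real Hadoop class.
class SketchTaskTracker {
    private final String localBuildVersion;
    private boolean contacted = false;

    SketchTaskTracker(String localBuildVersion) {
        this.localBuildVersion = localBuildVersion;
    }

    // Returns true if the tracker may keep running, false if it should shut down.
    // The version comparison is made once, before the first successful contact.
    boolean connect(SketchTrackerProtocol jobTracker) {
        if (!contacted) {
            String remote = jobTracker.getBuildVersion();
            if (!localBuildVersion.equals(remote)) {
                return false; // shut down instead of pinging a mismatched job tracker
            }
            contacted = true;
        }
        return true;
    }
}
```

Because the flag is set after the first successful comparison, later calls skip the check, so no version string has to travel with each heartbeat or be stored per tracker.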

Amareshwari Sriramadasu added a comment - 02/Jun/08 10:41 AM
+1 Patch looks good.

Hadoop QA added a comment - 03/Jun/08 07:13 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12383007/236_v1.patch
against trunk revision 662813.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2547/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2547/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2547/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2547/console

This message is automatically generated.


Sharad Agarwal added a comment - 04/Jun/08 05:34 AM
No test case is included, as it is difficult to write one for this functionality.

Owen O'Malley added a comment - 04/Jun/08 06:20 PM
The revision isn't strong enough: if you make a change and restart the cluster, an old task tracker would still agree. It really should be:
VersionInfo.getVersion() + " from " + VersionInfo.getRevision() + " by " + 
VersionInfo.getUser() + " on " + VersionInfo.getDate()
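
To illustrate why the longer string is a stronger check, here is a minimal sketch. SketchBuildInfo is a hypothetical holder for the four values that the expression above reads from VersionInfo, and the field values in the usage example are made up.

```java
// Hypothetical holder for the four build properties; not the real VersionInfo class.
class SketchBuildInfo {
    final String version;
    final String revision;
    final String user;
    final String date;

    SketchBuildInfo(String version, String revision, String user, String date) {
        this.version = version;
        this.revision = revision;
        this.user = user;
        this.date = date;
    }

    // Same concatenation as suggested in the comment above.
    String buildVersion() {
        return version + " from " + revision + " by " + user + " on " + date;
    }
}

class SketchBuildInfoDemo {
    public static void main(String[] args) {
        // Two local builds from the same revision, compiled on different days (made-up values).
        SketchBuildInfo before = new SketchBuildInfo("0.18.0", "663487", "alice", "2008-06-04");
        SketchBuildInfo after = new SketchBuildInfo("0.18.0", "663487", "alice", "2008-06-05");
        System.out.println(before.revision.equals(after.revision));             // revisions agree
        System.out.println(before.buildVersion().equals(after.buildVersion())); // build strings differ
    }
}
```

A rebuild with uncommitted local changes keeps the same revision, so only the user and date components distinguish it from the earlier build.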

Sharad Agarwal added a comment - 05/Jun/08 10:01 AM
Attaching a new version of the patch that incorporates Owen's recommendation.

Hadoop QA added a comment - 05/Jun/08 01:34 PM
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12383449/236_v2.patch
against trunk revision 663487.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no tests are needed for this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

-1 core tests. The patch failed core unit tests.

-1 contrib tests. The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2588/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2588/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2588/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2588/console

This message is automatically generated.


Sharad Agarwal added a comment - 05/Jun/08 01:53 PM
The test case failures are unrelated to this patch.

Devaraj Das added a comment - 06/Jun/08 05:40 AM
I just committed this. Thanks, Sharad!