Hadoop Common / HADOOP-3677

Problems with generation stamp upgrade

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.18.0
    • Fix Version/s: 0.18.0
    • Component/s: None
    • Labels: None
    • Release Note: Simplify generation stamp upgrade by making it a local upgrade on datanodes. Deleted distributed upgrade.

    Description

      1. The generation stamp upgrade renames blocks' meta-files so that the name contains the block's generation stamp, as stated in HADOOP-2656.
        If a data-node has blocks that do not belong to any files, and the name-node asks the data-node to remove those blocks
        during or before the upgrade, the data-node will remove the blocks but not the meta-files, because their names
        are still in the old format, which the new code does not recognize. So we can end up with a number of garbage files that
        are hard to recognize as unused, and the system will never remove them automatically (a small sketch of the two naming
        formats follows this list). I think this should eventually be handled by the upgrade code, but it may be right to fix
        HADOOP-3002 for the 0.18 release, which would avoid scheduling block removal while the name-node is in safe mode.
      2. I was not able to get the upgrade -force option to work. This option lets the name-node proceed with a distributed upgrade even if
        the data-nodes are not able to complete their local upgrades. Did we test this feature at all for the generation stamp upgrade?
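
      The naming change behind problem 1 can be pictured with a minimal sketch. This is illustrative only: the old and new
      meta-file name formats are those described in HADOOP-2656, while the class name, helper methods, and sample block/stamp
      values below are assumptions made for this example, not code from Hadoop.

      // Illustrative sketch only: shows why a 0.18 data-node deleting block 123
      // removes blk_123 but leaves an old-format blk_123.meta behind.
      import java.io.File;
      import java.io.FilenameFilter;

      public class MetaFileNaming {
        // Pre-HADOOP-2656:  blk_<blockId>.meta
        // Post-HADOOP-2656: blk_<blockId>_<generationStamp>.meta
        static String oldMetaName(long blockId) {
          return "blk_" + blockId + ".meta";
        }
        static String newMetaName(long blockId, long generationStamp) {
          return "blk_" + blockId + "_" + generationStamp + ".meta";
        }

        /** Meta-files still in the old (pre-generation-stamp) format. */
        static File[] findOldFormatMetaFiles(File blockDir) {
          return blockDir.listFiles(new FilenameFilter() {
            public boolean accept(File dir, String name) {
              // old names contain a single '_' (the one after "blk"); new names contain two
              return name.startsWith("blk_") && name.endsWith(".meta")
                  && name.indexOf('_', "blk_".length()) < 0;
            }
          });
        }

        public static void main(String[] args) {
          System.out.println(oldMetaName(123));        // blk_123.meta
          System.out.println(newMetaName(123, 1001));  // blk_123_1001.meta
        }
      }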

      Attachments

        1. HADOOP-3677-trunk.patch
          5 kB
          Raghu Angadi
        2. HADOOP-3677-trunk.patch
          44 kB
          Raghu Angadi
        3. HADOOP-3677-branch-018.patch
          39 kB
          Raghu Angadi

        Issue Links

          Activity

            dhruba Dhruba Borthakur added a comment -

            One workaround is as follows:

            1. Shut down namenode and then restart namenode (with existing release). This will cause datanodes to send block reports and delete blocks that are not in the namespace.

            2. Shut down the cluster. Install new software on all nodes. Restart with the -upgrade option. This will not have to delete blocks because orphaned blocks were already deleted in step 1.

            If this workaround sounds feasible, then we can remove this issue from the 0.18 Blocker list.


            shv Konstantin Shvachko added a comment -

            > If this workaround sounds feasible, then we can remove this issue from the 0.18 Blocker list.

            And document this procedure in release notes?

            rangadi Raghu Angadi added a comment -

            I think it is better to fix the problem than to ask users to go through a different procedure for the upgrade. There are users with vastly different levels of expertise and caution. What happens if they don't follow the procedure by mistake?

            Also, this special workaround would need to be followed even in the future, when someone upgrades from 0.17 to 0.20.


            shv Konstantin Shvachko added a comment -

            Maybe a good solution would be to convert the distributed upgrade into a local data-node upgrade.
            It would solve both of the problems above, plus eliminate the warning message reported in HADOOP-3732.
            The only disadvantage of this approach I can see is that data-nodes will take a rather long time to start up, around 5 minutes each on a large cluster.
            But this can be addressed by including reasonable messages about the upgrade progress.


            dhruba Dhruba Borthakur added a comment -

            I disagree with Raghu to a certain extent. From my experience, when an administrator wants to upgrade the cluster, he/she restarts the namenode (before installing new software) so that the transaction log is consumed. This ensures that restarting with a new version of the software does not cause any unwanted interactions with edits-log processing.

            The workaround listed in this issue is just an extension of the above procedure.

            Konstantin: can you please explain what a local data-node upgrade is, and why it solves this bug? Thanks.

            rangadi Raghu Angadi added a comment -

            If the procedure is optional, then it's OK. I was mainly commenting on "required" extra steps. Since this was a blocker, I thought the workaround was required.

            rangadi Raghu Angadi added a comment -

            I haven't looked at the code around the upgrade or generation stamps yet. I think the basic fix (or workaround) that Konstantin is suggesting is to rename all the metadata files with the default generation stamp when the Datanode starts, before reporting them to the Namenode.

            The main advantage I see is that it will avoid extra warnings for every block.
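
            A minimal sketch of that startup-time rename idea, under stated assumptions: the pre-generation-stamp meta-file
            name format blk_<id>.meta comes from the description above, while the default ("grandfather") stamp value of 0 and
            the class and method names here are assumptions for illustration, not the committed code.

            // Illustrative sketch, not the committed patch: rename any pre-generation-stamp
            // meta-file blk_<id>.meta to blk_<id>_<defaultStamp>.meta before block reports.
            import java.io.File;
            import java.io.IOException;
            import java.util.regex.Matcher;
            import java.util.regex.Pattern;

            public class StartupMetaRename {
              static final long DEFAULT_GENERATION_STAMP = 0; // assumed "grandfather" stamp
              static final Pattern OLD_META = Pattern.compile("blk_(-?\\d+)\\.meta");

              static void renameOldMetaFiles(File blockDir) throws IOException {
                File[] files = blockDir.listFiles();
                if (files == null) {
                  return;
                }
                for (File f : files) {
                  Matcher m = OLD_META.matcher(f.getName());
                  if (!m.matches()) {
                    continue;   // block file or already-renamed meta-file
                  }
                  File renamed = new File(blockDir,
                      "blk_" + m.group(1) + "_" + DEFAULT_GENERATION_STAMP + ".meta");
                  if (!f.renameTo(renamed)) {
                    throw new IOException("Failed to rename " + f + " to " + renamed);
                  }
                }
              }
            }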


            dhruba Dhruba Borthakur added a comment -

            Changing the distributed -upgrade to a local upgrade sounds OK (though not very essential) to me. It seems to be another hack: the real problem is that the distributed upgrade framework does not yet have a mechanism to say "do not send block reports before the distributed upgrade is complete".

            Another alternative would be to keep the namenode in safe mode until the distributed upgrade is complete... isn't this already true today?

            rangadi Raghu Angadi added a comment -

            > ... specify that "do not send block reports before distributed upgrade is complete".
            Yes, we could fix it with more features like this. Still, we will be left with thousands of warning messages. The question is what we do for this jira.

            Whether a local upgrade is a hack is, I think, debatable. It makes logical sense to me: the Datanode metadata file name format has changed between 0.17 and 0.18, so the datanode converts these names to the new format when it is upgraded.

            In any case, a hack that only the core developers need to know about might be more desirable than a hack in the upgrade procedure that all admins need to be aware of.

            If there is consensus on converting the metadata file names when the datanode starts up, then I will submit a patch.


            dhruba Dhruba Borthakur added a comment -

            Hi Raghu, I am perfectly ok with the fix you are proposing. Thanks for taking this up.


            shv Konstantin Shvachko added a comment -

            The advantages of the local upgrade (or, as we also call it, the "version upgrade") over a distributed upgrade are that it avoids both problems stated in this jira, and also avoids thousands of warnings during data-node startup.
            In my opinion, the local/version upgrade is logically correct and is not a "hack" at all, because each data-node can complete the upgrade on its own without interacting with other data-nodes or the name-node. The distributed upgrade should be used when such intercommunication is required; e.g., during the crc-upgrade this was unavoidable.
            The startup time will be the only disadvantage, so the upgrade progress should be logged every 20-30 seconds.
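
            A minimal sketch of the kind of time-based progress logging suggested here (illustrative only; the class name,
            method, and interval constant are assumptions, not part of any patch):

            // Illustrative only: during the local upgrade, log progress at most once
            // every ~30 seconds so long data-node startups remain visible in the logs.
            public class UpgradeProgressLogger {
              private static final long LOG_INTERVAL_MS = 30 * 1000L;
              private long lastLogTime = 0;
              private long processed = 0;

              /** Call once per block that has been linked/renamed. */
              void blockProcessed(long totalBlocks) {
                processed++;
                long now = System.currentTimeMillis();
                if (now - lastLogTime >= LOG_INTERVAL_MS) {
                  lastLogTime = now;
                  System.out.println("Generation stamp upgrade: processed "
                      + processed + " of " + totalBlocks + " blocks");
                }
              }
            }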

            rangadi Raghu Angadi added a comment -

            Suggested patch for trunk. It has the minimum changes needed for the upgrade to work and disables the distributed upgrade. It does not take any longer than a normal version upgrade and thus does not need more notifications than a normal upgrade has.

            Once this fix is OK, I will submit a patch that removes the code related to the distributed upgrade, or it could be done in a separate jira.


            shv Konstantin Shvachko added a comment -

            I like this approach because it combines hard-linking with renaming and therefore does the task with no overhead (a rough sketch of this hard-link-plus-rename idea follows this list).

            1. I agree the distributed upgrade code should be removed if we do this.
            2. The constant oldMetaFileNamePattern should be in all capital letters; I'd prefer a name like
              PRE_GENERATION_STAMP_META_FILE_PATTERN.
            3. linkBlocks() does not need the newLV parameter.
            4. Spelling: "currect".
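
            A rough sketch of that hard-link-plus-rename step, under stated assumptions: the method name linkBlocks() and the
            constant name come from the review above, but the signature, the regex, the default stamp value of 0, and the use
            of java.nio for hard links are assumptions for illustration; the actual 0.18-era patch differs.

            // Rough illustrative sketch (not the actual patch): hard-link block files from
            // the previous storage directory into current/, renaming old-format meta-files
            // to the new <blockId>_<generationStamp> format along the way.
            import java.io.File;
            import java.io.IOException;
            import java.util.regex.Matcher;
            import java.util.regex.Pattern;

            public class GenStampUpgradeSketch {
              // constant name suggested in the review above; the pattern itself is an assumption
              static final Pattern PRE_GENERATION_STAMP_META_FILE_PATTERN =
                  Pattern.compile("blk_(-?\\d+)\\.meta");
              static final long DEFAULT_GENERATION_STAMP = 0; // assumed default stamp

              /** Recursively hard-link 'from' into 'to', renaming old-format meta-files. */
              static void linkBlocks(File from, File to) throws IOException {
                if (from.isDirectory()) {
                  if (!to.mkdirs() && !to.isDirectory()) {
                    throw new IOException("Cannot create directory " + to);
                  }
                  String[] names = from.list();
                  if (names == null) {
                    return;
                  }
                  for (String name : names) {
                    Matcher m = PRE_GENERATION_STAMP_META_FILE_PATTERN.matcher(name);
                    String newName = m.matches()
                        ? "blk_" + m.group(1) + "_" + DEFAULT_GENERATION_STAMP + ".meta"
                        : name;
                    linkBlocks(new File(from, name), new File(to, newName));
                  }
                  return;
                }
                // create 'to' as a hard link to 'from' (java.nio, Java 7+); the 0.18-era
                // code used its own hard-link helper instead
                java.nio.file.Files.createLink(to.toPath(), from.toPath());
              }
            }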
            rangadi Raghu Angadi added a comment -

            Thanks for the review, Konstantin.

            The attached patches for trunk and 0.18 have the suggested changes and remove the GenStamp distributed upgrade.

            The patch is large mainly because it removes around 1000 lines. Three files are deleted in trunk and one in 0.18.

            hadoopqa Hadoop QA added a comment -

            -1 overall. Here are the results of testing the latest attachment
            http://issues.apache.org/jira/secure/attachment/12386129/HADOOP-3677-trunk.patch
            against trunk revision 677470.

            +1 @author. The patch does not contain any @author tags.

            -1 tests included. The patch doesn't appear to include any new or modified tests.
            Please justify why no tests are needed for this patch.

            +1 javadoc. The javadoc tool did not generate any warning messages.

            +1 javac. The applied patch does not increase the total number of javac compiler warnings.

            +1 findbugs. The patch does not introduce any new Findbugs warnings.

            +1 release audit. The applied patch does not increase the total number of release audit warnings.

            +1 core tests. The patch passed core unit tests.

            +1 contrib tests. The patch passed contrib unit tests.

            Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2885/testReport/
            Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2885/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
            Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2885/artifact/trunk/build/test/checkstyle-errors.html
            Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2885/console

            This message is automatically generated.

            rangadi Raghu Angadi added a comment -

            I just committed this.

            hudson Hudson added a comment -

            Integrated in Hadoop-trunk #581 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/581/ )

            People

              Assignee: rangadi Raghu Angadi
              Reporter: shv Konstantin Shvachko
              Votes: 0
              Watchers: 1

              Dates

                Created:
                Updated:
                Resolved: