[HDFS-9949] Add a test case to ensure that the DataNode does not regenerate its UUID when a storage directory is cleared - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Test
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 2.6.0
Fix Version/s: 2.8.0, 3.0.0-alpha1
Component/s: None
Labels:
None

Target Version/s:

2.8.0, 2.9.0

Description

In the following scenario, in releases without ~~HDFS-8211~~, the DN may regenerate its UUIDs unintentionally.

0. Consider a DN with two disks /data1/dfs/dn,/data2/dfs/dn
1. Stop DN
2. Unmount the second disk, /data2/dfs/dn
3. Create (in the scenario, this was an accident) /data2/dfs/dn on the root path
4. Start DN
5. DN now considers /data2/dfs/dn empty so formats it, but during the format it uses datanode.getDatanodeUuid() which is null until register() is called.
6. As a result, after the directory loading, datanode.checkDatanodUuid() gets called with successful condition, and it causes a new generation of UUID which is written to all disks /data1/dfs/dn/current/VERSION and /data2/dfs/dn/current/VERSION.
7. Stop DN (in the scenario, this was when the mistake of unmounted disk was realised)
8. Mount the second disk back again /data2/dfs/dn, causing the VERSION file to be the original one again on it (mounting masks the root path that we last generated upon).
9. DN fails to start up cause it finds mismatched UUID between the two disks, with an error similar to:

WARN org.apache.hadoop.hdfs.server.common.Storage: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /data/2/dfs/dn is in an inconsistent state: Root /data/2/dfs/dn: DatanodeUuid=fe3a848f-beb8-4fcb-9581-c6fb1c701cc4, does not match 8ea9493c-7097-4ee3-96a3-0cc4dfc1d6ac from other StorageDirectory.

The DN should not generate a new UUID if one of the storage disks already have the older one.

~~HDFS-8211~~ unintentionally fixes this by changing the datanode.getDatanodeUuid() function to rely on the DataStorage representation of the UUID vs. the DatanodeID object which only gets available (non-null) after the registration.

It'd still be good to add a direct test case to the above scenario that passes on trunk and branch-2, but fails on branch-2.7 and lower, so we can catch a regression around this in future.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending

HDFS-9949.000.branch-2.7.not-for-commit.patch
13/Mar/16 02:11
3 kB
Harsh J
HDFS-9949.000.patch
13/Mar/16 02:12
3 kB
Harsh J
HDFS-9949.001.patch
17/Mar/16 02:44
3 kB
Harsh J

Issue Links

relates to

HDFS-8211 DataNode UUID is always null in the JMX counter

Resolved

Activity

People

Assignee:: Harsh J

Reporter:: Harsh J

Votes:: 0 Vote for this issue

Watchers:: 8 Start watching this issue

Dates

Created:: 13/Mar/16 02:09

Updated:: 30/Aug/16 01:18

Resolved:: 17/Mar/16 17:42