Affects Version/s: 0.2.0
Fix Version/s: 0.3.0
~30 node cluster
DFS files are still rotting.
I suspect there's a problem with block accounting/detecting identical hosts in the namenode. I have 30 physical nodes with varying numbers of local disks, so a "bin/hadoop dfs -report" after a full restart currently shows 80 nodes. However, when I discovered the problem (which cost me about 500 GB of temporary data to missing blocks in some of the larger chunks), -report showed 96 nodes. I suspect extra datanodes were somehow running against the same paths, the namenode counted those as replicated instances, the blocks then appeared over-replicated, and one of the datanodes was told to delete its local block, so the block was actually lost.
I will debug it more the next time the situation arises. This is at least the 5th time I've had a large amount of file data "rot" in DFS since January.
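A common guard against the suspected failure mode (two datanode processes sharing one dfs.data.dir) is to take an exclusive OS-level file lock on a marker file inside the storage directory at startup, so a second process refuses to run. The sketch below is illustrative, not the actual Hadoop patch; the class name StorageLock and the file name "in_use.lock" are my own.

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

/**
 * Minimal sketch: guard a storage directory with an exclusive lock file
 * so a second datanode process cannot start on the same dfs.data.dir.
 * (Hypothetical names; not the actual Hadoop implementation.)
 */
public class StorageLock {
    private final RandomAccessFile lockFile;
    private FileLock lock;

    public StorageLock(File dataDir) throws Exception {
        // The lock file lives inside the data directory being guarded.
        lockFile = new RandomAccessFile(new File(dataDir, "in_use.lock"), "rw");
    }

    /** Returns true if this process now owns the directory. */
    public boolean tryLock() throws Exception {
        try {
            // tryLock returns null if another process holds the lock.
            lock = lockFile.getChannel().tryLock();
        } catch (OverlappingFileLockException e) {
            // A lock is already held within this same JVM.
            lock = null;
        }
        return lock != null;
    }

    public void release() throws Exception {
        if (lock != null) lock.release();
        lockFile.close();
    }

    public static void main(String[] args) throws Exception {
        File dir = new File(System.getProperty("java.io.tmpdir"), "dfs-data-demo");
        dir.mkdirs();
        StorageLock first = new StorageLock(dir);
        System.out.println("first lock acquired: " + first.tryLock());
        StorageLock second = new StorageLock(dir);
        // The second claimant on the same directory must be refused.
        System.out.println("second lock acquired: " + second.tryLock());
        second.release();
        first.release();
    }
}
```

On startup the datanode would call tryLock() for each configured data directory and abort (or skip that directory) when the lock is refused, which would have prevented the double-counting and the subsequent bogus over-replication delete described above.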
||Field||Original Value||New Value||
|Priority|Major [ 3 ]|Critical [ 2 ]|
|Affects Version/s| |0.2 [ 12310813 ]|
|Summary|Files still rotting in DFS of latest Hadoop|don't permit two datanodes to run from same dfs.data.dir|
|Fix Version/s| |0.3 [ 12310930 ]|
|Assignee| |Konstantin Shvachko [ shv ]|
|Status|Open [ 1 ]|Resolved [ 5 ]|
|Resolution| |Fixed [ 1 ]|
|Status|Resolved [ 5 ]|Closed [ 6 ]|
|Workflow|jira [ 12353541 ]|no reopen closed [ 12373066 ]|
|Workflow|no reopen closed [ 12373066 ]|no-reopen-closed [ 12373402 ]|
|Workflow|no-reopen-closed [ 12373402 ]|no-reopen-closed, patch-avail [ 12377712 ]|
|Component/s| |dfs [ 12310710 ]|