1) Can we keep BlockManager#corruptReplicas private?
After the replication code is moved the block manager, we can change it back to private.
2) Is there a jira created to not to use FSNamesystem, FSNamesystem#datanodeMap FSNamesystem#heartbeats for locking in block manager? If not can you please create one?
I am current simply moving the code but not changing the logic (unless for some obvious cases, e.g. use for-each statement instead of iterator.)
For improving locking, I have thought about it. How about introduce a ReadWriteLock interface and then FSNamesystem, BlockManager and DatanodeManager implement it? The methods in ReadWriteLock are very similar to the existing methods in FSNamesystem. I have not created a JIRA for it yet.
3) Should we track with a jira removing references to NameNode.stateChangeLog in blockmanagement package to appropritate logger from blockmanagement package?
We probably should introduce a stateChangeLog in BlockManager.
4) Moved code resolveNetworkLocation() no longer has assert hasWriteLock(). What is the locking mechanism for this code?
FSNamesystem.registerDatanode(..), which has the write-lock, calls DatanodeManager.registerDatanode(..), which is the only method calling resolveNetworkLocation(..). I think the assert statement is not useful. No? We may add the assert statement back when we have read-write lock in DatanodeManager. Again, I am not changing any logic.
5) FSNamesystem#isDatanodeDead(), FSNamesystem#updateStats(), FSNamesystem#blockTokenSecretManager, FSNamesystem#datanodeMap, FSNamesystem#heartbeats should move to blockmanagement package.
I am moving the code step-by-step. After this patch, we are ready to move FSNamesystem#datanodeMap and the corresponding methods. I have
HDFS-2108 and HDFS-2112 for moving the heartbeat and replication code, respectively. I will see if moving blockTokenSecretManager could be done with replication.
In jiras moving the code, if the moved code remains the same, it would be easier to review. Case in point - getDatanodeListForReport()
My bad. I usually don't change any logic but there are too many redundant code there. I tried to simpify it.
> Why are doing adding these two sizes final List<DatanodeDescriptor> nodes = new ArrayList<DatanodeDescriptor>(namesystem.datanodeMap.size() + (mustList == null? 0: mustList.size()));
This is existing code. I think the author tried to prevent array resizing.
> Changing mustList from Map to ArrayList is going to slow down removes.
Good point, I should use HashSet.
7) checkDecommissionStateInternal() no longer checks for hasWriteLock()?
Same as (4), it is an assert statement. Do you think we need it?
8) DatanodeManager should not have namesystem reference?
The main reason is for accessing namesystem.datanodeMap. We should remove namesystem once everything is moved to blockmanagement package.
9) In some tests where you have moved from FSNamesystem methods calls to BlockManager method calls, how is the locking that was done in FSNamesystem call done with this change?
It is because the same FSNamesystem is moved to BlockManager. I should not have changed any logic.