Thanks Suresh for the comments.
h2191_20110726.patch: addressed everything except for #2.
1. Is this correct? 10000 * heartbeatIntervalSeconds Why 10000? Should you be multiplying it by 1000 when getting it from conf and not 10000 *?
It is correct. In the original code, we have
- long heartbeatInterval = conf.getLong(
- DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_DEFAULT) * 1000;
this.heartbeatRecheckInterval = conf.getInt(
DFSConfigKeys.DFS_NAMENODE_HEARTBEAT_RECHECK_INTERVAL_DEFAULT); - this.heartbeatExpireInterval = 2 * heartbeatRecheckInterval +
- 10 * heartbeatInterval;
So, 10000 = 10*1000 is correct. I changed heartbeatInterval (in ms) to heartbeatInterval to heartbeatIntervalSeconds since we will convert heartbeatInterval back to seconds in calculating blockInvalidateLimit below.
- this.blockInvalidateLimit = Math.max(this.blockInvalidateLimit,
2. blockManager.getDatanodeManager().getDatanode() - we could add a method blockManager.getDataNode() - I prefer this instead of exposing DatanodeManager to FSNamesystem.
FSNamesystem currently requires quite a few methods in DatanodeMnaager, e.g. getNetworkTopology(), registerDatanode(..), etc. The getDatanode(..) methods are only two of those methods. Let's think about how to change the APIs after the code separation is done. Okay?
3. Earlier getDatanode() used to lock using datanodeMap, do we need that with your patch? How is it synchronized now? Also datanodeMap access may not be synchronized correctly - for example datanodeDump synchronizes it but other instances such as DatanodeManager#getDatanode() does not.
I think you are talking about FSNamesystem.heartbeatCheck(). I incorrectly removed the synchronization. Fixed it.
4. In a future patch FSNamesystem#heartbeats and its synchronization should move to blockmanager
5. Please change javadoc of getDatanode() as throws UnregisteredNodeException.
6. I am sure you have already planned for removing FSNamesystem#blockInvalidateLimit in future patch.
7. How is DatanodeManager#getDatanodeCyclicIteration() synchronized? It cannot be synchronized using FSNamesystem.writeLock() as done currently?
It should accquire the write lock of DatanodeManager when we have read-write lock in DatanodeManager in the future.