DataNodeCluster is a great tool for simulating a large-scale DFS cluster on a small set of machines. A few suggestions to improve it:
- DataNodeCluster uses MiniDFSCluster#startDataNode to start multiple DataNode instances on one machine. MiniDFSCluster sets each DataNode's address to 127.0.0.1, which is reachable only from the local machine. We should allow the address to be set to 0.0.0.0 so that DataNodes on different machines can communicate.
- Currently the size of the blocks injected into DataNodes and of the blocks created by CreateEditsLog is hardcoded as 10. It would be more convenient if this were configurable. We also need to make sure that both tools use the same block size.
- If the replication factor is larger than 1, a DataNode in DataNodeCluster currently has blocks injected multiple times and therefore sends multiple block reports to the NameNode. The initial reports contain only a portion of its blocks, so blocks whose other replicas have not been reported yet look under-replicated and may trigger unnecessary block replications. It would be cleaner if each DataNode sent a single block report containing all of its blocks.
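To illustrate the bind-address suggestion, here is a minimal sketch, assuming the DataNode address keys of that era (dfs.datanode.address, dfs.datanode.http.address, dfs.datanode.ipc.address) are the ones pinned to 127.0.0.1. The helpers addressesFor and listensOnAllInterfaces are hypothetical, not part of MiniDFSCluster. The second helper shows why the host matters: a socket bound to 127.0.0.1 listens only on loopback, while one bound to 0.0.0.0 listens on every interface and can be reached from other machines.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch only: make the DataNode bind host configurable instead of fixing it
 * to 127.0.0.1. The key names are assumed DataNode address keys;
 * addressesFor and listensOnAllInterfaces are hypothetical helpers.
 */
final class DataNodeBindHost {

  // Assumed keys for the DataNode's data-transfer, HTTP, and IPC listeners.
  static final String[] ADDRESS_KEYS = {
    "dfs.datanode.address",
    "dfs.datanode.http.address",
    "dfs.datanode.ipc.address"
  };

  /** Maps every address key to "bindHost:0"; port 0 lets the OS choose a free port. */
  static Map<String, String> addressesFor(String bindHost) {
    Map<String, String> conf = new LinkedHashMap<>();
    for (String key : ADDRESS_KEYS) {
      conf.put(key, bindHost + ":0");
    }
    return conf;
  }

  /** Binds a throwaway socket to bindHost and reports whether it listens on all interfaces. */
  static boolean listensOnAllInterfaces(String bindHost) {
    try (ServerSocket socket = new ServerSocket()) {
      socket.bind(new InetSocketAddress(InetAddress.getByName(bindHost), 0));
      return socket.getInetAddress().isAnyLocalAddress();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(addressesFor("0.0.0.0"));
    System.out.println("127.0.0.1 all interfaces? " + listensOnAllInterfaces("127.0.0.1"));
    System.out.println("0.0.0.0 all interfaces? " + listensOnAllInterfaces("0.0.0.0"));
  }
}
```

Port 0 lets each DataNode on the same machine pick its own free port, which is what allows several of them to share one host regardless of which bind address is used.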
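For the block-size suggestion, a sketch of one way to do it: both the DataNode injector and the edits-log generator draw block metadata from a single generator that takes one configurable size, defaulting to the currently hardcoded 10. The -blockSize option, the BlockSpec class, and the method names are hypothetical and are not taken from DataNodeCluster or CreateEditsLog.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch only: one configurable block size shared by the DataNode block injector
 * and the edits-log generator. The "-blockSize" option, BlockSpec, and all method
 * names are hypothetical.
 */
final class SharedBlockSize {

  /** Default matches the value currently hardcoded in both tools. */
  static final long DEFAULT_BLOCK_SIZE = 10;

  /** Hypothetical stand-in for a block: its id and its length in bytes. */
  static final class BlockSpec {
    final long blockId;
    final long numBytes;

    BlockSpec(long blockId, long numBytes) {
      this.blockId = blockId;
      this.numBytes = numBytes;
    }
  }

  /** Reads "-blockSize N" from the arguments, or returns the default if absent. */
  static long parseBlockSize(String[] args) {
    for (int i = 0; i + 1 < args.length; i++) {
      if ("-blockSize".equals(args[i])) {
        long size = Long.parseLong(args[i + 1]);
        if (size <= 0) {
          throw new IllegalArgumentException("block size must be positive: " + size);
        }
        return size;
      }
    }
    return DEFAULT_BLOCK_SIZE;
  }

  /**
   * Single source of block metadata: if both tools call this with the same
   * arguments, they cannot disagree on a block's id or length.
   */
  static List<BlockSpec> generateBlocks(long startId, int count, long blockSize) {
    List<BlockSpec> blocks = new ArrayList<>(count);
    for (int i = 0; i < count; i++) {
      blocks.add(new BlockSpec(startId + i, blockSize));
    }
    return blocks;
  }

  public static void main(String[] args) {
    long blockSize = parseBlockSize(args);
    // Both consumers use the same generator and the same size.
    List<BlockSpec> forEditsLog = generateBlocks(1, 3, blockSize);
    List<BlockSpec> forInjection = generateBlocks(1, 3, blockSize);
    for (int i = 0; i < forEditsLog.size(); i++) {
      BlockSpec a = forEditsLog.get(i);
      BlockSpec b = forInjection.get(i);
      System.out.println("block " + a.blockId + ": edits log " + a.numBytes
          + " bytes, injected " + b.numBytes + " bytes");
    }
  }
}
```

Routing both tools through one parsed value, rather than two separate settings, is what guarantees the "same block size" requirement instead of merely documenting it.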
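For the block-report suggestion, a sketch of grouping replicas before injection: compute every DataNode's full block list first, then inject each list once, so each DataNode has a complete set of blocks when it sends its report. The round-robin placement (replica r of block i goes to DataNode (i + r) mod numDataNodes) is an assumption for illustration, and injectAll only prints where a real injection call would go; neither reflects how DataNodeCluster actually places blocks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Sketch only: assign every replica to a DataNode first, then inject each
 * DataNode's complete block list once. Round-robin placement and all names
 * here are hypothetical.
 */
final class SingleInjection {

  /** Maps DataNode index to all block ids it should hold. */
  static Map<Integer, List<Long>> assignReplicas(long[] blockIds, int replication, int numDataNodes) {
    if (replication > numDataNodes) {
      throw new IllegalArgumentException("replication exceeds number of DataNodes");
    }
    Map<Integer, List<Long>> perNode = new TreeMap<>();
    for (int i = 0; i < blockIds.length; i++) {
      for (int r = 0; r < replication; r++) {
        // replication <= numDataNodes, so the nodes chosen for one block are distinct.
        int node = (i + r) % numDataNodes;
        perNode.computeIfAbsent(node, k -> new ArrayList<>()).add(blockIds[i]);
      }
    }
    return perNode;
  }

  /** Hypothetical stand-in for the real injection call; runs exactly once per DataNode. */
  static int injectAll(Map<Integer, List<Long>> perNode) {
    int injections = 0;
    for (Map.Entry<Integer, List<Long>> e : perNode.entrySet()) {
      System.out.println("inject into DataNode " + e.getKey() + ": " + e.getValue());
      injections++;
    }
    return injections;
  }

  public static void main(String[] args) {
    long[] blockIds = {1, 2, 3, 4};
    Map<Integer, List<Long>> perNode = assignReplicas(blockIds, 2, 4);
    System.out.println("injections: " + injectAll(perNode));
  }
}
```

With 4 blocks, replication 2, and 4 DataNodes, each DataNode receives exactly two blocks in a single injection instead of being injected several times.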
|Time In Source Status||Execution Times||Last Executer||Last Execution Date|
|11h 8m||1||Ravi Phulari||20/Aug/09 18:25|
|149d 12h 38m||2||Ravi Phulari||20/Aug/09 18:25|
|12d 6h 47m||1||Chris Douglas||02/Sep/09 01:13|
|356d 19h 23m||1||Tom White||24/Aug/10 20:36|
Tom White made changes -
|Status||Resolved [ 5 ]||Closed [ 6 ]|
Chris Douglas made changes -
|Status||Patch Available [ 10002 ]||Resolved [ 5 ]|
|Resolution||Duplicate [ 3 ]|
|Status||Patch Available [ 10002 ]||Open [ 1 ]|
|Attachment||HADOOP-5556.patch [ 12417111 ]|
Hairong Kuang made changes -
|Assignee||Hairong Kuang [ hairong ]|