Thanks for the update.
We should call cluster.setDataNodeDead(..) to remove it from cluster map.
1. Actually, my earlier suggestion was wrong. My mistake. This line is unnecessary.
2. Suggestion: enable logging to make debugging easier.
sleep 3 seconds instead of 1 second
That's not exactly the intention of the old logic. I tried sleep(1) and found:
Node /rack1/127.0.0.1:44392 [
Datanode 127.0.0.1:44392 is not chosen since no good storage to place the block .
That's because the first block report has not finished, so DatanodeDescriptor#storageMap is empty. I tried cluster.waitFirstBRCompleted(), but there is a race condition with the slowwriters.
So I think we can:
1. Start 5 writers, and sleep briefly so that they have all started.
2. Start 2 new DNs, call waitFirstBRCompleted, and stop an old DN. (We don't need to call cluster.setDataNodeDead().)
3. Start 5 new writers.
As the comment says:
// Let slow writers write something.
// Some of them are too slow and will be not yet started.
In this way, we don't change the logic of the test.
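To make the proposed ordering concrete, here is a rough, self-contained sketch. It does not use MiniDFSCluster; ClusterOps and RecordingCluster are hypothetical stand-ins I made up that only record the order of calls, and stopping the DN at index 0 is an arbitrary assumption. The real test would call the actual cluster methods in the same order.

```java
import java.util.ArrayList;
import java.util.List;

public class ReorderedTestSketch {

  /** Hypothetical stand-in for the few operations the test needs (not a Hadoop API). */
  interface ClusterOps {
    void startWriters(int n);
    void sleepBriefly();
    void startDataNodes(int n);
    void waitFirstBRCompleted();
    void stopDataNode(int index);
  }

  /** Fake that only records the call order; it starts no real daemons or threads. */
  static class RecordingCluster implements ClusterOps {
    final List<String> calls = new ArrayList<>();
    @Override public void startWriters(int n) { calls.add("startWriters(" + n + ")"); }
    @Override public void sleepBriefly() { calls.add("sleepBriefly"); }
    @Override public void startDataNodes(int n) { calls.add("startDataNodes(" + n + ")"); }
    @Override public void waitFirstBRCompleted() { calls.add("waitFirstBRCompleted"); }
    @Override public void stopDataNode(int index) { calls.add("stopDataNode(" + index + ")"); }
  }

  /** The ordering proposed in steps 1-3 of this comment. */
  static void runScenario(ClusterOps c) {
    // Step 1: start the first 5 writers and give them time to start.
    c.startWriters(5);
    c.sleepBriefly();
    // Step 2: add 2 DNs, wait for their first block reports, then stop an old DN.
    c.startDataNodes(2);
    c.waitFirstBRCompleted();
    c.stopDataNode(0); // index 0 is an assumption; any old DN would do
    // Step 3: the remaining 5 writers start only after the topology change.
    c.startWriters(5);
  }

  static List<String> recordedOrder() {
    RecordingCluster cluster = new RecordingCluster();
    runScenario(cluster);
    return cluster.calls;
  }

  public static void main(String[] args) {
    for (String call : recordedOrder()) {
      System.out.println(call);
    }
  }
}
```

The point is only that the new DNs finish their first block report before any old DN is stopped, and that half of the writers start after the change, which matches the "not yet started" case in the comment.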
4. This line is not needed.