Attached is a patch against current branch-0.20 which resolves the protocol incompatibility of the
HDFS-101 / HDFS-793 pair. Since this is tricky code, I'll try to summarize the patch in detail:
In HDFS-793, PipelineAck's wire format gained a new element: the number of status replies to follow. This is the central incompatibility. So, in this patch, I removed that field and reset the version number back to the original 14 from old branch-0.20. To know how many status replies to read, PipelineAck now takes the downstream pipeline depth as a constructor parameter. This is used only for reading; otherwise it is -1, and it is an error to call readFields if it has not been set.
Since the number of replies in a pipeline ack is no longer dynamic, I removed the getNumOfReplies call as well.
When reading an ack, I check for the HEARTBEAT message, and in that case don't read any replies. Otherwise I expect a reply from each downstream datanode. For review: should readFields handle the case of a sequence number equal to -2? As best I can tell, the current code never sends such a sequence number, and if it does, it is an error. It may make sense to check for it and throw an IOException for any negative seqno that is not HEARTBEAT_SEQNO. Opinions appreciated.
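To make the reading scheme concrete, here is a minimal, self-contained sketch of the idea (not the actual PipelineAck class; the class name, field names, and the assumed HEARTBEAT_SEQNO value of -1 are illustrative):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Simplified stand-in for the patched PipelineAck: the reply count comes
// from the constructor (downstream pipeline depth), not from the wire.
class FixedLengthAck {
  static final long HEARTBEAT_SEQNO = -1L; // assumption: heartbeat marker value
  private final int numOfReplies;          // -1 until set; required for reading
  private long seqno;
  private short[] replies = new short[0];

  FixedLengthAck(int numOfReplies) {
    this.numOfReplies = numOfReplies;
  }

  void readFields(DataInputStream in) throws IOException {
    if (numOfReplies < 0) {
      // it's an error to call readFields without the pipeline depth set
      throw new IllegalStateException("reply count not set before readFields");
    }
    seqno = in.readLong();
    if (seqno == HEARTBEAT_SEQNO) {
      replies = new short[0];              // heartbeat: no replies follow
      return;
    }
    replies = new short[numOfReplies];     // one status per downstream node
    for (int i = 0; i < numOfReplies; i++) {
      replies[i] = in.readShort();
    }
  }

  long getSeqno() { return seqno; }
  short[] getReplies() { return replies; }
}

public class AckDemo {
  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(buf);
    out.writeLong(7L);   // seqno
    out.writeShort(0);   // a status for each of 2 downstream nodes
    out.writeShort(0);

    FixedLengthAck ack = new FixedLengthAck(2);
    ack.readFields(new DataInputStream(
        new ByteArrayInputStream(buf.toByteArray())));
    System.out.println(ack.getSeqno() + " replies=" + ack.getReplies().length);
  }
}
```

Because the reply count never appears on the wire, an unpatched version-14 peer reads and writes exactly the same byte layout.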
In DFSClient I added a DEBUG-level printout of the contents of the pipeline. This was useful to me while testing, to ensure that I killed each node in the pipeline in the intended order.
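Roughly, the printout looks like the following sketch (the helper name and log format here are hypothetical; the actual code in DFSClient would go through its commons-logging LOG guarded by isDebugEnabled()):

```java
// Hypothetical sketch of the DEBUG-level pipeline printout; the actual
// field names and log call in DFSClient differ.
public class PipelineLogDemo {
  static String describePipeline(String[] nodes) {
    StringBuilder sb = new StringBuilder("pipeline = ");
    for (int i = 0; i < nodes.length; i++) {
      if (i > 0) sb.append(" -> ");
      sb.append(nodes[i]);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    String[] targets = {"dn1:50010", "dn2:50010", "dn3:50010"};
    // In DFSClient this would be LOG.debug(...) guarded by isDebugEnabled()
    System.out.println(describePipeline(targets));
  }
}
```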
In BlockReceiver, I added back the "continue" during HEARTBEAT processing. I believe this was an omission in the earlier patch; as best I can tell, without the continue, it currently sends a spurious "seqno=-2" ack after each heartbeat. With the continue, it circles around the loop correctly to wait for the next ack.
For review: I put a TODO for the case where BlockReceiver receives a seqno of -2. I currently believe that any negative sequence number that is not HEARTBEAT_SEQNO is an error and should throw an IOException (e.g. we got our reads misaligned).
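The restored loop shape, including the stricter negative-seqno handling proposed in the TODO, can be sketched like this (a standalone illustration, not the actual BlockReceiver code; HEARTBEAT_SEQNO is assumed to be -1):

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch of the ack-forwarding loop with the restored "continue" on
// heartbeat. Without the continue, a spurious ack was sent after each
// heartbeat; with it, the loop goes back to waiting for the next ack.
public class AckLoopDemo {
  static final long HEARTBEAT_SEQNO = -1L; // assumed heartbeat marker

  // Returns the seqnos that were actually acknowledged.
  static List<Long> processAcks(Iterator<Long> seqnos) throws IOException {
    List<Long> acked = new ArrayList<Long>();
    while (seqnos.hasNext()) {
      long seqno = seqnos.next();
      if (seqno == HEARTBEAT_SEQNO) {
        // forward the heartbeat upstream, then wait for the next ack
        continue;
      }
      if (seqno < 0) {
        // the TODO discussed above: any other negative seqno means the
        // reads are misaligned
        throw new IOException("Unexpected negative seqno " + seqno);
      }
      acked.add(seqno); // normal ack for a real packet
    }
    return acked;
  }

  public static void main(String[] args) throws IOException {
    // one real ack, a heartbeat, another real ack
    List<Long> in = java.util.Arrays.asList(1L, HEARTBEAT_SEQNO, 2L);
    System.out.println(processAcks(in.iterator()));
  }
}
```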
When constructing the ack message for a failed mirror, since every ack must have the same number of replies, I send SUCCESS followed by N ERRORs, where N is the number of downstream targets. The client's behavior is to eject the first ERROR node, so the presence of ERROR statuses further downstream is unimportant; in truth they are semantically UNKNOWN, but no such status code exists. For review: HDFS-793 reversed the order of the loop at DFSClient.java:2431 to locate the last DN with ERROR status. I had to reverse it back to the original loop order for this patch, since the replies look like SUCCESS, ERROR, ERROR in the case that DN 2 dies.
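To illustrate why the loop order matters, here is a minimal sketch of the client-side scan (status codes and method name are illustrative, not the actual DFSClient code):

```java
// With acks shaped SUCCESS, ERROR, ERROR, ... the client must scan
// forward and eject the FIRST node reporting ERROR; the trailing ERRORs
// are effectively UNKNOWN. Scanning backward (the HDFS-793 order) would
// wrongly pick the last downstream node.
public class ErrorIndexDemo {
  static final short SUCCESS = 0; // illustrative status codes
  static final short ERROR = 1;

  static int firstErrorIndex(short[] replies) {
    for (int i = 0; i < replies.length; i++) {
      if (replies[i] == ERROR) {
        return i;   // eject this node from the pipeline
      }
    }
    return -1;      // no failure reported
  }

  public static void main(String[] args) {
    short[] replies = {SUCCESS, ERROR, ERROR}; // DN 2 died
    System.out.println(firstErrorIndex(replies));
  }
}
```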
In terms of testing, I performed the following:
- Start up 3 node distributed cluster using patched servers
- With an unpatched client, began uploading a file. Killed each node in the pipeline (first, second, last) and ensured that the correct datanode was ejected.
- With a patched client and patched server, ran the same test.
- With a patched client and unpatched servers, ensured that file uploads worked properly. I did not test killing the unpatched server nodes here - I can do so if necessary, but I was using a shared cluster for this test.
In all cases, the file upload lasted more than 30 seconds, so heartbeats were exercised as well.