I think the problem is "one datanode with replication 3". What should the user expectation be? It seems that users won't be happy if we do not allow append. However, if we allow appending to a single replica and the replica becomes corrupted, then data loss is possible. I can imagine an extreme case where a user is appending slowly to a single replica, an admin adds more datanodes later on but the block is not replicated since the file is not closed, and then the datanode with the single replica fails. Is this case acceptable to you?
> So from the view of user, the first append succeed while the second fail, is that a good idea?
The distinction is whether there is pre-append data. In the second append, the replica contains pre-append data. That data was in a closed file, and if the datanode fails during the append, it could be lost. In the first append, however, there is no pre-append data. If the append fails and the new replica is lost, it is more or less okay since only the new data is lost.
The add-datanode feature is there to prevent loss of pre-append data. Users (or admins) can turn it off, as mentioned in HDFS-3091. I think we may improve the error message. Is that good enough? Or any suggestions?
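
For reference, a sketch of what turning the feature off might look like on the client side. This assumes the add-datanode feature discussed here is the client's replace-datanode-on-failure setting; the property names are the ones listed in hdfs-default.xml, and the fragment is illustrative rather than a recommended configuration, since disabling the feature removes the protection for pre-append data described above.

```xml
<!-- Illustrative client-side hdfs-site.xml fragment.
     Assumption: the "add-datanode feature" is replace-datanode-on-failure. -->
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
  <!-- false: never try to add a replacement datanode to the write pipeline -->
  <value>false</value>
</property>

<!-- Alternative (also an assumption about intent): keep the feature enabled
     but set the policy so that no datanode is ever added. -->
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
  <value>NEVER</value>
</property>
```

Either setting alone should be enough on a very small cluster such as the one-datanode case; with both present, the `enable` flag takes effect first.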