This is an important one for riding over HA NN topology changes (as per Chunhui). It was seen on a cluster today.
As I reported in HBASE-7385, we've also seen this in NN HA tests.
IMHO, this particular fix is only important if we have fixed all other write attempts for HDFS.
We have seen another edge case: the NN dies just before returning the RPC response for a file create, and the next retry from the DFS client then fails with a file-already-exists exception. I think I've logged it somewhere. Regardless, I think fixing the memstore flush is important, since a failed flush causes the RS to abort.
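To make the create-file edge case concrete, here is a rough, self-contained sketch. It does not use any HDFS or HBase API: `FakeNameNode`, `AlreadyExists`, `naiveCreate` and `tolerantCreate` are all hypothetical names made up for illustration. The tolerant variant assumes the path is unique to the writer (as a flush file name would be), so a leftover file after a lost response can only be our own and is safe to overwrite. That assumption may not hold for other write paths.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

public class CreateRetrySketch {

    /** Hypothetical "file already exists" error; not the Hadoop exception class. */
    public static class AlreadyExists extends IOException {
        public AlreadyExists(String path) { super("file already exists: " + path); }
    }

    /** Hypothetical stand-in for the NameNode; it only tracks which paths exist. */
    public static class FakeNameNode {
        private final Set<String> files = new HashSet<>();
        private boolean dropNextResponse;

        public FakeNameNode(boolean dropFirstResponse) {
            this.dropNextResponse = dropFirstResponse;
        }

        public void create(String path, boolean overwrite) throws IOException {
            if (files.contains(path) && !overwrite) {
                throw new AlreadyExists(path);
            }
            files.add(path); // the create is applied on the server side...
            if (dropNextResponse) {
                dropNextResponse = false;
                // ...but the "NN" dies before the response reaches the client.
                throw new IOException("connection lost before response");
            }
        }
    }

    /** Retries with overwrite=false every time; returns false on AlreadyExists. */
    public static boolean naiveCreate(FakeNameNode nn, String path, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                nn.create(path, false);
                return true;
            } catch (AlreadyExists e) {
                return false; // the retry trips over the file our own lost attempt created
            } catch (IOException e) {
                // response lost: fall through and retry
            }
        }
        return false;
    }

    /**
     * After a failed attempt, retries with overwrite=true.
     * Assumption: the path is unique to this writer, so any leftover file is ours.
     */
    public static boolean tolerantCreate(FakeNameNode nn, String path, int maxAttempts) {
        boolean priorAttemptFailed = false;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                nn.create(path, priorAttemptFailed);
                return true;
            } catch (AlreadyExists e) {
                return false; // only reachable on the first attempt: a genuine conflict
            } catch (IOException e) {
                priorAttemptFailed = true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("naive:    " + naiveCreate(new FakeNameNode(true), "/flush/f1", 3));
        System.out.println("tolerant: " + tolerantCreate(new FakeNameNode(true), "/flush/f1", 3));
    }
}
```

The point of the sketch is only that the first attempt and the retry are not equivalent: once a response has been lost, the client cannot tell from the exception alone whether the file is a real conflict or a leftover of its own attempt.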
Should we commit it, and if tests start failing, fix them later?