For append, it makes a lot of sense to keep using the existing checksum type. What is the use case for using a different checksum type?
I don't think it makes sense either, but that was the design decision made in HDFS-2130. There might have been some use cases for this, so I tried to support it while making the default not to allow it. If you feel that this should be the behavior with no configurable option, I will be happy to update the patch accordingly.
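To make the two options concrete, here is a minimal sketch of a client appending with a different checksum type. dfs.checksum.type is the standard client-side checksum setting; dfs.client.append.allow-different-checksum is the option proposed in this patch, and the namenode URI and path are placeholders.
{code:java}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendChecksumExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // New data from this client would be checksummed with CRC32C...
    conf.set("dfs.checksum.type", "CRC32C");
    // ...even if the file was created with CRC32, but only when the
    // proposed option is enabled (it defaults to false).
    conf.setBoolean("dfs.client.append.allow-different-checksum", true);

    FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
    try (FSDataOutputStream out = fs.append(new Path("/data/file-with-crc32"))) {
      out.writeBytes("appended bytes\n");
    }
  }
}
{code}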
What do you think we should do for concat()? It is supposed to be a quick, namenode-only operation, so I don't feel comfortable inserting code that checks the checksums of the input files.
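For comparison, a caller who cares could do the check on their own side before calling concat(); a rough sketch is below. The trade-off is exactly the one above: getFileChecksum() costs datanode round trips for every input file, which is what we'd like to keep out of the namenode path. The class and method names here are made up for illustration.
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class ConcatGuard {
  /** Returns true if the target and every source report the same checksum algorithm. */
  static boolean checksumTypesMatch(FileSystem fs, Path target, Path... srcs)
      throws IOException {
    FileChecksum expected = fs.getFileChecksum(target);
    if (expected == null) {
      return false; // filesystem exposes no checksum; nothing to compare
    }
    for (Path src : srcs) {
      FileChecksum actual = fs.getFileChecksum(src);
      if (actual == null
          || !expected.getAlgorithmName().equals(actual.getAlgorithmName())) {
        return false;
      }
    }
    return true;
  }
}
{code}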
Suppose the last block of a closed file is half written with CRC32. Then the file is re-opened for append with CRC32C. Would the block then have two checksum types, i.e., the first half CRC32 and the second half CRC32C?
No. The datanode will continue to use the checksum parameters of the existing partial block for writing, independent of what the client sends with the data. The integrity check on the incoming data is still done, of course.
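Sketched as code, the rule is roughly the following. This is illustrative only, not the actual datanode code; DataChecksum is the real HDFS class, but the surrounding names are hypothetical.
{code:java}
import org.apache.hadoop.util.DataChecksum;

final class PartialBlockChecksumPolicy {
  /**
   * @param existing   checksum parameters read from the partial block's
   *                   meta file, or null for a brand-new block
   * @param fromClient checksum the client negotiated for this write
   */
  static DataChecksum checksumForWrite(DataChecksum existing,
                                       DataChecksum fromClient) {
    // An append to an existing partial block keeps that block's checksum
    // type and bytes-per-checksum; the client's data is still verified
    // against the checksum it sent before being stored.
    return existing != null ? existing : fromClient;
  }
}
{code}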
Suppose a closed file already uses more than one checksum type. Then the file is re-opened for append with dfs.client.append.allow-different-checksum == false. Which checksum should it use? Or should it fail?
I don't think we can do much for existing files. Users can detect the condition with getFileChecksum(), which will report DataChecksum.Type.MIXED as the checksum type. For these files, checksums will still be used for block-level integrity checks, and nothing will break until something like distcp tries to compare FileChecksums after copying.
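For example, a distcp-style post-copy verification looks roughly like the sketch below; a MIXED source file would produce a FileChecksum that cannot match the one computed for a uniformly-checksummed copy. The class name and method are hypothetical; FileChecksum.equals() compares the algorithm name and checksum bytes.
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

final class CopyVerifier {
  /** Returns false only on a definite checksum mismatch between src and dst. */
  static boolean checksumsMatch(FileSystem srcFs, Path src,
                                FileSystem dstFs, Path dst) throws IOException {
    FileChecksum srcSum = srcFs.getFileChecksum(src);
    FileChecksum dstSum = dstFs.getFileChecksum(dst);
    // A null checksum means the filesystem cannot provide one; treat that
    // as "unverifiable" rather than as a mismatch.
    if (srcSum == null || dstSum == null) {
      return true;
    }
    return srcSum.equals(dstSum);
  }
}
{code}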