I think XML is fine. XML parsing is done at the document level, so we can safely detect or ignore the extra parameter without worrying about the size of the data. I tried calling getFileChecksum() over Hftp between a patched 0.23 cluster and a 1.0.x cluster, and it worked fine both ways.
The change you suggested does not solve the whole problem. The magic number acts like a simple binary length field: its presence or absence tells you how much data to read. So the read side of the patched version works even when reading from an unpatched version, but the reverse is not true: the unpatched version will always leave something unread in the stream. XML is nice in that it inherently has begin and end markers and is not sensitive to size changes.
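To illustrate the asymmetry, here is a minimal sketch (not the actual MD5MD5CRC32FileChecksum code; the field names and values are made up): a patched writer appends a new field, and an unpatched reader that only knows the old layout consumes its fields correctly but leaves the extra bytes sitting in the stream.

```java
import java.io.*;

// Hypothetical illustration of the compatibility break described above.
public class LengthFieldDemo {
    // Patched writer: appends a new field after the old one.
    static byte[] writePatched(int oldField, int newField) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(oldField);
        out.writeInt(newField); // extra field added by the patched version
        return bos.toByteArray();
    }

    // Unpatched reader: reads only the old field.
    static int readUnpatched(DataInputStream in) throws IOException {
        return in.readInt();
    }

    public static void main(String[] args) throws IOException {
        DataInputStream in = new DataInputStream(
            new ByteArrayInputStream(writePatched(42, 7)));
        int v = readUnpatched(in);
        // The unpatched reader got its value, but 4 bytes remain unread,
        // desynchronizing anything that reads from the stream next.
        System.out.println(v + " " + in.available()); // prints "42 4"
    }
}
```

The patched reader can branch on whether more bytes are present, which is why the other direction works.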
Since JsonUtil depends on these serialization/deserialization methods, I don't think it can achieve bidirectional compatibility by modifying only one side. If it had used XML and not done the length check, it would have no such problem. A fully JSON-ized approach could have worked as well.
One approach I can think of is to leave the current readFields()/write() methods unchanged. I think only WebHdfs is using them, and if that is true, we can make WebHdfs actually send and receive everything in JSON format while keeping the current "bytes" JSON field as-is. When it does not find the new fields in data from an old source, it can fall back to the old deserialization of "bytes". Similarly, it should send everything as individual JSON fields as well as the old serialized "bytes".
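The proposal above could be sketched roughly as follows, using a Map as a stand-in for the JSON object. The individual field names ("algorithm", "length") are assumptions for illustration; only "bytes" is the existing field from the source.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the proposed WebHdfs compatibility scheme: send both the new
// individual JSON fields and the legacy serialized "bytes" field, and fall
// back to the legacy path when the new fields are absent.
public class ChecksumJsonCompat {
    // Patched sender: emit individual fields alongside the legacy "bytes".
    static Map<String, Object> toJson(String algorithm, int length, String bytesHex) {
        Map<String, Object> json = new HashMap<>();
        json.put("algorithm", algorithm); // new individual field (assumed name)
        json.put("length", length);       // new individual field (assumed name)
        json.put("bytes", bytesHex);      // legacy serialized form, unchanged
        return json;
    }

    // Patched receiver: prefer the new fields; when only "bytes" is present
    // (an unpatched peer), run the old deserialization instead.
    static String describe(Map<String, Object> json) {
        if (json.containsKey("algorithm")) {
            return json.get("algorithm") + ":" + json.get("length");
        }
        return "legacy:" + json.get("bytes"); // old deserialization path
    }
}
```

An unpatched receiver simply ignores the unknown fields and reads "bytes" as it always has, which is what gives compatibility in both directions.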
It may be better to move the JSON util methods into MD5MD5CRC32FileChecksum.java, since they will have to know the internals of MD5MD5CRC32FileChecksum.