The LZO compression codec is not included in the standard Hadoop package, so the compression algorithm has to be configurable.
If we compress the entire image file, the challenge is to decide where to put the compression algorithm information.
Dhruba suggested storing this information in the VERSION file. The idea is neat. The only problem is that saving the fsimage would then touch two files, and it's hard to guarantee atomicity.
Another solution is to add a suffix to the image file name to indicate the compression algorithm. The problem with this is that the image file no longer has a unique name, so one storage directory could end up with multiple fsimages. How do we handle this?
After the back-and-forth discussion, I am leaning toward the approach I originally proposed: changing the binary format. We could then store the compression algorithm information in the fsimage header. This way, we avoid all the complexity that compressing the entire image file introduces.
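To make the header idea concrete, here is a rough sketch in Python rather than the actual NameNode code. The layout (a 4-byte version followed by a length-prefixed codec name), the function names, and the codec registry are all assumptions for illustration, not the real fsimage format. The point is only that the header stays uncompressed, so a reader can find out which codec to use from the same single file.

```python
import io
import struct
import zlib

# Hypothetical codec registry: name -> (compress, decompress).
# A real deployment would make this configurable, e.g. to plug in LZO.
CODECS = {
    "none": (lambda data: data, lambda data: data),
    "zlib": (zlib.compress, zlib.decompress),
}


def write_image(out, layout_version, codec_name, payload):
    """Write an uncompressed header, then the compressed image body."""
    compress, _ = CODECS[codec_name]
    name = codec_name.encode("utf-8")
    # Header: big-endian int32 version, uint16 codec-name length, codec name.
    out.write(struct.pack(">iH", layout_version, len(name)))
    out.write(name)
    out.write(compress(payload))


def read_image(inp):
    """Read the header to learn the codec, then decompress the body."""
    layout_version, name_len = struct.unpack(">iH", inp.read(6))
    codec_name = inp.read(name_len).decode("utf-8")
    _, decompress = CODECS[codec_name]
    return layout_version, codec_name, decompress(inp.read())


if __name__ == "__main__":
    buf = io.BytesIO()
    # -25 is only a placeholder version number for this sketch.
    write_image(buf, -25, "zlib", b"namespace data " * 100)
    buf.seek(0)
    version, codec, body = read_image(buf)
    print(version, codec, len(body))
```

Because the codec name travels inside the image itself, saving still touches only one file, and the file keeps its single well-known name.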
What does the community think?