Some comments on proposal V1.
It seems like the IV would be transparent to RFile; it would just be encryption header information associated with a block, just as each gzip block probably has some header. From RFile's perspective, it just needs to be able to read and write blocks of data. When the encryption codec is not used, there is no per-block IV. Does this sound correct? Taking this a step further, should encryption be pushed into BCFile? Currently RFile has no concept of compression; it just reads and writes blocks of data to BCFile. BCFile handles compression and stores compression metadata, such as which codec to use for reading. Even RFile's own root meta block is stored as a regular BCFile meta block and compressed like everything else. It seems like modifying BCFile rather than RFile may be easier. I have already modified BCFile to support multi-level indexes in 1.4. BCFile was copied because it was package private, but the copy was not modified for a long time.
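To make the "IV as block header" idea concrete, here is a minimal sketch using only the JDK. The class and method names (BlockCryptoSketch, writeBlock, readBlock) are hypothetical and are not part of BCFile or RFile, and the choice of AES/CBC with a 16-byte IV is an assumption for illustration only. The caller hands in and gets back plain block bytes and never sees the IV.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch: not the BCFile API, just the shape of the idea.
public class BlockCryptoSketch {
  private static final int IV_LENGTH = 16; // AES block size in bytes (assumed cipher)
  private static final SecureRandom RANDOM = new SecureRandom();

  /** Encrypts one block and prepends a fresh IV as that block's header. */
  public static byte[] writeBlock(byte[] plain, SecretKeySpec key)
      throws GeneralSecurityException {
    byte[] iv = new byte[IV_LENGTH];
    RANDOM.nextBytes(iv);
    Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
    cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
    byte[] encrypted = cipher.doFinal(plain);

    // Stored layout: [IV][ciphertext]
    byte[] stored = new byte[IV_LENGTH + encrypted.length];
    System.arraycopy(iv, 0, stored, 0, IV_LENGTH);
    System.arraycopy(encrypted, 0, stored, IV_LENGTH, encrypted.length);
    return stored;
  }

  /** Reads the IV from the block header, then decrypts the rest of the block. */
  public static byte[] readBlock(byte[] stored, SecretKeySpec key)
      throws GeneralSecurityException {
    Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
    cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(stored, 0, IV_LENGTH));
    return cipher.doFinal(stored, IV_LENGTH, stored.length - IV_LENGTH);
  }
}
```

If something like this lived at the BCFile layer, RFile would keep calling the same block read and write paths it uses today, and the unencrypted case would simply skip the header.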
Why is another interface needed? Why not use org.apache.hadoop.io.compress.CompressionCodec? I am not saying we should or should not do this, but I would like to hear your thoughts since you have looked into this. I see some things in the design doc that I suspect influenced this decision, such as needing to set the key and IV. While thinking about this, I remembered that the BigTable paper mentioned using two compression codecs in series.
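As a rough illustration of two codecs in series, the sketch below chains compression and encryption using only JDK streams. It does not use the Hadoop codec interface, the class name CodecChainSketch and its methods are hypothetical, and AES/CBC is an assumed cipher. Note that the chain has to be handed a key and IV from somewhere, which may be the detail that makes a plain codec interface awkward.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.GeneralSecurityException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch: JDK streams standing in for two codecs applied in series.
public class CodecChainSketch {

  /** Compresses the block first, then encrypts the compressed bytes. */
  public static byte[] compressThenEncrypt(byte[] plain, SecretKeySpec key, byte[] iv)
      throws IOException, GeneralSecurityException {
    Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
    cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    // Bytes flow: plain -> gzip -> cipher -> sink. Closing flushes both layers.
    try (OutputStream out = new GZIPOutputStream(new CipherOutputStream(sink, cipher))) {
      out.write(plain);
    }
    return sink.toByteArray();
  }

  /** Reverses the chain: decrypts first, then decompresses. */
  public static byte[] decryptThenDecompress(byte[] stored, SecretKeySpec key, byte[] iv)
      throws IOException, GeneralSecurityException {
    Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
    cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
    ByteArrayOutputStream plain = new ByteArrayOutputStream();
    try (InputStream in = new GZIPInputStream(
        new CipherInputStream(new ByteArrayInputStream(stored), cipher))) {
      byte[] buf = new byte[4096];
      int n;
      while ((n = in.read(buf)) != -1) {
        plain.write(buf, 0, n);
      }
    }
    return plain.toByteArray();
  }
}
```

Compressing before encrypting is the order used here because encrypted bytes generally do not compress well.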
In the past we have not supported rolling upgrades from 1.x to 1.(x+1). We would only need to consider this if 1.6 supported it. Changes in the file format would be a small part of a larger effort to support rolling upgrades. Releases to date could always read a file produced by any previous version, so Accumulo 1.4 can read RFiles produced by any previous version of Accumulo.
Is there any concern with storing unencrypted blocks in memory? The code currently caches uncompressed blocks (still serialized with RFile encoding) in memory. Would this be a concern if these cached blocks are swapped out? Would we want to keep blocks encrypted in the cache and decrypt them only as needed?