Removed NamespaceDedupe and added it to the oiv tool as a processor, "NameDistribution".
Took care of all the comments from Konstantin (thanks!).
I would consider using NameCache<byte[]> instead of NameCache<ByteArray>....
The cache inserts the byte[] into a HashMap. This requires wrapping the byte[] in another class that provides hashCode() and equals().
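To illustrate why the wrapper is needed, here is a minimal sketch rather than the code in the patch. The class name ByteArrayKey and the demo helpers are hypothetical. A raw Java array inherits identity-based hashCode() and equals() from Object, so two arrays with the same contents never match as HashMap keys; a small wrapper that delegates to java.util.Arrays fixes that.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper (name is illustrative, not taken from the patch):
// gives a byte[] content-based equality so it can serve as a HashMap key.
final class ByteArrayKey {
    private final byte[] bytes;
    private final int hash; // computed once, since the contents never change

    ByteArrayKey(byte[] bytes) {
        this.bytes = bytes.clone(); // defensive copy keeps the key immutable
        this.hash = Arrays.hashCode(this.bytes);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof ByteArrayKey)) {
            return false;
        }
        return Arrays.equals(bytes, ((ByteArrayKey) o).bytes);
    }

    @Override
    public int hashCode() {
        return hash;
    }
}

class ByteArrayKeyDemo {
    // Looks up an equal-content but distinct byte[] instance in a map keyed by raw arrays.
    static boolean rawArrayLookupHits() {
        Map<byte[], Integer> counts = new HashMap<>();
        counts.put("file".getBytes(StandardCharsets.UTF_8), 1);
        return counts.containsKey("file".getBytes(StandardCharsets.UTF_8));
    }

    // Same lookup, but with the arrays wrapped in ByteArrayKey.
    static boolean wrappedLookupHits() {
        Map<ByteArrayKey, Integer> counts = new HashMap<>();
        counts.put(new ByteArrayKey("file".getBytes(StandardCharsets.UTF_8)), 1);
        return counts.containsKey(new ByteArrayKey("file".getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) {
        System.out.println("raw byte[] key found:     " + rawArrayLookupHits());
        System.out.println("wrapped byte[] key found: " + wrappedLookupHits());
    }
}
```

The raw-array lookup misses because the two arrays are different objects, while the wrapped lookup hits because equality is decided by contents.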
My main concern is that the threshold is 10. This means there will be a lot of names in the cache...
For the fsimage I am working with, a threshold of 10 results in 10% of the file names being added to the cache. See the analysis I have posted.
For each cached name, a HashMap.Entry takes 48 bytes. With a threshold of 10, space equivalent to 9 byte arrays is saved. This is 9 * (24 + bytes in array) = 216 + 9 * (bytes in array) bytes. This is a significant saving compared to the cost of the HashMap.Entry.
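The arithmetic above can be sketched as a small calculation. This is only an illustration: the 48-byte entry cost and the 24-byte array overhead are the estimates quoted in this comment (actual sizes depend on the JVM), and the class and method names are hypothetical.

```java
// Hypothetical helper reproducing the estimate in this comment.
// The two constants are the figures quoted above, not measured values.
class NameCacheSavings {
    static final int ENTRY_OVERHEAD_BYTES = 48; // assumed cost of one HashMap.Entry
    static final int ARRAY_OVERHEAD_BYTES = 24; // assumed fixed overhead of one byte[]

    // Bytes saved when a name used `uses` times is stored once:
    // (uses - 1) duplicate arrays are no longer allocated.
    static long bytesSaved(int uses, int nameLength) {
        return (long) (uses - 1) * (ARRAY_OVERHEAD_BYTES + nameLength);
    }

    // Savings after paying for the cache entry itself.
    static long netSavings(int uses, int nameLength) {
        return bytesSaved(uses, nameLength) - ENTRY_OVERHEAD_BYTES;
    }

    public static void main(String[] args) {
        int threshold = 10;
        int nameLength = 8; // example name length in bytes, chosen for illustration
        System.out.println("saved: " + bytesSaved(threshold, nameLength) + " bytes");
        System.out.println("net:   " + netSavings(threshold, nameLength) + " bytes");
    }
}
```

For an 8-byte name used 10 times this gives 9 * (24 + 8) = 288 bytes saved, or 240 bytes net of the entry cost.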
I have made the threshold configurable with a hidden option "dfs.namenode.name.cache.threshold". This could be used to run tests to see if we can decrease the threshold further.