I ran performance tests with an fsimage/edits pair I had from a real-life cluster. The fsimage is about 2 GB and has 12.5M files, and the edit log is exactly 2 GB (I truncated it to that length with dd). I ran the NN with the following JVM options: -Xms14g -Xmx14g -XX:+UseCompressedOops.
With Parallel (default) GC:
I loaded the edit log from a local SATA disk three times each with and without the patch.
Without the patch, the log loaded in 84 seconds, consistent across the three runs. With the patch, it loaded in 87 seconds, also consistent across the three runs.
With CMS GC:
I then added the JVM option -XX:+UseConcMarkSweepGC, since that's the GC most likely in use on large clusters.
With the patch: Loaded in 86 seconds and incurred 213 young generation collections while loading the edit log, which added up to a total of 2.208 seconds in young gen GC.
Without the patch: 84 seconds, 211 young gen GCs, adding up to 2.174 seconds.
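For anyone who wants to reproduce the GC counts above: I didn't describe how they were gathered, and GC logging is the usual route. As an in-process alternative, here is a minimal sketch using the standard java.lang.management API. The class name GcStats and the allocation loop are hypothetical stand-ins, not part of the patch or the NN code, and the collector names it prints depend on the JVM and the GC flags in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: snapshots per-collector GC counters so a delta can be
// taken around a workload (for example, an edit log replay).
class GcStats {
    // Returns {collectionCount, collectionTimeMillis} keyed by collector name.
    static Map<String, long[]> snapshot() {
        Map<String, long[]> stats = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the value is undefined for a collector.
            long count = Math.max(0, gc.getCollectionCount());
            long millis = Math.max(0, gc.getCollectionTime());
            stats.put(gc.getName(), new long[] {count, millis});
        }
        return stats;
    }

    // Subtracts the "before" counters from the "after" counters, per collector.
    static Map<String, long[]> delta(Map<String, long[]> before, Map<String, long[]> after) {
        Map<String, long[]> diff = new LinkedHashMap<>();
        for (Map.Entry<String, long[]> e : after.entrySet()) {
            long[] b = before.getOrDefault(e.getKey(), new long[] {0, 0});
            long[] a = e.getValue();
            diff.put(e.getKey(), new long[] {a[0] - b[0], a[1] - b[1]});
        }
        return diff;
    }

    public static void main(String[] args) {
        Map<String, long[]> before = snapshot();

        // Stand-in workload: many short-lived allocations, loosely mimicking
        // the short-lived objects created while replaying edits.
        long checksum = 0;
        for (int i = 0; i < 2_000_000; i++) {
            byte[] garbage = new byte[256];
            garbage[0] = (byte) i;
            checksum += garbage[0];
        }

        Map<String, long[]> after = snapshot();
        for (Map.Entry<String, long[]> e : delta(before, after).entrySet()) {
            System.out.printf("%s: %d collections, %d ms%n",
                    e.getKey(), e.getValue()[0], e.getValue()[1]);
        }
        System.out.println("checksum=" + checksum);
    }
}
```

The young generation collector is one of the entries printed; which name it goes by has to be checked against the JVM actually running.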
The patch seems to have a very marginal impact on the amount of time spent in GC, which makes sense since the objects are very short-lived and young-generation GC time is proportional to live object size, not garbage size. The patch seems to have about a 2-4% negative impact on overall wall-clock time of loading the log (84s to 87s with Parallel GC, 84s to 86s with CMS).
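To make the arithmetic behind those percentages explicit, here is a small sketch that derives them from the timings reported above. The class and method names are made up for illustration; only the input numbers come from the test runs.

```java
import java.util.Locale;

// Hypothetical helper that turns the measured timings into relative overheads.
class LoadTimeOverhead {
    // Relative increase of `patched` over `baseline`, in percent.
    static double percentIncrease(double baseline, double patched) {
        return (patched - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // Wall-clock load times in seconds, without vs. with the patch.
        double parallel = percentIncrease(84, 87);
        double cms = percentIncrease(84, 86);
        // Total young gen GC time in seconds under CMS, without vs. with the patch.
        double youngGc = percentIncrease(2.174, 2.208);
        // Portion of the 2-second CMS slowdown that is extra young gen GC time.
        double extraGcSeconds = 2.208 - 2.174;

        System.out.println(String.format(Locale.ROOT,
                "Parallel GC wall clock: +%.1f%%", parallel));
        System.out.println(String.format(Locale.ROOT,
                "CMS wall clock: +%.1f%%", cms));
        System.out.println(String.format(Locale.ROOT,
                "CMS young gen GC time: +%.1f%%", youngGc));
        System.out.println(String.format(Locale.ROOT,
                "Extra young gen GC time: %.3f s of a 2 s slowdown", extraGcSeconds));
    }
}
```

Under CMS, the extra young gen GC time is only a few hundredths of a second, so nearly all of the 2-second slowdown comes from somewhere other than young gen collection.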
Do you guys think this is acceptable? In most of the clusters I see, edit logs tend to be much smaller than this, and startup time is dominated by loading the image and collecting block reports, not edits replay. So, I tend to think the improved code cleanliness of this patch is worth the perf hit.