One of the changes in Kudu 1.4 is more comprehensive repair functionality at log block manager (LBM) startup. Among other things, this includes a heuristic to detect whether an LBM container consumes more disk space than it should, based on the container's live blocks. If the heuristic fires, the LBM reclaims the extra disk space by truncating the end of the container and re-punching out all of the container's dead blocks.
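For context, "re-punching" a dead block just means deallocating its extent with a Linux hole punch. Here's a minimal sketch of the repair step (the function name, fd, and block list are illustrative stand-ins, not Kudu's actual container code):

```cpp
#include <fcntl.h>    // fallocate(), FALLOC_FL_PUNCH_HOLE, FALLOC_FL_KEEP_SIZE
#include <unistd.h>   // ftruncate()
#include <cerrno>
#include <utility>
#include <vector>

// Reclaim space from a container data file: chop off the excess tail, then
// deallocate each dead block's extent. Returns 0 on success, errno otherwise.
int RepairContainer(int fd, off_t proper_size,
                    const std::vector<std::pair<off_t, off_t>>& dead_blocks) {
  if (ftruncate(fd, proper_size) < 0) {
    return errno;
  }
  for (const auto& b : dead_blocks) {
    // KEEP_SIZE: the hole is punched in place; the file's logical size
    // is unchanged, but the underlying disk space is freed.
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  b.first /* offset */, b.second /* length */) < 0) {
      return errno;
    }
  }
  return 0;
}
```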
We brought up Kudu 1.4 on a large production cluster running xfs and observed pathologically slow startup times. On one node, there was a three-hour gap between the last bit of data directory processing and the end of LBM startup overall. That time can only be attributed to hole re-punching, which is executed by the same set of thread pools that open the data directories.
Further research revealed that on xfs in el6, a hole punch via fallocate() always includes an fsync() (in the kernel), even if the underlying data was already punched out. This isn't the case with ext4, nor does it appear to be the case with xfs in more modern kernels (though this hasn't been confirmed).
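The cost is easy to demonstrate with a micro-benchmark along these lines (a sketch, not the exact test we ran): punching a range that is already a hole should be nearly free, yet on el6 xfs each call still pays for the in-kernel flush.

```cpp
#include <fcntl.h>
#include <chrono>
#include <cstdio>

// Time n no-op hole punches; the caller must pass an fd whose first 4 KB
// are already a hole. On ext4 these return almost instantly; on el6 xfs
// each call still forces a log flush.
void BenchRepunch(int fd, int n) {
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < n; i++) {
    fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0, 4096);
  }
  auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
      std::chrono::steady_clock::now() - start);
  printf("%d punches in %lld ms\n", n, (long long)elapsed.count());
}
```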
xfs provides the XFS_IOC_UNRESVSP64 ioctl, which can be used to deallocate space from a file. That sounds an awful lot like hole punching, and some quick performance tests show that it doesn't incur the cost of an fsync(). We should switch over to it when punching holes on xfs: certainly on older (i.e. el6) kernels, and potentially everywhere for simplicity's sake.
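For illustration, the ioctl takes an xfs_flock64 describing the range to deallocate. A sketch assuming the definitions in <xfs/xfs_fs.h> (shipped with xfsprogs-devel); the wrapper name is made up:

```cpp
#include <sys/ioctl.h>
#include <unistd.h>       // SEEK_SET
#include <xfs/xfs_fs.h>   // struct xfs_flock64, XFS_IOC_UNRESVSP64
#include <cerrno>
#include <cstring>

// Deallocate [offset, offset + length) from the file, like a hole punch
// but without the implicit fsync() that el6 xfs adds to fallocate().
int XfsPunchHole(int fd, off_t offset, off_t length) {
  struct xfs_flock64 fl;
  memset(&fl, 0, sizeof(fl));
  fl.l_whence = SEEK_SET;  // l_start is relative to the start of the file
  fl.l_start = offset;
  fl.l_len = length;
  if (ioctl(fd, XFS_IOC_UNRESVSP64, &fl) < 0) {
    return errno;
  }
  return 0;
}
```

Presumably the switch would key off the filesystem type at runtime, e.g. by checking statfs() for XFS_SUPER_MAGIC.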