I spent some time on this. It's quite fascinating: the number of collisions for the default probing is smaller than:
a) linear probing with murmurhash mix of the original hash
b) linear probing without murmurhash mix (start from raw hash only).
Curiously, the number of collisions for (b) is smaller than for (a) – this could be explained if we assume bits are spread evently throughout the entire 32-bit range after murmurhash, so after masking to table size there should be more collisions on lower bits compared to a raw hash (this would have more collisions on upper bits and fewer on lower bits because it is multiplicative... or at least I think so).
Anyway, I tried many different versions and I don't see any significant difference in favor of linear probing here. Measured the GC overhead during my tests too, but it is not the primary factor contributing to the total cost of constructing the FST (about 3-5% of the total time, running in parallel, typically).