Future optimizations could include bulk copying multiple documents at once (all ranges between deleted docs). The speedup would probably be greatest for small docs, but I'm not sure whether it would be worth it.
Ooh, I like that idea! I'll explore that.
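
To make the "ranges between deleted docs" idea concrete, here is a minimal sketch of finding the contiguous runs of live docs that could each be copied in one go. It assumes deletions are available as a `java.util.BitSet`; the class and method names (`LiveDocRanges`, `liveRanges`) are hypothetical and not part of Lucene.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical helper, not a Lucene class: finds runs of live docs.
class LiveDocRanges {
    /** Returns [start, end) pairs, one per maximal run of non-deleted docIDs. */
    static List<int[]> liveRanges(BitSet deleted, int maxDoc) {
        List<int[]> ranges = new ArrayList<>();
        int doc = 0;
        while (doc < maxDoc) {
            // First live doc at or after the current position.
            int start = deleted.nextClearBit(doc);
            if (start >= maxDoc) {
                break;
            }
            // The run ends at the next deleted doc, or at maxDoc if there is none.
            int end = deleted.nextSetBit(start);
            if (end < 0 || end > maxDoc) {
                end = maxDoc;
            }
            ranges.add(new int[] {start, end});
            doc = end;
        }
        return ranges;
    }
}
```

Each returned range would then map to a single bulk copy of the stored bytes for docs `start` through `end - 1`, rather than one copy per document.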
More controversial: maybe even expand the number of docs that can be bulk copied by not bothering to remove deleted docs when there are only a very small number of them (unless it's an optimize). This is probably not worth it.
That's a neat idea too, but I agree it's likely not worth it.
Another idea: we can almost just concatenate the posting lists
(frq/prx) for each term, because they are "delta coded" (we write the
delta between docIDs). The only catch is you have to "stitch up" the
boundary: you have to read the docID from the start of the next
segment, write the delta-code, then you can copy the remaining bytes.
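
The boundary "stitch" can be sketched roughly as follows. This is a simplified model, not the actual frq/prx encoding: each posting list is an `int[]` of docID deltas (the first entry being the delta from 0), there are no deleted docs, and `docBase` stands for the number of docs in the preceding segment. The names `PostingStitch` and `concat` are hypothetical.

```java
// Hypothetical sketch over plain int deltas, not Lucene's on-disk format.
class PostingStitch {
    /** Concatenates two delta-coded docID lists for the same term. */
    static int[] concat(int[] a, int[] b, int docBase) {
        if (b.length == 0) {
            return a.clone();
        }
        // Last docID in the first segment. Summed here for simplicity;
        // a real merger would need to have this value at hand some other way.
        int lastDocA = 0;
        for (int delta : a) {
            lastDocA += delta;
        }
        int[] out = new int[a.length + b.length];
        System.arraycopy(a, 0, out, 0, a.length);
        // Stitch the boundary: b[0] is the first docID of the next segment,
        // so re-base it and express it as a delta from lastDocA.
        out[a.length] = docBase + b[0] - lastDocA;
        // Everything after the boundary is copied unchanged.
        System.arraycopy(b, 1, out, a.length + 1, b.length - 1);
        return out;
    }
}
```

For example, with docs {1, 4, 9} in a 10-doc first segment and docs {0, 2, 7} in the second, only the single boundary delta is recomputed; the remaining deltas of both lists are copied as-is.
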
I think this could be a big win especially when merging larger