we could add an optimize(long maxSegmentSize)
I think this would be useful anyway, and it is more or less required if we introduce the proposed merge policy. Otherwise, if someone's code calls optimize (with or without a limit on the number of segments), those large segments will be optimized as well.
except if it accumulates too many deletes (as a percentage of docs) then it can be compacted and new segments merged into it?
If someone calls expungeDeletes and that segment drops below the max size, then it becomes eligible for merging, right? But I have a question here, and it may be that I'm missing something in the merge process. Say I have the following segments, each at 4 GB (the limit), except D:
A (docs 0-99), B (docs 100-230), C (docs 231-450) and D (docs 451-470). Then A accumulates 50 deletes. On one hand, we'd want it to be merged, but if we want that, we have to merge B and C as well, right? We cannot merge A with D alone, because the doc IDs need to stay in increasing order and retain the order in which the docs were added to the index?
So will the merge policy detect that? I think it should, and the way to work around it is to ensure that the first segment that is below the limit triggers a merge of all the segments that follow it (in doc ID order), regardless of their size?
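To make the rule I'm suggesting concrete, here is a minimal sketch in plain Java. It does not use the real merge policy API: `Segment`, `selectMerge` and the byte sizes are hypothetical names and numbers I made up for illustration, and I'm assuming the size compared against the limit is the segment's size after its deletes are expunged.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ContiguousMergeSketch {
    /** Hypothetical stand-in for a segment: a name and its size in bytes after deletes are expunged. */
    static final class Segment {
        final String name;
        final long sizeBytes;

        Segment(String name, long sizeBytes) {
            this.name = name;
            this.sizeBytes = sizeBytes;
        }
    }

    /**
     * Segments are given in doc ID order. Finds the first segment strictly below maxBytes
     * and returns it together with every segment after it, regardless of size, so the
     * merged result keeps doc IDs in their original order.
     * Returns an empty list when fewer than two segments would take part.
     */
    static List<Segment> selectMerge(List<Segment> segments, long maxBytes) {
        for (int i = 0; i < segments.size(); i++) {
            if (segments.get(i).sizeBytes < maxBytes) {
                List<Segment> tail = new ArrayList<>(segments.subList(i, segments.size()));
                // A single trailing small segment has nothing to merge with.
                return tail.size() > 1 ? tail : Collections.<Segment>emptyList();
            }
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        List<Segment> segments = new ArrayList<>();
        segments.add(new Segment("A", 2 * gb));  // assumed size after its deletes are expunged
        segments.add(new Segment("B", 4 * gb));
        segments.add(new Segment("C", 4 * gb));
        segments.add(new Segment("D", gb / 4));
        for (Segment s : selectMerge(segments, 4 * gb)) {
            System.out.println(s.name);
        }
    }
}
```

With the A/B/C/D example above, A is the first segment below the limit, so B and C get pulled into the merge even though they are at the limit themselves. The obvious cost is that one shrunken segment early in the index forces a rewrite of everything after it.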
I don't know if your patch already takes care of this case, and whether my understanding is correct, so if you already handle it that way (or some other way), then that's fine.