Lucene - Core / LUCENE-8255

Can we make index sorting work for soft deletes

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      I phrased this as a question since it's mainly a discussion. I have spoken to Robert Muir on a couple of occasions about making index sorting work for soft deletes. What prevents this is that soft deletes use updatable doc values (DV) to mark docs as deleted, so a sorted segment is no longer guaranteed to be sorted once it has received any updates. It also means that re-sorting such a segment on merge has a significant overhead (I hope Jim Ferenczi can shed some light on how much we should expect). We would also need some special casing, since we use "merge sorting" and can't go backwards in doc ID, an assumption that would be violated if a segment has received updates. (cc Adrien Grand)

      The main purpose of doing this is that "soft deleted" documents would end up either at the end or at the beginning of the segment, so that compression is better when these docs are kept under longer retention policies.
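
      To make the problem concrete, here is a minimal, self-contained sketch in plain Java. It does not use the Lucene API: the class `SoftDeleteSortSketch`, its methods, and the `int[]` of per-document flags are hypothetical stand-ins for a segment sorted on the soft-deletes field. It only illustrates that an in-place doc-values update leaves doc IDs where they are, so the segment stops being sorted, and that restoring the order requires a doc ID remapping that moves documents backwards.

```java
import java.util.Arrays;

// Hypothetical illustration only: models a segment as an array of
// soft-delete flags indexed by doc ID (0 = live, 1 = soft deleted).
class SoftDeleteSortSketch {

    // True if the flags are in non-decreasing doc ID order, i.e. the
    // segment is still sorted on the soft-deletes field.
    static boolean isSorted(int[] softDeleted) {
        for (int doc = 1; doc < softDeleted.length; doc++) {
            if (softDeleted[doc - 1] > softDeleted[doc]) {
                return false;
            }
        }
        return true;
    }

    // Computes the remapping a re-sort would need: entry i holds the old
    // doc ID that ends up at new doc ID i. Arrays.sort on objects is
    // stable, so live docs keep their relative order.
    static int[] newToOldDocMap(int[] softDeleted) {
        Integer[] docs = new Integer[softDeleted.length];
        for (int doc = 0; doc < docs.length; doc++) {
            docs[doc] = doc;
        }
        Arrays.sort(docs, (a, b) -> Integer.compare(softDeleted[a], softDeleted[b]));
        int[] newToOld = new int[docs.length];
        for (int i = 0; i < docs.length; i++) {
            newToOld[i] = docs[i];
        }
        return newToOld;
    }

    public static void main(String[] args) {
        // Freshly flushed segment: every doc is live, so it is trivially sorted.
        int[] softDeleted = {0, 0, 0, 0};
        System.out.println(isSorted(softDeleted)); // true

        // Soft-deleting doc 1 only updates its flag; the doc ID does not move.
        softDeleted[1] = 1;
        System.out.println(isSorted(softDeleted)); // false

        // Re-sorting on merge would have to move doc 1 to the end.
        System.out.println(Arrays.toString(newToOldDocMap(softDeleted))); // [0, 2, 3, 1]
    }
}
```

      In this toy model the remapping `[0, 2, 3, 1]` places the soft-deleted document last, which is the layout the description hopes for, but it also shows why the segment has to be fully re-sorted rather than streamed through in doc ID order.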

            People

            • Assignee: Unassigned
            • Reporter: Simon Willnauer (simonw)
            • Votes: 0
            • Watchers: 4