  1. Lucene - Core
  2. LUCENE-2679

IndexWriter.deleteDocuments should have option to not apply to docs indexed in the current IW session

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • None
    • 4.9, 6.0
    • None
    • None
    • New

    Description

      In LUCENE-2655 we are struggling with how to handle buffered deletes,
      with the new per-thread RAM buffers (DWPT).

      But the only reason we must maintain a map of del term -> current
      docID (or sequence ID) is to correctly handle the interleaved adds &
      deletes case.
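
      A minimal sketch of why the interleaved case needs a map, not taken
      from the Lucene source: InterleavedDeleteBuffer and its methods are
      hypothetical stand-ins, and plain strings stand in for terms. Each
      delete records the docID limit at the time of the call, so it hits
      only docs buffered before it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one in-RAM buffer; not Lucene's actual classes.
class InterleavedDeleteBuffer {
    // Buffered docs: the list index is the docID, the value is the doc's id term.
    private final List<String> bufferedDocs = new ArrayList<>();
    // del term -> docID limit: the delete applies only to docs with docID < limit.
    private final Map<String, Integer> deleteLimits = new HashMap<>();

    void addDocument(String idTerm) {
        bufferedDocs.add(idTerm);
    }

    void deleteDocuments(String term) {
        // Remember how many docs were buffered when the delete arrived.
        deleteLimits.put(term, bufferedDocs.size());
    }

    // Returns the id terms of the buffered docs that survive the buffered deletes.
    List<String> flush() {
        List<String> live = new ArrayList<>();
        for (int docID = 0; docID < bufferedDocs.size(); docID++) {
            String term = bufferedDocs.get(docID);
            Integer limit = deleteLimits.get(term);
            // Docs added after the delete (docID >= limit) are kept.
            if (limit == null || docID >= limit) {
                live.add(term);
            }
        }
        return live;
    }
}
```

      With add "a", add "b", delete "a", add "a", only the first "a" is
      removed; the re-added doc survives because its docID is at or past
      the recorded limit.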

      However, I suspect that for many apps that interleaving never happens.
      I.e., most apps delete only docs from before the last commit or NRT
      reopen. For such apps, we don't need a Map... we just need a Set of
      all del terms to apply to past segments but not to the currently
      buffered docs.
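
      A rough sketch of the Set-only variant, under the assumption that
      deletes never need to hit currently buffered docs. PastOnlyDeletes is
      a hypothetical class for illustration, not a proposed API, and plain
      strings again stand in for terms.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; not Lucene's actual classes.
class PastOnlyDeletes {
    // Id terms of docs in already-flushed (past) segments.
    private final List<String> pastSegmentDocs = new ArrayList<>();
    // Id terms of docs buffered since the last commit.
    private final List<String> bufferedDocs = new ArrayList<>();
    // Just the delete terms; no docID or sequence ID is tracked.
    private final Set<String> deleteTerms = new HashSet<>();

    PastOnlyDeletes(List<String> initialDocs) {
        pastSegmentDocs.addAll(initialDocs);
    }

    void addDocument(String idTerm) {
        bufferedDocs.add(idTerm);
    }

    void deleteDocuments(String term) {
        deleteTerms.add(term);
    }

    // Applies deletes to past segments only, then makes buffered docs visible.
    List<String> commit() {
        List<String> live = new ArrayList<>();
        for (String term : pastSegmentDocs) {
            if (!deleteTerms.contains(term)) {
                live.add(term);
            }
        }
        // Buffered docs are never checked against the delete set.
        live.addAll(bufferedDocs);
        pastSegmentDocs.clear();
        pastSegmentDocs.addAll(live);
        bufferedDocs.clear();
        deleteTerms.clear();
        return new ArrayList<>(live);
    }
}
```

      Note that in this mode a delete issued before an add of the same term
      and one issued after it behave identically within a session, which is
      exactly the ordering information the Map would otherwise preserve.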

      And, importantly, with LUCENE-2655, this would be a single Set, not
      one per DWPT. It should be a healthy RAM reduction on buffered
      deletes, and should make the deletes call faster (add to one set instead of
      N maps).
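
      To make the bookkeeping difference concrete, here is a small
      counting sketch. It assumes every DWPT keeps its own map and that
      each delete must be recorded in all of them; DeleteBookkeepingCost
      and its methods are hypothetical, and it counts entries only, not
      actual bytes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; counts buffered-delete entries, not real RAM.
class DeleteBookkeepingCost {
    // One map per thread: every delete term is recorded in each map.
    static int entriesWithPerThreadMaps(int numThreads, List<String> deleteTerms) {
        List<Map<String, Integer>> maps = new ArrayList<>();
        for (int i = 0; i < numThreads; i++) {
            maps.add(new HashMap<>());
        }
        for (String term : deleteTerms) {
            for (Map<String, Integer> map : maps) {
                map.put(term, 0); // placeholder docID limit
            }
        }
        int total = 0;
        for (Map<String, Integer> map : maps) {
            total += map.size();
        }
        return total;
    }

    // One shared set: every delete term is recorded once.
    static int entriesWithSharedSet(List<String> deleteTerms) {
        Set<String> shared = new HashSet<>(deleteTerms);
        return shared.size();
    }
}
```

      For 4 threads and 3 distinct delete terms this gives 12 map entries
      versus 3 set entries, and each delete call touches one structure
      instead of four.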

      We of course must still support the interleaved case, and I think it
      should be the default, but I think we should provide the option for
      the common-case apps to take advantage of much less RAM usage.

          People

            Assignee: Unassigned
            Reporter: Michael McCandless (mikemccand)
            Votes: 0
            Watchers: 2
