  1. Lucene - Core
  2. LUCENE-2355

Refactor Directory/Multi/SegmentReader creation/reopening/cloning/closing


    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      The *Reader lifecycle has evolved over time into a heavily tangled mess. It's hard to understand what's going on there, and it's even harder to add fields or logic while ensuring that all possible code paths preserve those fields and interact with the logic properly. While some of this mess is justified by the task at hand, a big part is just badly done copy-paste and can be removed.

      I am currently refactoring this and intended to open an issue with a working patch, but the task wound up somewhat bigger than I expected, so I'm opening it earlier to track the things I encounter, change, and fix.
      The list is by no means exhaustive.

      • the iteration that creates SegmentReaders (SRs) is copy-pasted several times, and one copy (in IndexWriter, IW) has the wrong iteration bound
      • it is also overly complex and can be folded into one for the create and reopen cases
      • readers sent to IndexReaderWarmer are termindexless/docstoreless on some occasions
      • it is possible to clone() your way to a read-write NRT reader
      • IndexDeletionPolicy is not always preserved through clones/reopens
      • cloned readers share CoreReaders and, consequently, updated termsIndex/docStores
      • thread-local versions of fieldsReader/termsVector are bound to the SR, not to CoreReaders, and thus are recreated on clone/reopen
      • double initialization of some fields (someone got lost and did this to be sure, I guess), stupid assert checks ( qwe = new(); assert qwe != null )
      • the SR is not always recreated when the compound status of the underlying segment changes
      • deleting an already deleted doc marks deletions dirty and rewrites them
      • lots of synchronization is done around the Reader, while it could be narrowed down to norms/deletions/whatever
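
      To make the "deleting already deleted doc" item concrete, here is a minimal sketch of the guard I have in mind. DeletionsSketch and all of its members are hypothetical stand-ins invented for illustration, not the actual SegmentReader fields; the only point is that the dirty flag should change when the deletion state really changes.

```java
import java.util.BitSet;

/** Hypothetical stand-in for a segment's deletion state; not Lucene's actual class. */
final class DeletionsSketch {
    private final BitSet deleted;
    private boolean dirty; // true when deletions would have to be rewritten

    DeletionsSketch(int maxDoc) {
        this.deleted = new BitSet(maxDoc);
    }

    /** Marks a doc as deleted; returns true only if the state actually changed. */
    boolean delete(int docId) {
        if (deleted.get(docId)) {
            return false; // already deleted: do not touch the dirty flag
        }
        deleted.set(docId);
        dirty = true;
        return true;
    }

    boolean isDirty() {
        return dirty;
    }

    /** Called after deletions have been written out. */
    void markFlushed() {
        dirty = false;
    }
}
```

      With a guard like this, a repeated delete of the same doc after a flush leaves the deletions clean, so nothing is rewritten.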

      I did some structural modifications:

      • CompositeReader extracts the common code from DirectoryReader and MultiReader (complete)
      • ReadonlyDirectoryReader and ReadonlySegmentReader are dead; MutableD/SReaders are introduced and carry all modification logic/fields (DR complete, SR in progress)
      • WriterBackedReader encapsulates the NRT reader logic (complete)
      • CoreReaders is split into CoreReaders, DocStores, and TermInfos. All of these are immutable, and the SR is cloned when you need to change its mode (in progress)
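
      A rough sketch of the immutable split from the last item, under stated assumptions: the class names carry a Sketch suffix because they are illustrative only, the fields (segmentName, docStoreSegment, indexDivisor) are made up, and the real patch may look quite different. The idea shown is that changing a reader's mode produces a new reader that shares the untouched parts, instead of mutating shared state.

```java
/** Hypothetical immutable holders; names mirror the issue text, fields are illustrative only. */
final class CoreReadersSketch {
    final String segmentName;
    CoreReadersSketch(String segmentName) { this.segmentName = segmentName; }
}

final class DocStoresSketch {
    final String docStoreSegment;
    DocStoresSketch(String docStoreSegment) { this.docStoreSegment = docStoreSegment; }
}

final class TermInfosSketch {
    final int indexDivisor;
    TermInfosSketch(int indexDivisor) { this.indexDivisor = indexDivisor; }
}

/** A reader whose parts never change; a mode change yields a new instance. */
final class SegmentReaderSketch {
    final CoreReadersSketch core;
    final DocStoresSketch docStores; // null means "doc stores not loaded"
    final TermInfosSketch termInfos; // null means "terms index not loaded"

    SegmentReaderSketch(CoreReadersSketch core, DocStoresSketch docStores, TermInfosSketch termInfos) {
        this.core = core;
        this.docStores = docStores;
        this.termInfos = termInfos;
    }

    /** Returns a clone with the terms index attached; this instance is left untouched. */
    SegmentReaderSketch withTermInfos(TermInfosSketch loaded) {
        return new SegmentReaderSketch(core, docStores, loaded);
    }

    /** Returns a clone with doc stores attached; this instance is left untouched. */
    SegmentReaderSketch withDocStores(DocStoresSketch loaded) {
        return new SegmentReaderSketch(core, loaded, termInfos);
    }
}
```

      Because nothing is mutated in place, an existing clone cannot suddenly observe an updated termsIndex or docStores, which is the sharing problem listed above.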


              People

              • Assignee:
                Unassigned
                Reporter:
                Earwin Burrfoot (earwin)
              • Votes:
                0
                Watchers:
                2
