Lucene - Core / LUCENE-3292

IOContext should be part of the SegmentReader cache key

    Details

    • Type: Task
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 4.0-ALPHA
    • Fix Version/s: 4.0-ALPHA
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Once IOContext (LUCENE-2793) has landed, the IOContext should be part of the key used to cache that reader in the pool.

      1. LUCENE-3292.patch
        59 kB
        Michael McCandless
      2. LUCENE-3292.patch
        12 kB
        Varun Thacker
      3. LUCENE-3292.patch
        14 kB
        Varun Thacker
      4. LUCENE-3292.patch
        11 kB
        Varun Thacker
      5. LUCENE-3292.patch
        11 kB
        Varun Thacker

          Activity

          Varun Thacker added a comment -

          I am not quite sure how to start with this.

          In SegmentReader#get, something like this is required:

              if (readOnly) {
                assert context != IOContext.DEFAULT;
                //assert context.context == IOContext.Context.READ;
                // Using the second assert checks for both READ and READONCE
              }
          

          And what do I need to do in IndexWriter.ReaderPool#get so that the context becomes part of the key used to cache that reader in the pool?

          Michael McCandless added a comment -

          Right now the ReaderPool.readerMap is a Map<SegmentInfo,SegmentReader>.

          I think we just need to change that to Map<SegmentInfoAndIOContext,SegmentReader> instead, where SegmentInfoAndIOContext is a new struct holding SegmentInfo and IOContext.Context and implementing hashCode/equals by delegating to the SegmentInfo and IOContext.Context.
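
          For illustration only, a minimal sketch of that composite key, using the SegmentCacheKey name the later patches settle on; SegmentInfo and IOContext.Context are assumed to behave as described in this thread, and this is not the committed code:

              // Sketch: composite cache key whose hashCode/equals delegate to its two
              // members, so the pool can hold one reader per (segment, context) pair.
              import org.apache.lucene.index.SegmentInfo;
              import org.apache.lucene.store.IOContext;

              final class SegmentCacheKey {
                final SegmentInfo info;
                final IOContext.Context context;

                SegmentCacheKey(SegmentInfo info, IOContext.Context context) {
                  this.info = info;
                  this.context = context;
                }

                @Override
                public int hashCode() {
                  return 31 * info.hashCode() + context.hashCode();
                }

                @Override
                public boolean equals(Object other) {
                  if (!(other instanceof SegmentCacheKey)) {
                    return false;
                  }
                  final SegmentCacheKey o = (SegmentCacheKey) other;
                  return info.equals(o.info) && context == o.context;
                }
              }

          The readerMap would then be declared as a Map<SegmentCacheKey,SegmentReader>.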

          Varun Thacker added a comment -

          Initial patch. There may be a few methods inside ReaderPool where I added an IOContext that is not correct.

          Varun Thacker added a comment -

          I am still not sure about 2 things:

          In SegmentReader#get, the assert statement.

          In IW.ReaderPool.SegmentCacheKey, the equals() method - does it need to call .equals on the SegmentInfo and context? Since we are only including IOContext.Context, won't this be wrong?

          Michael McCandless added a comment -

          SegmentCacheKey.equals needs { } around the single-statement ifs.

          Make sure you remove the TODO in ReaderPool.get since this issue is fixing it.

          I am still not sure about 2 things:

          In SegmentReader#get, the assert statement.

          Hmm... why are we adding this assert again?

          In IW.ReaderPool.SegmentCacheKey, the equals() method - does it need to call .equals on the SegmentInfo and context? Since we are only including IOContext.Context, won't this be wrong?

          Yes, it should && the .equals of its two members. It's not wrong; we only need to ensure that the SegmentReader for merging is shared during merging, and the one for searching during searching; we don't want to open a new SegmentReader for every new IOContext.

          Actually: why do we have IOContext.equals/hashCode...? Who relies on these methods today?

          ReaderPool.commit should pass IOContext.READ.

          I think we should add a ReaderPool.getIfExists that does not take an IOContext and tries both READ and MERGE and returns the SegmentReader if it's found; likewise for release.
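
          As a rough illustration of that getIfExists idea, a sketch building on the SegmentCacheKey above (the real ReaderPool is synchronized on the writer and also handles ref counting, which is omitted here):

              import java.util.HashMap;
              import java.util.Map;
              import org.apache.lucene.index.SegmentInfo;
              import org.apache.lucene.index.SegmentReader;
              import org.apache.lucene.store.IOContext;

              class ReaderPoolSketch {
                private final Map<SegmentCacheKey,SegmentReader> readerMap =
                    new HashMap<SegmentCacheKey,SegmentReader>();

                // Probe the pool under both contexts: a pooled reader was opened either
                // for searching (READ) or for merging (MERGE), so trying both is enough.
                synchronized SegmentReader getIfExists(SegmentInfo info) {
                  SegmentReader sr = readerMap.get(new SegmentCacheKey(info, IOContext.Context.READ));
                  if (sr == null) {
                    sr = readerMap.get(new SegmentCacheKey(info, IOContext.Context.MERGE));
                  }
                  return sr;
                }
              }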

          Varun Thacker added a comment -

          Made the changes suggested.

          Simon Willnauer added a comment -

          Hey Varun, patch looks good. Here are some comments:

          • SegmentCacheKey should only have final members
          • SegmentCacheKey should be final and the ctor visibility should be package private
          • you can omit "this" and "== true" in SegmentCacheKey#equals and simply return the comparison result
          • in IndexWriter#drop(SegmentInfo, IOContext) you can simply call if ((sr = readerMap.remove(new SegmentCacheKey(info, context))) != null), so that you do the lookup only once instead of getting the instance first and then looking it up a second time to remove it; this applies to more places in that method as far as I can see (see the sketch after this list)
          • I'm trying to understand what you are doing in getIfExists(SegmentInfo info): it seems you are probing whether there is a reader for this segment under two different IOContexts. Maybe it would make more sense to pass in the context here too, OR don't put the IOContext in the cache key and hold two distinct maps, one for read and one for merge? The try/catch here doesn't make sense to me at all.
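
          For the drop() point above, a minimal sketch of the single-lookup idiom (continuing the hypothetical ReaderPoolSketch from the earlier comment; the real IndexWriter method does more bookkeeping than shown here):

              // Inside the hypothetical ReaderPoolSketch class
              // (also requires java.io.IOException in addition to the imports above):
              synchronized void drop(SegmentInfo info, IOContext context) throws IOException {
                final SegmentReader sr;
                // remove() returns the previous value, so a single map operation replaces
                // the get()-then-remove() pair.
                if ((sr = readerMap.remove(new SegmentCacheKey(info, context.context))) != null) {
                  sr.decRef();
                }
              }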
          Varun Thacker added a comment -

          I made the changes as suggested. I am getting some assertion errors and a few other failures, and will work on them now.

          Michael McCandless added a comment -

          Attached patch, passing all tests. I think it's ready to commit.

          This was trickier than I thought, because we had to take care to
          ensure the deletions are applied only against readers pulled in a READ
          context, and then pull the latest deletions from the READ context when
          merging. So I started with Varun's patch and iterated from there...

          Also, there was a great simplification that fell out of this: we no
          longer need that scary SR.loadTermsIndex method to load a terms index
          after SR had already been created. We only needed this for the case
          where an SR opened for merging was then shared with an NRT reader, but
          this is no longer possible.
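
          Purely as a hedged illustration of the "pull the latest deletions from the READ context when merging" point (the names below are hypothetical stand-ins, not the patch code; in Lucene the deleted-docs bits live inside each SegmentReader):

              import java.util.BitSet;

              final class MergeDeletesSketch {
                // Hypothetical stand-ins for the deleted docs of the two pooled readers.
                static final BitSet readContextDeletes = new BitSet();
                static final BitSet mergeContextDeletes = new BitSet();

                // Before a merge starts, fold the deletions applied to the READ-context
                // reader into the MERGE-context reader, so the merge does not resurrect
                // documents deleted since that reader was opened.
                static void syncDeletesBeforeMerge() {
                  mergeContextDeletes.or(readContextDeletes);
                }

                public static void main(String[] args) {
                  readContextDeletes.set(3); // doc 3 deleted via the READ/NRT reader
                  syncDeletesBeforeMerge();
                  System.out.println(mergeContextDeletes.get(3)); // prints: true
                }
              }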

          Michael McCandless added a comment -

          Thanks Varun!


            People

            • Assignee:
              Varun Thacker
              Reporter:
              Simon Willnauer
            • Votes:
              0
              Watchers:
              0
