Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: 3.6, 4.0-ALPHA
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Move the static FieldCache.DEFAULT field instance to atomic IndexReaders, so that FieldCache insanity caused by the WeakHashMap no longer occurs.

      • Add a new method to IndexReader that by default throws an UOE:
        public FieldCache getFieldCache()
      • The SegmentReader implements this method and returns its own internal FieldCache implementation. This implementation just uses a HashMap<Entry<T>,Object>> to store entries.
      • The SlowMultiReaderWrapper implements this method as well and basically behaves the same as the current FieldCacheImpl.

      This issue won't solve the insanity that comes from inconsistent usage of a single field (for example retrieve both int[] and DocTermIndex for the same field).

      1. LUCENE-3360.patch
        24 kB
        Martijn van Groningen
      2. LUCENE-3360.patch
        78 kB
        Martijn van Groningen
      3. LUCENE-3360.patch
        79 kB
        Martijn van Groningen
      4. LUCENE-3360.patch
        96 kB
        Martijn van Groningen
      5. LUCENE-3360.patch
        95 kB
        Martijn van Groningen
      6. LUCENE-3360-3x.patch
        103 kB
        Martijn van Groningen
      7. LUCENE-3360.patch
        96 kB
        Martijn van Groningen
      8. LUCENE-3360.patch
        305 kB
        Martijn van Groningen
      9. LUCENE-3360-3x.patch
        110 kB
        Martijn van Groningen
      10. LUCENE-3360-3x.patch
        149 kB
        Martijn van Groningen

        Issue Links

          Activity

          Hide
          Martijn van Groningen added a comment -

          Most of the work will be changing the code that is using FieldCache.DEFAULT field to use the IndexReader.getFieldCache() method.I found 92 usages of the FieldCache.DEFAULT field in both production and test code.

          Show
          Martijn van Groningen added a comment - Most of the work will be changing the code that is using FieldCache.DEFAULT field to use the IndexReader.getFieldCache() method.I found 92 usages of the FieldCache.DEFAULT field in both production and test code.
          Hide
          Uwe Schindler added a comment -

          That sounds great.

          We could even backport this to Lucene 3.x and simply make a backwards fake FieldCache.DEFAULT impl that delegates to the IndexReader that is passed in to getInts() & others.

          Show
          Uwe Schindler added a comment - That sounds great. We could even backport this to Lucene 3.x and simply make a backwards fake FieldCache.DEFAULT impl that delegates to the IndexReader that is passed in to getInts() & others.
          Hide
          Uwe Schindler added a comment -

          Of course we should check the SegmentReader code to reuse FieldCache instances on reopen, when just the deleted docs changed by reopening. Also clone should simply reuse the fieldcache in the SegmentReader clone (that is what reopen does in this case).

          Show
          Uwe Schindler added a comment - Of course we should check the SegmentReader code to reuse FieldCache instances on reopen, when just the deleted docs changed by reopening. Also clone should simply reuse the fieldcache in the SegmentReader clone (that is what reopen does in this case).
          Hide
          Martijn van Groningen added a comment -

          Good idea. Backporting this to Lucene 3.x seems like a good plan if we make a fake FieldCache.DEFAULT impl that delgates to IR.
          Yes, we should reuse the fieldcache in case of a reopen. I see that reopenSegement just instantiates a new SegmentReader instance, so we can just pass the existing FieldCache instance to the new SegementReader instance.

          Show
          Martijn van Groningen added a comment - Good idea. Backporting this to Lucene 3.x seems like a good plan if we make a fake FieldCache.DEFAULT impl that delgates to IR. Yes, we should reuse the fieldcache in case of a reopen. I see that reopenSegement just instantiates a new SegmentReader instance, so we can just pass the existing FieldCache instance to the new SegementReader instance.
          Hide
          Martijn van Groningen added a comment -

          Attached initial draft patch. This is untested code and is just to see if the plan pans out.

          Maybe we should move FieldCache and FieldCacheImpl to the cache package inside the search package. Also maybe ToplevelFieldCache is a better name for FieldCacheImpl.

          Show
          Martijn van Groningen added a comment - Attached initial draft patch. This is untested code and is just to see if the plan pans out. Maybe we should move FieldCache and FieldCacheImpl to the cache package inside the search package. Also maybe ToplevelFieldCache is a better name for FieldCacheImpl.
          Hide
          Uwe Schindler added a comment -

          Tiny, nice! I most like the readerFinishedListeners that purge the cache!

          I would suggest to add a new interface/abstract class for the per-Reader FieldCaches, as we currently have the stupid extra IndexReader param in the interface. The current FieldCacheImpl/FieldCache interface in the search package should be removed. The new one could have same name in the index package.

          I would move the current FieldCacheImpl code to the SlowMultiReaderWrapper (with the new interface) and leave the deprecated FieldCache/FieldCacheImpl in search with just delegating like this, but no actual impls:

          int[] FieldCacheImpl#getInts(IndexReader reader, String field) throws IOException {
            return new SlowMultiReaderWrapper(reader).getFieldCache().getInts(field);
          }
          
          Show
          Uwe Schindler added a comment - Tiny, nice! I most like the readerFinishedListeners that purge the cache! I would suggest to add a new interface/abstract class for the per-Reader FieldCaches, as we currently have the stupid extra IndexReader param in the interface. The current FieldCacheImpl/FieldCache interface in the search package should be removed. The new one could have same name in the index package. I would move the current FieldCacheImpl code to the SlowMultiReaderWrapper (with the new interface) and leave the deprecated FieldCache/FieldCacheImpl in search with just delegating like this, but no actual impls: int [] FieldCacheImpl#getInts(IndexReader reader, String field) throws IOException { return new SlowMultiReaderWrapper(reader).getFieldCache().getInts(field); }
          Hide
          Hoss Man added a comment -

          See also...

          • LUCENE-831 - the first issue (i know of) to discuss flipping the relationship between FieldCache and IndexReader (but got hung up on other things, particularly how reopen would affect this)
          • LUCENE-1785 - where thought was put into how to merge FieldCaches when Segments are merged
          • LUCENE-2665 - where a lot of work was already done to think about the "right" FieldCache API
          Show
          Hoss Man added a comment - See also... LUCENE-831 - the first issue (i know of) to discuss flipping the relationship between FieldCache and IndexReader (but got hung up on other things, particularly how reopen would affect this) LUCENE-1785 - where thought was put into how to merge FieldCaches when Segments are merged LUCENE-2665 - where a lot of work was already done to think about the "right" FieldCache API
          Hide
          Shai Erera added a comment -

          Also this discussion http://search.lucidimagination.com/search/document/d158b489c71f31d1/indexreader_cache_a_different_angle about a general cache hooks as part of IndexReader, where FieldCache is just one type of Cache.

          I think that the distance between putting FieldCache in IndexReader to allow plugging-in any Cache is very small, and does not require much more efforts.

          For some reason though, I have a feeling that on all the issues / discussions, we don't seem to get to a resolution about how best to do it. Is it just a matter of putting up a patch with a proposed impl and proceed from there?

          Show
          Shai Erera added a comment - Also this discussion http://search.lucidimagination.com/search/document/d158b489c71f31d1/indexreader_cache_a_different_angle about a general cache hooks as part of IndexReader, where FieldCache is just one type of Cache. I think that the distance between putting FieldCache in IndexReader to allow plugging-in any Cache is very small, and does not require much more efforts. For some reason though, I have a feeling that on all the issues / discussions, we don't seem to get to a resolution about how best to do it. Is it just a matter of putting up a patch with a proposed impl and proceed from there?
          Hide
          Martijn van Groningen added a comment -

          Thanks Hoss - Lots of issues related to the FieldCache! I got some reading to do

          I think that the distance between putting FieldCache in IndexReader to allow plugging-in any Cache is very small, and does not require much

          more efforts.
          I like the idea of a general cache. However I think we should do very small steps at the time. All these issues / proposals include a lot of changes and I think we have a bigger chance to commit something if we take small steps. So we could add the cache infrastructure to the IndexReader in a separate issue after this issue has been resolved.

          (but got hung up on other things, particularly how reopen would affect this)

          In this draft patch in case of a reopen / clone the new SegmentIndexReader just gets the FieldCache instance from the previous SegmentIndexReader.

          Show
          Martijn van Groningen added a comment - Thanks Hoss - Lots of issues related to the FieldCache! I got some reading to do I think that the distance between putting FieldCache in IndexReader to allow plugging-in any Cache is very small, and does not require much more efforts. I like the idea of a general cache. However I think we should do very small steps at the time. All these issues / proposals include a lot of changes and I think we have a bigger chance to commit something if we take small steps. So we could add the cache infrastructure to the IndexReader in a separate issue after this issue has been resolved. (but got hung up on other things, particularly how reopen would affect this) In this draft patch in case of a reopen / clone the new SegmentIndexReader just gets the FieldCache instance from the previous SegmentIndexReader.
          Hide
          Shai Erera added a comment -

          I agree that small steps are better, and that the general Cache infrastructure does not depend nor block this issue. Was just pointing it out as another reference for the IR -> Caching issues.

          Show
          Shai Erera added a comment - I agree that small steps are better, and that the general Cache infrastructure does not depend nor block this issue. Was just pointing it out as another reference for the IR -> Caching issues.
          Hide
          Michael McCandless added a comment -

          I agree we should proceed with amoeba-steps here: FieldCache has been a tar pit in the past!!

          Show
          Michael McCandless added a comment - I agree we should proceed with amoeba-steps here: FieldCache has been a tar pit in the past!!
          Hide
          Martijn van Groningen added a comment -

          Updated the patch with Uwe's suggestions. There is still a lot to be done.

          Just a few things:

          • FieldCache contains a number of public inner classes. Since the FieldCache is now deprecated I'm not sure where to put these classes.
          • I named the per segment FieldCache interface AtomicFieldCache for now.
          • I think that lucene.search.cache is a better place for this AtomicFieldCache and its implementation than lucene.index package.
          Show
          Martijn van Groningen added a comment - Updated the patch with Uwe's suggestions. There is still a lot to be done. Just a few things: FieldCache contains a number of public inner classes. Since the FieldCache is now deprecated I'm not sure where to put these classes. I named the per segment FieldCache interface AtomicFieldCache for now. I think that lucene.search.cache is a better place for this AtomicFieldCache and its implementation than lucene.index package.
          Hide
          Shai Erera added a comment -

          I named the per segment FieldCache interface AtomicFieldCache for now.

          Perhaps SegmentFieldCache?

          I think that lucene.search.cache is a better place for this AtomicFieldCache and its implementation than lucene.index package.

          I agree. FieldCache is used mostly for search purposes.

          Show
          Shai Erera added a comment - I named the per segment FieldCache interface AtomicFieldCache for now. Perhaps SegmentFieldCache? I think that lucene.search.cache is a better place for this AtomicFieldCache and its implementation than lucene.index package. I agree. FieldCache is used mostly for search purposes.
          Hide
          Martijn van Groningen added a comment -

          SegmentFieldCache seems like a better name.

          Show
          Martijn van Groningen added a comment - SegmentFieldCache seems like a better name.
          Hide
          Martijn van Groningen added a comment -

          Attached an updated patch.

          • Renamed AtomicFieldCache to SegmentFieldCache
          • Updated to the latest changes in trunk. (FieldCache was changed)
          Show
          Martijn van Groningen added a comment - Attached an updated patch. Renamed AtomicFieldCache to SegmentFieldCache Updated to the latest changes in trunk. (FieldCache was changed)
          Hide
          Michael McCandless added a comment -

          I actually prefer the name AtomicFieldCache, since this matches other
          places (eg. AtomicReaderContext), and because it's not necessarily a
          segment (SlowMultiReaderWrapper returns an instance).

          The name SegmentFieldCacheImpl seems OK, but can't this class be
          package private?

          I love the name InsaneFieldCache!

          For the new IR.getFieldCache() instead of a "generic" UOE can we throw
          something like MR.fields() throws? Ie the exc message should explain
          that you should use the SlowMRWrapper instead.

          I'm nervous about how the [deprecated] FC makes a new SlowMRWrapper()
          for each getXXX call – I think this class uses "this" as the
          getFieldCacheKey? Won't this mean each lookup will build a new cache
          entry? (Hmm... but then why don't tests fail... I think we have at
          least one test verifying same instance is returned for 2 calls in a
          row).

          Show
          Michael McCandless added a comment - I actually prefer the name AtomicFieldCache, since this matches other places (eg. AtomicReaderContext), and because it's not necessarily a segment (SlowMultiReaderWrapper returns an instance). The name SegmentFieldCacheImpl seems OK, but can't this class be package private? I love the name InsaneFieldCache! For the new IR.getFieldCache() instead of a "generic" UOE can we throw something like MR.fields() throws? Ie the exc message should explain that you should use the SlowMRWrapper instead. I'm nervous about how the [deprecated] FC makes a new SlowMRWrapper() for each getXXX call – I think this class uses "this" as the getFieldCacheKey? Won't this mean each lookup will build a new cache entry? (Hmm... but then why don't tests fail... I think we have at least one test verifying same instance is returned for 2 calls in a row).
          Hide
          Robert Muir added a comment -

          For the new IR.getFieldCache() instead of a "generic" UOE can we throw
          something like MR.fields() throws? Ie the exc message should explain
          that you should use the SlowMRWrapper instead.

          Yeah, can we force the user to make their own SlowMultiReaderWrapper? I don't think we should ever create such things, even for deprecated stuff in 4.0

          In 4.0 we just change the API and make you create it yourself.

          Show
          Robert Muir added a comment - For the new IR.getFieldCache() instead of a "generic" UOE can we throw something like MR.fields() throws? Ie the exc message should explain that you should use the SlowMRWrapper instead. Yeah, can we force the user to make their own SlowMultiReaderWrapper? I don't think we should ever create such things, even for deprecated stuff in 4.0 In 4.0 we just change the API and make you create it yourself.
          Hide
          Michael McCandless added a comment -

          In 4.0 we just change the API and make you create it yourself.

          Show
          Michael McCandless added a comment - In 4.0 we just change the API and make you create it yourself.
          Hide
          Michael McCandless added a comment -

          In 4.0 we just change the API and make you create it yourself.

          +1

          Show
          Michael McCandless added a comment - In 4.0 we just change the API and make you create it yourself. +1
          Hide
          Martijn van Groningen added a comment -

          I actually prefer the name AtomicFieldCache, since this matches other
          places (eg. AtomicReaderContext), and because it's not necessarily a
          segment (SlowMultiReaderWrapper returns an instance).

          For naming consistency I agree with you there. I will make the SegmentFieldCacheImpl package private. I kept it in the patch it public, b/c FieldCacheImpl was public too.

          I'm nervous about how the [deprecated] FC makes a new SlowMRWrapper()
          for each getXXX call – I think this class uses "this" as the
          getFieldCacheKey? Won't this mean each lookup will build a new cache
          entry? (Hmm... but then why don't tests fail... I think we have at
          least one test verifying same instance is returned for 2 calls in a
          row).

          The current FielCache api requires a reader to be specified. That specified reader is used as key.

          I love the name InsaneFieldCache!

          I love the name too

          In 4.0 we just change the API and make you create it yourself.

          So remove [deprecated] FieldCache completely? When back porting to 3x we obviously need to keep FieldCache classes...

          Show
          Martijn van Groningen added a comment - I actually prefer the name AtomicFieldCache, since this matches other places (eg. AtomicReaderContext), and because it's not necessarily a segment (SlowMultiReaderWrapper returns an instance). For naming consistency I agree with you there. I will make the SegmentFieldCacheImpl package private. I kept it in the patch it public, b/c FieldCacheImpl was public too. I'm nervous about how the [deprecated] FC makes a new SlowMRWrapper() for each getXXX call – I think this class uses "this" as the getFieldCacheKey? Won't this mean each lookup will build a new cache entry? (Hmm... but then why don't tests fail... I think we have at least one test verifying same instance is returned for 2 calls in a row). The current FielCache api requires a reader to be specified. That specified reader is used as key. I love the name InsaneFieldCache! I love the name too In 4.0 we just change the API and make you create it yourself. So remove [deprecated] FieldCache completely? When back porting to 3x we obviously need to keep FieldCache classes...
          Hide
          Michael McCandless added a comment -

          The current FielCache api requires a reader to be specified. That specified reader is used as key.

          Ahh I see – the SlowMRWrapper's FC impl uses the "in" (wrapped IndexReader, from FilterIndexReader) as the key. So we should only see one cache entry created. Phew!

          When back porting to 3x we obviously need to keep FieldCache classes...

          True... so maybe do this patch in 2 steps? First, for 3.x. Then, merge to trunk, remove deprecated FC and fix all usages?

          Show
          Michael McCandless added a comment - The current FielCache api requires a reader to be specified. That specified reader is used as key. Ahh I see – the SlowMRWrapper's FC impl uses the "in" (wrapped IndexReader, from FilterIndexReader) as the key. So we should only see one cache entry created. Phew! When back porting to 3x we obviously need to keep FieldCache classes... True... so maybe do this patch in 2 steps? First, for 3.x. Then, merge to trunk, remove deprecated FC and fix all usages?
          Hide
          Uwe Schindler added a comment -

          Thanks, for taking care! I am a little bit over-crowded with summer-holidays

          Show
          Uwe Schindler added a comment - Thanks, for taking care! I am a little bit over-crowded with summer-holidays
          Hide
          Martijn van Groningen added a comment -

          True... so maybe do this patch in 2 steps? First, for 3.x. Then, merge to trunk, remove deprecated FC and fix all usages?

          Proposal: Get everything to work with the current deprecated FieldCache. Port to working code to 3x. Remove deprecated FieldCache from trunk and move its inner classes (like DocTermsIndex) to search.cache.

          FieldCache has also some static *_PARSER fields that are used in the *ValuesCreator classes. I think these fields should move to *ValuesCreator classes as private fields. Besides the tests these fields are only used in *ValuesCreator classes.

          Show
          Martijn van Groningen added a comment - True... so maybe do this patch in 2 steps? First, for 3.x. Then, merge to trunk, remove deprecated FC and fix all usages? Proposal: Get everything to work with the current deprecated FieldCache. Port to working code to 3x. Remove deprecated FieldCache from trunk and move its inner classes (like DocTermsIndex) to search.cache. FieldCache has also some static *_PARSER fields that are used in the *ValuesCreator classes. I think these fields should move to *ValuesCreator classes as private fields. Besides the tests these fields are only used in *ValuesCreator classes.
          Hide
          Uwe Schindler added a comment -

          FieldCache has also some static *_PARSER fields that are used in the *ValuesCreator classes. I think these fields should move to *ValuesCreator classes as private fields. Besides the tests these fields are only used in *ValuesCreator classes.

          They should only be public available... If ValueCreate classes are package private then its not good. The parsers are oftenh used when the field type is know before. E.g. to circumvent problems where the autodetection of old-style toString() numeric fields vs. NumericField is needed. So I always specify the NF parser explicitely on FieldCache.getLong(IR, Parser). Or is this no longer the case?

          Show
          Uwe Schindler added a comment - FieldCache has also some static *_PARSER fields that are used in the *ValuesCreator classes. I think these fields should move to *ValuesCreator classes as private fields. Besides the tests these fields are only used in *ValuesCreator classes. They should only be public available... If ValueCreate classes are package private then its not good. The parsers are oftenh used when the field type is know before. E.g. to circumvent problems where the autodetection of old-style toString() numeric fields vs. NumericField is needed. So I always specify the NF parser explicitely on FieldCache.getLong(IR, Parser). Or is this no longer the case?
          Hide
          Martijn van Groningen added a comment -

          E.g. to circumvent problems where the autodetection of old-style toString() numeric fields vs. NumericField is needed.

          You mean this problem in LongValuesCreator#fillLongValues:

            if( parser == null ) {
                try {
                  parser = FieldCache.DEFAULT_LONG_PARSER;
                  fillLongValues( vals, reader, field );
                  return;
                }
                catch (NumberFormatException ne) {
                  vals.parserHashCode = null; // wipe the previous one
                  parser = FieldCache.NUMERIC_UTILS_LONG_PARSER;
                  fillLongValues( vals, reader, field );
                  return;
                }
              }
          

          Anyway we can make these *_PARSER fields publicly available. Would the search.cache.parsers package be a good location?

          Show
          Martijn van Groningen added a comment - E.g. to circumvent problems where the autodetection of old-style toString() numeric fields vs. NumericField is needed. You mean this problem in LongValuesCreator#fillLongValues: if ( parser == null ) { try { parser = FieldCache.DEFAULT_LONG_PARSER; fillLongValues( vals, reader, field ); return ; } catch (NumberFormatException ne) { vals.parserHashCode = null ; // wipe the previous one parser = FieldCache.NUMERIC_UTILS_LONG_PARSER; fillLongValues( vals, reader, field ); return ; } } Anyway we can make these *_PARSER fields publicly available. Would the search.cache.parsers package be a good location?
          Hide
          Martijn van Groningen added a comment - - edited

          Added a new version of patch.

          • Renamed SegmentFieldCache to AtomicFieldCache.
          • SegementFieldCacheImpl is now package protected. This class has been moved to lucene.index package.
          • Added TestSegmentFieldCacheImpl test.
          Show
          Martijn van Groningen added a comment - - edited Added a new version of patch. Renamed SegmentFieldCache to AtomicFieldCache. SegementFieldCacheImpl is now package protected. This class has been moved to lucene.index package. Added TestSegmentFieldCacheImpl test.
          Hide
          Uwe Schindler added a comment -

          You mean this problem in LongValuesCreator#fillLongValues:

          Yes, same for the other types!

          Show
          Uwe Schindler added a comment - You mean this problem in LongValuesCreator#fillLongValues: Yes, same for the other types!
          Hide
          Martijn van Groningen added a comment -

          Updated patch to latest changes in trunk.
          I'm busy with the 3x backport, but since the FieldCacheImpl versions between trunk and 3x it took me longer than I expected.

          I think once the 3x version is ready (coming days or so) we can commit this patch also.
          After that we can work towards removing the current FC from trunk and moving the classes inside FieldCache class to proper locations (like search.cache package).

          Show
          Martijn van Groningen added a comment - Updated patch to latest changes in trunk. I'm busy with the 3x backport, but since the FieldCacheImpl versions between trunk and 3x it took me longer than I expected. I think once the 3x version is ready (coming days or so) we can commit this patch also. After that we can work towards removing the current FC from trunk and moving the classes inside FieldCache class to proper locations (like search.cache package).
          Hide
          Martijn van Groningen added a comment -

          Attached patch for 3x branch.
          The major difference between 3x and trunk:

          • DocTerms is String[] and DocTermsIndex is StringIndex in 3x branch.
          • In 3x SlowMultiReaderWrapper is only available for tests. I moved it to core. In 3x SlowMultiReaderWrapper extends MultiReader whereas in trunk SlowMultiReaderWrapper extends FilterIndexReader.
          Show
          Martijn van Groningen added a comment - Attached patch for 3x branch. The major difference between 3x and trunk: DocTerms is String[] and DocTermsIndex is StringIndex in 3x branch. In 3x SlowMultiReaderWrapper is only available for tests. I moved it to core. In 3x SlowMultiReaderWrapper extends MultiReader whereas in trunk SlowMultiReaderWrapper extends FilterIndexReader.
          Hide
          Martijn van Groningen added a comment -

          Small update. During testing SOLR-2066 I noticed that unnecessary ReaderFinishedListener instances were created in SlowMultiReaderWrapper (I had a OOME). This is fixed now.

          Show
          Martijn van Groningen added a comment - Small update. During testing SOLR-2066 I noticed that unnecessary ReaderFinishedListener instances were created in SlowMultiReaderWrapper (I had a OOME). This is fixed now.
          Hide
          Martijn van Groningen added a comment - - edited

          Attached an updated version for trunk.

          • FieldCache and FieldCacheImpl have been removed completely from the source code.
          • DocTerms and DocTermsIndex have been moved to o.a.l.search.cache
          • Parsers have been moved to o.a.l.search.cache.parser
          • Functionality that relies on top level caches (e.g. faceting and some functions) now uses SlowMultiReaderWrapper:
            new SlowMultiReaderWrapper(topReader).getFieldCache()
          Show
          Martijn van Groningen added a comment - - edited Attached an updated version for trunk. FieldCache and FieldCacheImpl have been removed completely from the source code. DocTerms and DocTermsIndex have been moved to o.a.l.search.cache Parsers have been moved to o.a.l.search.cache.parser Functionality that relies on top level caches (e.g. faceting and some functions) now uses SlowMultiReaderWrapper: new SlowMultiReaderWrapper(topReader).getFieldCache()
          Hide
          Uwe Schindler added a comment -

          Hi Martijn,

          nice work! But I would recommend to wait with commiting those changes. We redo FieldCache for 3.x and wanted to forward-port that to trunk, as the FieldCache implementation is very complicated there: LUCENE-3443

          I think we should work together and solve the stuff identical like in 3.x, but bound to IndexReader.

          Show
          Uwe Schindler added a comment - Hi Martijn, nice work! But I would recommend to wait with commiting those changes. We redo FieldCache for 3.x and wanted to forward-port that to trunk, as the FieldCache implementation is very complicated there: LUCENE-3443 I think we should work together and solve the stuff identical like in 3.x, but bound to IndexReader.
          Hide
          Martijn van Groningen added a comment - - edited

          Makes sense to wait to commit this change for trunk. I've updated the 3x version of the patch with latest changes in 3x. This version only deprecates FC.

          Show
          Martijn van Groningen added a comment - - edited Makes sense to wait to commit this change for trunk. I've updated the 3x version of the patch with latest changes in 3x. This version only deprecates FC.
          Hide
          Martijn van Groningen added a comment -

          Updated patch for 3x. All tests pass on my machine. Most references to FieldCache.DEFAULT are removed and replaced by IR.getFieldCache() or new SlowMultiReader(r).getFieldCache

          Show
          Martijn van Groningen added a comment - Updated patch for 3x. All tests pass on my machine. Most references to FieldCache.DEFAULT are removed and replaced by IR.getFieldCache() or new SlowMultiReader(r).getFieldCache
          Hide
          Michael McCandless added a comment -

          Looks great Martijn!

          On waiting for trunk, why not land this first and then do LUCENE-3443 after? Or is there a patch for LUCENE-3443? Seems like we shouldn't hold off committing this to trunk just because LUCENE-3443 is pending unless we actually have a patch for it...

          Show
          Michael McCandless added a comment - Looks great Martijn! On waiting for trunk, why not land this first and then do LUCENE-3443 after? Or is there a patch for LUCENE-3443 ? Seems like we shouldn't hold off committing this to trunk just because LUCENE-3443 is pending unless we actually have a patch for it...
          Hide
          Martijn van Groningen added a comment -

          I don't see a patch for LUCENE-3443 yet. Maybe we can forward port this issue including FieldCache.getDocsWithField() to trunk?

          Show
          Martijn van Groningen added a comment - I don't see a patch for LUCENE-3443 yet. Maybe we can forward port this issue including FieldCache.getDocsWithField() to trunk ?
          Hide
          Uwe Schindler added a comment -

          Maybe we can forward port this issue including FieldCache.getDocsWithField() to trunk?

          That was the plan. But first we need to revert the current impl in trunk

          Show
          Uwe Schindler added a comment - Maybe we can forward port this issue including FieldCache.getDocsWithField() to trunk? That was the plan. But first we need to revert the current impl in trunk
          Hide
          Martijn van Groningen added a comment -

          That was the plan. But first we need to revert the current impl in trunk

          I think I missed that. Can you point to the code and svn revision in trunk?

          Show
          Martijn van Groningen added a comment - That was the plan. But first we need to revert the current impl in trunk I think I missed that. Can you point to the code and svn revision in trunk?
          Hide
          Michael McCandless added a comment -

          I think this can proceed, now that LUCENE-3443 is done!

          Show
          Michael McCandless added a comment - I think this can proceed, now that LUCENE-3443 is done!
          Hide
          Ryan McKinley added a comment -

          wow, lots has changed since the last patch! I think all the reader/cache refactoring has settled down, so this would be good to get sorted soon.

          I ran it through the LUCENE-3753.patch.hack.pl but there are enough changes, that it is significant work to resurrect from the patch; it may be more effective to refactor the code.

          Martijn, you interested in picking this up again? If no, i may give it a try.

          Show
          Ryan McKinley added a comment - wow, lots has changed since the last patch! I think all the reader/cache refactoring has settled down, so this would be good to get sorted soon. I ran it through the LUCENE-3753 .patch.hack.pl but there are enough changes, that it is significant work to resurrect from the patch; it may be more effective to refactor the code. Martijn, you interested in picking this up again? If no, i may give it a try.
          Hide
          Martijn van Groningen added a comment -

          Hi Ryan! Yeah, a lot has changed since the last patch. This issue is on my task list, however so are other issues. I was planning to pick this in the coming weeks. If you have time now it would be great to continue the work on this issue now. To sooner we get rid of the FieldCache#DEFAULT the better!

          Show
          Martijn van Groningen added a comment - Hi Ryan! Yeah, a lot has changed since the last patch. This issue is on my task list, however so are other issues. I was planning to pick this in the coming weeks. If you have time now it would be great to continue the work on this issue now. To sooner we get rid of the FieldCache#DEFAULT the better!
          Hide
          Robert Muir added a comment -

          FieldCache only takes AtomicReader now... so the insanity trap is already removed in trunk.

          We have alternative ways to 'fieldcache' at index-time, by using SortedBytes docvalues field.
          I was originally skeptical of SortedBytes (especially given its initial impl problems), but its
          been cleaned up a lot recently, I think its the direction should really move forwards to.

          There is also progress on more efficient implementations by doing more at indexing time: e.g.
          LUCENE-3729

          As an "index" the idea is to compute things up-front so that searches are faster: I don't
          think lucene core needs to support 'uninverting at runtime' ?!

          So I think we should start a plan for how FieldCache can be moved to contrib or deprecated instead.

          Show
          Robert Muir added a comment - FieldCache only takes AtomicReader now... so the insanity trap is already removed in trunk. We have alternative ways to 'fieldcache' at index-time, by using SortedBytes docvalues field. I was originally skeptical of SortedBytes (especially given its initial impl problems), but its been cleaned up a lot recently, I think its the direction should really move forwards to. There is also progress on more efficient implementations by doing more at indexing time: e.g. LUCENE-3729 As an "index" the idea is to compute things up-front so that searches are faster: I don't think lucene core needs to support 'uninverting at runtime' ?! So I think we should start a plan for how FieldCache can be moved to contrib or deprecated instead.
          Hide
          Uwe Schindler added a comment - - edited

          Hi,
          in my opinion, the whole FieldCache is obsolete somehow, since we have DocValues in trunk. If we still want some "on-the-fly uninverting", we can simply expose that as another "DocValues" impl. Why does FieldCache needs to be separate? DocValues is the API and FieldCache can simply be another impl of that, not materialized on disk.

          One cool thing about a separate "FieldCache DocValues" impl would be that you could merge... Means: A 3.x index without DocValues could on the next merge be a 4.x segment with DocValues that were merged on the fly from the 3.x index by uninverting For trunk indexes we require docvalues to facet, group, sort,... The 3.x codec would simply simulate that by doing uninversion. Once those old 3.x segments are merged to new 4.x segments, docvalues materialize from the "on-they-fly 3.x fieldcache docvalues".

          Show
          Uwe Schindler added a comment - - edited Hi, in my opinion, the whole FieldCache is obsolete somehow, since we have DocValues in trunk. If we still want some "on-the-fly uninverting", we can simply expose that as another "DocValues" impl. Why does FieldCache needs to be separate? DocValues is the API and FieldCache can simply be another impl of that, not materialized on disk. One cool thing about a separate "FieldCache DocValues" impl would be that you could merge... Means: A 3.x index without DocValues could on the next merge be a 4.x segment with DocValues that were merged on the fly from the 3.x index by uninverting For trunk indexes we require docvalues to facet, group, sort,... The 3.x codec would simply simulate that by doing uninversion. Once those old 3.x segments are merged to new 4.x segments, docvalues materialize from the "on-they-fly 3.x fieldcache docvalues".
          Hide
          Bill Bell added a comment -

          +1 for FieldCache being an impl of DocValues.

          Show
          Bill Bell added a comment - +1 for FieldCache being an impl of DocValues.
          Hide
          Martijn van Groningen added a comment -

          Closing this issue since it would only couple the FieldCache and IndexReader instead to decouple this for the new direction that is being explored in LUCENE-3729.

          Show
          Martijn van Groningen added a comment - Closing this issue since it would only couple the FieldCache and IndexReader instead to decouple this for the new direction that is being explored in LUCENE-3729 .

            People

            • Assignee:
              Unassigned
              Reporter:
              Martijn van Groningen
            • Votes:
              2 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development