Solr
  1. Solr
  2. SOLR-4589

4.x + enableLazyFieldLoading + large multivalued fields + varying fl = pathological CPU load & response time

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 4.0, 4.1, 4.2
    • Fix Version/s: 4.2.1, 4.3, 6.0
    • Component/s: None
    • Labels:
      None

      Description

      Following up on a user report of exterme CPU usage in 4.1, I've discovered that the following combination of factors can result in extreme CPU usage and excessively HTTP response times...

      • Solr 4.x (tested 3.6.1, 4.0.0, and 4.2.0)
      • enableLazyFieldLoading == true (included in example solrconfig.xml)
      • documents with a large number of values in multivalued fields (eg: tested ~10-15K values)
      • multiple requests returning the same doc with different "fl" lists

      I haven't dug into the route cause yet, but the essential observations is: if lazyloading is used in 4.x, then once a document has been fetched with an initial fl list X, subsequent requests for that document using a differnet fl list Y can be many orders of magnitute slower (while pegging the CPU) – even if those same requests using fl Y uncached (or w/o lazy laoding) would be extremely fast.

      1. SOLR-4589.patch
        20 kB
        Hoss Man
      2. SOLR-4589.patch
        21 kB
        Hoss Man
      3. SOLR-4589.patch
        21 kB
        Hoss Man
      4. SOLR-4589.patch
        18 kB
        Hoss Man
      5. SOLR-4589.patch
        18 kB
        Hoss Man
      6. SOLR-4589.patch
        9 kB
        Hoss Man
      7. test.out__3.6.1_mmap_lazy.txt
        0.6 kB
        Hoss Man
      8. test.out__3.6.1_mmap_nolazy.txt
        0.4 kB
        Hoss Man
      9. test.out__3.6.1_nio_lazy.txt
        0.6 kB
        Hoss Man
      10. test.out__3.6.1_nio_nolazy.txt
        0.6 kB
        Hoss Man
      11. test.out__4.0.0_mmap_lazy.txt
        0.4 kB
        Hoss Man
      12. test.out__4.0.0_mmap_nolazy.txt
        0.4 kB
        Hoss Man
      13. test.out__4.0.0_nio_lazy.txt
        0.4 kB
        Hoss Man
      14. test.out__4.0.0_nio_nolazy.txt
        0.4 kB
        Hoss Man
      15. test.out__4.2.0_mmap_lazy.txt
        0.4 kB
        Hoss Man
      16. test.out__4.2.0_mmap_nolazy.txt
        0.4 kB
        Hoss Man
      17. test.out__4.2.0_nio_lazy.txt
        0.4 kB
        Hoss Man
      18. test.out__4.2.0_nio_nolazy.txt
        0.4 kB
        Hoss Man
      19. test.sh
        2 kB
        Hoss Man
      20. test-just-queries.out__4.0.0_mmap_lazy_using36index.txt
        0.2 kB
        Hoss Man
      21. test-just-queries.sh
        1 kB
        Hoss Man

        Activity

        Hide
        Hoss Man added a comment -

        The attached files include a test.sh script that:

        • creates some data where fields have a large number of values
        • loads the data into solr
        • execs 2 queries for a single doc using two different fl options
        • triggers a commit to flush caches
        • execs the same two queries in a differnet order

        Also attached are the raw results of running this script on my Thinkpad T430s against the example jetty & solr configs where the version of solr, lazyfield loading, and the directory impl were varried...

        • version of solr
          • 3.6.1
          • 4.0.0
          • 4.2.0
        • lazy field loading:
          • lazy: default example configs
          • nolazy: perl -i -pe 's {<enableLazyFieldLoading>true} {<enableLazyFieldLoading>false}

            ' solrconfig.xml

        • directory impl:
          • mmap: java -Dsolr.directoryFactory=solr.MMapDirectoryFactory -jar start.jar
          • nio: java -Dsolr.directoryFactory=solr.NIOFSDirectoryFactory -jar start.jar

        There was no apparent difference in the directory impl choosen, or between 4.0 and 4.2. Here's the summary results for 3.6 vs 4.0 using mmap...

        step 3.6 nolazy 3.6 lazy 4.0 nolazy 4.0 lazy
        small fl 0m0.308s 0m0.998s 0m0.260s 0m0.202s
        big fl 0m0.178s 0m0.263s 0m0.084s 16m15.735s
        commit XXXXXXX XXXXXXX XXXXXXX XXXXXXX
        big fl 0m0.157s 0m0.118s 0m0.218s 0m0.133s
        small fl 0m0.036s 0m0.035s 0m0.049s 3m2.814s

        Also attached is also the results of a single test I did running Solr 4.0 pointed at the configs & index built with 3.6.1 to rule out codec changes: it behaved essentially the same as the 4.0 tests that built the index from scratch.

        Show
        Hoss Man added a comment - The attached files include a test.sh script that: creates some data where fields have a large number of values loads the data into solr execs 2 queries for a single doc using two different fl options triggers a commit to flush caches execs the same two queries in a differnet order Also attached are the raw results of running this script on my Thinkpad T430s against the example jetty & solr configs where the version of solr, lazyfield loading, and the directory impl were varried... version of solr 3.6.1 4.0.0 4.2.0 lazy field loading: lazy: default example configs nolazy: perl -i -pe 's {<enableLazyFieldLoading>true} {<enableLazyFieldLoading>false} ' solrconfig.xml directory impl: mmap: java -Dsolr.directoryFactory=solr.MMapDirectoryFactory -jar start.jar nio: java -Dsolr.directoryFactory=solr.NIOFSDirectoryFactory -jar start.jar There was no apparent difference in the directory impl choosen, or between 4.0 and 4.2. Here's the summary results for 3.6 vs 4.0 using mmap... step 3.6 nolazy 3.6 lazy 4.0 nolazy 4.0 lazy small fl 0m0.308s 0m0.998s 0m0.260s 0m0.202s big fl 0m0.178s 0m0.263s 0m0.084s 16m15.735s commit XXXXXXX XXXXXXX XXXXXXX XXXXXXX big fl 0m0.157s 0m0.118s 0m0.218s 0m0.133s small fl 0m0.036s 0m0.035s 0m0.049s 3m2.814s Also attached is also the results of a single test I did running Solr 4.0 pointed at the configs & index built with 3.6.1 to rule out codec changes: it behaved essentially the same as the 4.0 tests that built the index from scratch.
        Hide
        Yonik Seeley added a comment -

        I wonder if this could be related to index compression (and maybe the same block being repeatedly decompressed for each lazy field being accessed?)

        Show
        Yonik Seeley added a comment - I wonder if this could be related to index compression (and maybe the same block being repeatedly decompressed for each lazy field being accessed?)
        Hide
        Uwe Schindler added a comment -

        I wonder if this could be related to index compression (and maybe the same block being repeatedly decompressed for each lazy field being accessed?)

        This also happens in Solr 4.0, which had no compression.

        The reason here might be the changes in stored fields altogether. Lucene natively no longer has support for lazy field loading, but there is a "backwards layer" just for Solr in modules/misc (LazyDocument.java). The document does not use maps to lookup, if you have many fields its always a scan through the ArrayList of all fields in the document. The lazyness in LazyDocument is only that the whole document is loaded delayed, but no longer single fields.

        Show
        Uwe Schindler added a comment - I wonder if this could be related to index compression (and maybe the same block being repeatedly decompressed for each lazy field being accessed?) This also happens in Solr 4.0, which had no compression. The reason here might be the changes in stored fields altogether. Lucene natively no longer has support for lazy field loading, but there is a "backwards layer" just for Solr in modules/misc (LazyDocument.java). The document does not use maps to lookup, if you have many fields its always a scan through the ArrayList of all fields in the document. The lazyness in LazyDocument is only that the whole document is loaded delayed, but no longer single fields.
        Hide
        Hoss Man added a comment -

        Lucene natively no longer has support for lazy field loading, but there is a "backwards layer" just for Solr in modules/misc (LazyDocument.java)

        Yeah .. LazyDocument is seriously evil...

        The document does not use maps to lookup, if you have many fields its always a scan through the ArrayList of all fields in the document.

        It's worse then that though – having many fields and scanning through them all to fetch a single field value (or an array of field values for a single name) is a cost that has to be paid in 4.x regardless of whether you are using LazyDocument or not, the root problem here seems to be having fields that contain many values. That problem is exacerbated by the fact that unlike 3.x lazy loading, 4.x LaxyDocument/LazyField doesn't do anything to "cache" the fields you've already asked for.

        Below are my notes from investigating this and trying to get up to speed on the new world order of document loading w/o FieldSelector. I'll experiment with some fixes after i get some food...


        LUCENE-2308 - r1162347

        • IndexReader.doc(int,FieldSelector) deleted
        • FieldSelector moved to misc
        • new concept StoredFieldVisitor introduced
          • void IndexReader.document(int docID, StoredFieldVisitor visitor)
        • new impl DocumentStoredFieldVisitor extends StoredFieldVisitor
        • new impl FieldSelectorVisitor extends StoredFieldVisitor
          • appears all the old FieldSelector logic from IndexReader moved here?
          • contains a private "LazyField extends Field" that caches field values once fetched
        • SolrIndexSearcher modified to use FieldSelectorVisitor

        LUCENE-2621 - r1199779

        • eliminates FieldSelector & FieldSelectorVisitor
        • leaves StoredFieldVisitor & DocumentStoredFieldVisitor intact
        • introduced public LazyDocument containing "LazyField implements IndexableField"
          • this version of LazyField does not cache any data once fetched
        • changes SolrIndexSearcher's SetNonLazyFieldSelector to extend StoredFieldVisitor
          • add's LazyField to the Document for any fields not immediately needed

        The crux of the problem is that:

        • LazyDocument is lazy about loading the doc, but once you ask for the value any LazyField, the entire Document (with all underlying IndexableField values) is loaded.
        • even though the entire document has been loaded once a single LazyField is used, the performance of iterating over LazyField's is TERRIBLE when there are lots of values for a single field
        • requests for the value of individual LazyFields are not cached/stored anywhere, so the poor performace affects all subsequent re-uses of the same LazyDocuments

        Details...

        The state tracked in a LazyField is a refrence back to the underlying LazyDocument, the field name, and the "num" offset of this IndexableField in the list of values for that field name. When you ask the LazyField for it's value, it asks the underlying LazyDocument to fetch the entire Document (if it hasn't already) and then it asks that Document for all values of the assocaited field name as an arry, and then it looks up it's "num" offset in that array.

        So if you build up an (outer) Document containing N LazyField instances for field named "foo" (as is done in Solr's SetNonLazyFieldSelector), and then try to iterate over the values with something like String[] values = outerDoc.getValues("foo"); under the covers LazyField will load every value of every field of that document into memory as an "innerDoc", that innerDoc will be asked N times to generate a new IndexableField[] of every value of field "foo" (which BTW: involves iterating over every IndexableField value of every field) and N-1 elements of that array will then be ignored and thrown away.


        It's not clear to me why FieldSelectorVisitor was eliminated in LUCENE-2621 (no discussion in the comments on point) but it's also not clear to me why LazyDocument+LazyField would ever be a good idea in any application that had more then a handful of fields (and if you don't have very many fields, why are you lazy loading?).

        It's also not clear to me why the LazyDocument version of LazyField doesn't include the same caching logic as the version that was included in FieldSelectorVisitor (or the older lazy loading code in 3.6) because w/o that the usage pattern in Solr – in which Document objects are cached – results in the worst of all possible worlds: once a Document is cached with only a small subset of "real" fields, and the rest are "LazyField" instances, every subsequent request for that document that involves those LazyFields is slow, even if they ask for the same LazyField over and over.

        Show
        Hoss Man added a comment - Lucene natively no longer has support for lazy field loading, but there is a "backwards layer" just for Solr in modules/misc (LazyDocument.java) Yeah .. LazyDocument is seriously evil... The document does not use maps to lookup, if you have many fields its always a scan through the ArrayList of all fields in the document. It's worse then that though – having many fields and scanning through them all to fetch a single field value (or an array of field values for a single name) is a cost that has to be paid in 4.x regardless of whether you are using LazyDocument or not, the root problem here seems to be having fields that contain many values . That problem is exacerbated by the fact that unlike 3.x lazy loading, 4.x LaxyDocument/LazyField doesn't do anything to "cache" the fields you've already asked for. Below are my notes from investigating this and trying to get up to speed on the new world order of document loading w/o FieldSelector. I'll experiment with some fixes after i get some food... LUCENE-2308 - r1162347 IndexReader.doc(int,FieldSelector) deleted FieldSelector moved to misc new concept StoredFieldVisitor introduced void IndexReader.document(int docID, StoredFieldVisitor visitor) new impl DocumentStoredFieldVisitor extends StoredFieldVisitor new impl FieldSelectorVisitor extends StoredFieldVisitor appears all the old FieldSelector logic from IndexReader moved here? contains a private "LazyField extends Field" that caches field values once fetched SolrIndexSearcher modified to use FieldSelectorVisitor LUCENE-2621 - r1199779 eliminates FieldSelector & FieldSelectorVisitor leaves StoredFieldVisitor & DocumentStoredFieldVisitor intact introduced public LazyDocument containing "LazyField implements IndexableField" this version of LazyField does not cache any data once fetched changes SolrIndexSearcher's SetNonLazyFieldSelector to extend StoredFieldVisitor add's LazyField to the Document for any fields not immediately needed The crux of the problem is that: LazyDocument is lazy about loading the doc, but once you ask for the value any LazyField, the entire Document (with all underlying IndexableField values) is loaded. even though the entire document has been loaded once a single LazyField is used, the performance of iterating over LazyField's is TERRIBLE when there are lots of values for a single field requests for the value of individual LazyFields are not cached/stored anywhere, so the poor performace affects all subsequent re-uses of the same LazyDocuments Details... The state tracked in a LazyField is a refrence back to the underlying LazyDocument, the field name, and the "num" offset of this IndexableField in the list of values for that field name. When you ask the LazyField for it's value, it asks the underlying LazyDocument to fetch the entire Document (if it hasn't already) and then it asks that Document for all values of the assocaited field name as an arry, and then it looks up it's "num" offset in that array. So if you build up an (outer) Document containing N LazyField instances for field named "foo" (as is done in Solr's SetNonLazyFieldSelector), and then try to iterate over the values with something like String[] values = outerDoc.getValues("foo"); under the covers LazyField will load every value of every field of that document into memory as an "innerDoc", that innerDoc will be asked N times to generate a new IndexableField[] of every value of field "foo" (which BTW: involves iterating over every IndexableField value of every field) and N-1 elements of that array will then be ignored and thrown away. It's not clear to me why FieldSelectorVisitor was eliminated in LUCENE-2621 (no discussion in the comments on point) but it's also not clear to me why LazyDocument+LazyField would ever be a good idea in any application that had more then a handful of fields (and if you don't have very many fields, why are you lazy loading?). It's also not clear to me why the LazyDocument version of LazyField doesn't include the same caching logic as the version that was included in FieldSelectorVisitor (or the older lazy loading code in 3.6) because w/o that the usage pattern in Solr – in which Document objects are cached – results in the worst of all possible worlds: once a Document is cached with only a small subset of "real" fields, and the rest are "LazyField" instances, every subsequent request for that document that involves those LazyFields is slow, even if they ask for the same LazyField over and over.
        Hide
        Hoss Man added a comment -

        Digression...

        It really feals like the whole LazyDocument API and usage pattern – particularly the semantics in which "LazyDocument.getField(FieldInfo)" can be used – is really brittle and confusiong.

        It assumes it will be called exactly once for each StorableField that exists for that field name, in order because the internal counter is assumed to correspond exactly wit the array index of the corresponding array of values. (In short: it's really only useful when called from StoredFieldVisitor)

        If, for example, a client app tries to use the same LazyDocument instance to generate LazyField instances for the same field twice – it will most likeley get an ArrayIndexOutOfBoundsError. Likewise a user could read the jacadocs for LazyDocument.getField and assume that it works simlar to StorableDocument.getField expecting it to return a LazyField of the first field value for the specified field. They could wind up calling the method twice on the same field and getting lazy fields pointing at two differnet values (the first and second StorableField instances for that field name)

        none of which directly relates to this particular bug, but since this public class seems to have been added solely for solr, it may be worth considering deprecating it, and moving the logic solr needs into protected/private classes.

        ...as for the matter at hand...

        I reworked the internals of hte class a bit so that:

        • LazyField instances cache their underlying values similar to how lazy loading worked in 3.x
        • Asking for the value of a LazyField instance causes all LazyField's (produced by the same LazyDocument) with the same field name to be "actualized" in a single pass over the underlying document

        The changes eliminate the pathalogical case of iterating over many LazyFields, and shouldn't add much even in the edge case of only carrying about a single field value when there are many – it's still a single loop in the underlying Document of the fields, there's just no more short circut for hte single value case.

        In order for these changes to work, the LazDocument has to now keep a reference to every LazyField it generates so that they can all be repopulated when one of their filed name brothers is repopulated. The memory footprint of this shouldn't be too bad in the common case (afterall: the LazyDocument was already fetching the entire underlying Document keeping the whole thing around as long as there was a single LazyField in the heap) but just to be safe i made it track the list of LazyFields using weak refrences – so even if someone asks for a bunch of LazyField instance for a field named "foo", and then throws away all but a few of them, they can be garbage collected regardless of how many other "foo" LazyFields are still arround or when/if their real values are loaded.

        Patch includes a few test updates in solr to sanity check that lazy loading value caching works, but i definitely want to add some true LazyDocument unit tests to the module (there don't seem to be any at all at the moment) to exercise the "all lazy fields with the same name get populated at the same time" and "what if a weakref LazyField goes away" logic before i'd consider committing.


        Feedback would be greatly appreciated.

        Show
        Hoss Man added a comment - Digression... It really feals like the whole LazyDocument API and usage pattern – particularly the semantics in which "LazyDocument.getField(FieldInfo)" can be used – is really brittle and confusiong. It assumes it will be called exactly once for each StorableField that exists for that field name, in order because the internal counter is assumed to correspond exactly wit the array index of the corresponding array of values. (In short: it's really only useful when called from StoredFieldVisitor) If, for example, a client app tries to use the same LazyDocument instance to generate LazyField instances for the same field twice – it will most likeley get an ArrayIndexOutOfBoundsError. Likewise a user could read the jacadocs for LazyDocument.getField and assume that it works simlar to StorableDocument.getField expecting it to return a LazyField of the first field value for the specified field. They could wind up calling the method twice on the same field and getting lazy fields pointing at two differnet values (the first and second StorableField instances for that field name) none of which directly relates to this particular bug, but since this public class seems to have been added solely for solr, it may be worth considering deprecating it, and moving the logic solr needs into protected/private classes. ...as for the matter at hand... I reworked the internals of hte class a bit so that: LazyField instances cache their underlying values similar to how lazy loading worked in 3.x Asking for the value of a LazyField instance causes all LazyField's (produced by the same LazyDocument) with the same field name to be "actualized" in a single pass over the underlying document The changes eliminate the pathalogical case of iterating over many LazyFields, and shouldn't add much even in the edge case of only carrying about a single field value when there are many – it's still a single loop in the underlying Document of the fields, there's just no more short circut for hte single value case. In order for these changes to work, the LazDocument has to now keep a reference to every LazyField it generates so that they can all be repopulated when one of their filed name brothers is repopulated. The memory footprint of this shouldn't be too bad in the common case (afterall: the LazyDocument was already fetching the entire underlying Document keeping the whole thing around as long as there was a single LazyField in the heap) but just to be safe i made it track the list of LazyFields using weak refrences – so even if someone asks for a bunch of LazyField instance for a field named "foo", and then throws away all but a few of them, they can be garbage collected regardless of how many other "foo" LazyFields are still arround or when/if their real values are loaded. Patch includes a few test updates in solr to sanity check that lazy loading value caching works, but i definitely want to add some true LazyDocument unit tests to the module (there don't seem to be any at all at the moment) to exercise the "all lazy fields with the same name get populated at the same time" and "what if a weakref LazyField goes away" logic before i'd consider committing. Feedback would be greatly appreciated.
        Hide
        Hoss Man added a comment -

        updated patch

        in addition to adding more tests, i added logic so that when the underlying document is fetched to populate the lazy fields, only the specific fields needed will be loaded (so if a doc contains fields w, x, y, and z, and the client code fetches field w and requests that fields y and z be lazy loaded, then field x will never be read. and field w will not be fetched redundantly.

        would really appreciate feedback, i'd like to get this into th 4.2.1 bug fix release

        Show
        Hoss Man added a comment - updated patch in addition to adding more tests, i added logic so that when the underlying document is fetched to populate the lazy fields, only the specific fields needed will be loaded (so if a doc contains fields w, x, y, and z, and the client code fetches field w and requests that fields y and z be lazy loaded, then field x will never be read. and field w will not be fetched redundantly. would really appreciate feedback, i'd like to get this into th 4.2.1 bug fix release
        Hide
        Yonik Seeley added a comment -

        Still trying to wrap my head around how all this works... particularly with synchronization and why the old implementation, or the patch, are thread safe.

        I'm not sure we should be using weak references though. They could cause problems at the rate they could be generated.

        Show
        Yonik Seeley added a comment - Still trying to wrap my head around how all this works... particularly with synchronization and why the old implementation, or the patch, are thread safe. I'm not sure we should be using weak references though. They could cause problems at the rate they could be generated.
        Hide
        Hoss Man added a comment -

        Yonik Seeley: I can remove the weak references easy enough, but cna you elaborate on what concerns you have about the thread safety?

        can you give me an example of a sequence of (parallel) events that you think would be problematic so i can try to address it?

        Show
        Hoss Man added a comment - Yonik Seeley : I can remove the weak references easy enough, but cna you elaborate on what concerns you have about the thread safety? can you give me an example of a sequence of (parallel) events that you think would be problematic so i can try to address it?
        Hide
        Hoss Man added a comment -

        updated patch removing the weak refs and fixing a few doc typoes

        Show
        Hoss Man added a comment - updated patch removing the weak refs and fixing a few doc typoes
        Hide
        Yonik Seeley added a comment -

        I was initially worried about LazyDocument.getField, but I now realize that LazyDocument is not a "Document" (i.e. it is not what you will ever retrieve from the document cache). It's more of a LazyFieldSource

        I think there may still be an issue with LazyField.realValue though, in that it may return a partially constructed object (unless realValue is guaranteed to be effectively immutable). The easiest fix would be to make realValue volatile (and readers will thus cross a read memory barrier).

              // edge case: if someone asks this LazyDoc for more LazyFields
              // after other LazyFields from the same LazyDoc have been
              // actuallized, we need to force the doc to be re-fetched
              // so the new LazyFields are also populated.
              doc = null;
        

        Does this happen with Solr?

        Show
        Yonik Seeley added a comment - I was initially worried about LazyDocument.getField, but I now realize that LazyDocument is not a "Document" (i.e. it is not what you will ever retrieve from the document cache). It's more of a LazyFieldSource I think there may still be an issue with LazyField.realValue though, in that it may return a partially constructed object (unless realValue is guaranteed to be effectively immutable). The easiest fix would be to make realValue volatile (and readers will thus cross a read memory barrier). // edge case : if someone asks this LazyDoc for more LazyFields // after other LazyFields from the same LazyDoc have been // actuallized, we need to force the doc to be re-fetched // so the new LazyFields are also populated. doc = null ; Does this happen with Solr?
        Hide
        Hoss Man added a comment -

        The easiest fix would be to make realValue volatile (and readers will thus cross a read memory barrier).

        I don't fully understand what situation you are concerned about here, but making it volatile is easy .. running tests now.

        Does this happen with Solr?

        no, but the way the public API for LazyDocument is defined, it is possible (even though it wouldn't really make any sense given the way the API is really only usable from a StoredFieldVisitor) so i wanted to protect against it – i included test randomization to verify that the edge case works.

        Show
        Hoss Man added a comment - The easiest fix would be to make realValue volatile (and readers will thus cross a read memory barrier). I don't fully understand what situation you are concerned about here, but making it volatile is easy .. running tests now. Does this happen with Solr? no, but the way the public API for LazyDocument is defined, it is possible (even though it wouldn't really make any sense given the way the API is really only usable from a StoredFieldVisitor) so i wanted to protect against it – i included test randomization to verify that the edge case works.
        Hide
        Hoss Man added a comment -

        I switched to using volatile per yonik's suggestion, but in the process of testing i encountered a test failure that has me convinced that either i'm going insane, or there is some weird state info that gets corrupted if you attempt to use an IndexReader to fetch the stored field values of a doc while a StoredFieldVisitor is already being used to fetch those fields – and if i can encounter this kind of problem in a single threaded test, it scares me about any multi-threaded usages in any solr/lucene instance.

        The attached patch updates the test with a bunch of nocommit System.out's as well as a new "sanity" test where i tried to recreate the underlying problem w/o using LazyDocument at all...

        This is the first failure i encountered...

        ant test  -Dtestcase=TestLazyDocument -Dtests.method=testLazy -Dtests.seed=553F22658CE2D9A9 -Dtests.slow=true -Dtests.locale=sk -Dtests.timezone=Africa/Ndjamena -Dtests.file.encoding=UTF-8
        
        ...
        
        [junit4:junit4]    > Throwable #1: java.lang.AssertionError: fieldName count: b expected:<112> but was:<1229>
        [junit4:junit4]    > 	at __randomizedtesting.SeedInfo.seed([553F22658CE2D9A9:64E62F373CBE0D02]:0)
        [junit4:junit4]    > 	at org.junit.Assert.fail(Assert.java:93)
        [junit4:junit4]    > 	at org.junit.Assert.failNotEquals(Assert.java:647)
        [junit4:junit4]    > 	at org.junit.Assert.assertEquals(Assert.java:128)
        [junit4:junit4]    > 	at org.junit.Assert.assertEquals(Assert.java:472)
        [junit4:junit4]    > 	at org.apache.lucene.document.TestLazyDocument.testLazy(TestLazyDocument.java:170)
        ...
        

        you can see from the sysout calls prior to the failure that the "needsField" method of the LazyTestingStoredFieldVisitor seems to be getting called the correct number of times, but with the wrong field name – once it randomly calls assertNotNull(lazyDoc.getDocument()); (which calls reader.doc(int)) every successive call to needsField winds up refering to the next field name ("b") instead of all the other field names in the document.

        if you comment out the assertNotNull(lazyDoc.getDocument()); call, the test passes.

        This lead me to write testHossssSanity which doesn't use any of hte lazy loading code at all, it just uses a simple StoredFieldsVisitor whose needsField method always returns NO but also arbitrarily calls reader.document(doc) for no reason once we start getting to the first lettered field name...

        ant test  -Dtestcase=TestLazyDocument -Dtests.method=testHossssSanity -Dtests.seed=553F22658CE2D9A9 -Dtests.slow=true -Dtests.locale=sk -Dtests.timezone=Africa/Ndjamena -Dtests.file.encoding=UTF-8
        
        ...
        
        [junit4:junit4]    > Throwable #1: java.lang.AssertionError: fieldName count: b expected:<139> but was:<1528>
        [junit4:junit4]    > 	at __randomizedtesting.SeedInfo.seed([553F22658CE2D9A9:FE3621EFA9383F64]:0)
        [junit4:junit4]    > 	at org.junit.Assert.fail(Assert.java:93)
        [junit4:junit4]    > 	at org.junit.Assert.failNotEquals(Assert.java:647)
        [junit4:junit4]    > 	at org.junit.Assert.assertEquals(Assert.java:128)
        [junit4:junit4]    > 	at org.junit.Assert.assertEquals(Assert.java:472)
        [junit4:junit4]    > 	at org.apache.lucene.document.TestLazyDocument.testHossssSanity(TestLazyDocument.java:120)
        ...
        

        ...you can again see from the sysout calls that needsField is called the expected number of times, but always with field named "b" (the first field name to exist after the first field name that calls reader.document.

        What finally convinced me that i wasn't insane was when i found a diffenret seed that produced a different low error at a lower level...

        ant test  -Dtestcase=TestLazyDocument -Dtests.method=testHossssSanity -Dtests.seed=4C348A26E496AC38 -Dtests.slow=true -Dtests.locale=fr_CH -Dtests.timezone=Antarctica/Vostok -Dtests.file.encoding=UTF-8
        
        ...
        
        [junit4:junit4]    > Throwable #1: java.lang.AssertionError: bits=5f
        [junit4:junit4]    > 	at __randomizedtesting.SeedInfo.seed([4C348A26E496AC38:E73D89ACC14C4AF5]:0)
        [junit4:junit4]    > 	at org.apache.lucene.codecs.lucene40.Lucene40StoredFieldsReader.visitDocument(Lucene40StoredFieldsReader.java:155)
        [junit4:junit4]    > 	at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:139)
        [junit4:junit4]    > 	at org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:116)
        [junit4:junit4]    > 	at org.apache.lucene.document.TestLazyDocument.testHossssSanity(TestLazyDocument.java:118)
        ...
        

        ...which suggests to me that maybe there really is a bug in the underlying StoredFieldsVisitor logic (or perhaps just in the Lucene40StoredFieldsReader) ?

        Show
        Hoss Man added a comment - I switched to using volatile per yonik's suggestion, but in the process of testing i encountered a test failure that has me convinced that either i'm going insane, or there is some weird state info that gets corrupted if you attempt to use an IndexReader to fetch the stored field values of a doc while a StoredFieldVisitor is already being used to fetch those fields – and if i can encounter this kind of problem in a single threaded test, it scares me about any multi-threaded usages in any solr/lucene instance. The attached patch updates the test with a bunch of nocommit System.out's as well as a new "sanity" test where i tried to recreate the underlying problem w/o using LazyDocument at all... This is the first failure i encountered... ant test -Dtestcase=TestLazyDocument -Dtests.method=testLazy -Dtests.seed=553F22658CE2D9A9 -Dtests.slow=true -Dtests.locale=sk -Dtests.timezone=Africa/Ndjamena -Dtests.file.encoding=UTF-8 ... [junit4:junit4] > Throwable #1: java.lang.AssertionError: fieldName count: b expected:<112> but was:<1229> [junit4:junit4] > at __randomizedtesting.SeedInfo.seed([553F22658CE2D9A9:64E62F373CBE0D02]:0) [junit4:junit4] > at org.junit.Assert.fail(Assert.java:93) [junit4:junit4] > at org.junit.Assert.failNotEquals(Assert.java:647) [junit4:junit4] > at org.junit.Assert.assertEquals(Assert.java:128) [junit4:junit4] > at org.junit.Assert.assertEquals(Assert.java:472) [junit4:junit4] > at org.apache.lucene.document.TestLazyDocument.testLazy(TestLazyDocument.java:170) ... you can see from the sysout calls prior to the failure that the "needsField" method of the LazyTestingStoredFieldVisitor seems to be getting called the correct number of times, but with the wrong field name – once it randomly calls assertNotNull(lazyDoc.getDocument()); (which calls reader.doc(int)) every successive call to needsField winds up refering to the next field name ("b") instead of all the other field names in the document. if you comment out the assertNotNull(lazyDoc.getDocument()); call, the test passes. This lead me to write testHossssSanity which doesn't use any of hte lazy loading code at all, it just uses a simple StoredFieldsVisitor whose needsField method always returns NO but also arbitrarily calls reader.document(doc) for no reason once we start getting to the first lettered field name... ant test -Dtestcase=TestLazyDocument -Dtests.method=testHossssSanity -Dtests.seed=553F22658CE2D9A9 -Dtests.slow=true -Dtests.locale=sk -Dtests.timezone=Africa/Ndjamena -Dtests.file.encoding=UTF-8 ... [junit4:junit4] > Throwable #1: java.lang.AssertionError: fieldName count: b expected:<139> but was:<1528> [junit4:junit4] > at __randomizedtesting.SeedInfo.seed([553F22658CE2D9A9:FE3621EFA9383F64]:0) [junit4:junit4] > at org.junit.Assert.fail(Assert.java:93) [junit4:junit4] > at org.junit.Assert.failNotEquals(Assert.java:647) [junit4:junit4] > at org.junit.Assert.assertEquals(Assert.java:128) [junit4:junit4] > at org.junit.Assert.assertEquals(Assert.java:472) [junit4:junit4] > at org.apache.lucene.document.TestLazyDocument.testHossssSanity(TestLazyDocument.java:120) ... ...you can again see from the sysout calls that needsField is called the expected number of times, but always with field named "b" (the first field name to exist after the first field name that calls reader.document. What finally convinced me that i wasn't insane was when i found a diffenret seed that produced a different low error at a lower level... ant test -Dtestcase=TestLazyDocument -Dtests.method=testHossssSanity -Dtests.seed=4C348A26E496AC38 -Dtests.slow=true -Dtests.locale=fr_CH -Dtests.timezone=Antarctica/Vostok -Dtests.file.encoding=UTF-8 ... [junit4:junit4] > Throwable #1: java.lang.AssertionError: bits=5f [junit4:junit4] > at __randomizedtesting.SeedInfo.seed([4C348A26E496AC38:E73D89ACC14C4AF5]:0) [junit4:junit4] > at org.apache.lucene.codecs.lucene40.Lucene40StoredFieldsReader.visitDocument(Lucene40StoredFieldsReader.java:155) [junit4:junit4] > at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:139) [junit4:junit4] > at org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:116) [junit4:junit4] > at org.apache.lucene.document.TestLazyDocument.testHossssSanity(TestLazyDocument.java:118) ... ...which suggests to me that maybe there really is a bug in the underlying StoredFieldsVisitor logic (or perhaps just in the Lucene40StoredFieldsReader) ?
        Hide
        Hoss Man added a comment -

        Ok ... dug into SegmentReader a bit and confirmed that under the covers SegmentCoreReaders is using ThreadLocal to track a stored StoredFieldsReader instance ... so there doesn't seem to be a a true thread safety problem here (whew), you just have to be careful as a StoredFieldVistor writer not to try and do any other stored field visiting inside your visitor callbacks or you'll corrupt internal state.

        so i need to change my testLazy to be less evil and not do that.

        Show
        Hoss Man added a comment - Ok ... dug into SegmentReader a bit and confirmed that under the covers SegmentCoreReaders is using ThreadLocal to track a stored StoredFieldsReader instance ... so there doesn't seem to be a a true thread safety problem here (whew), you just have to be careful as a StoredFieldVistor writer not to try and do any other stored field visiting inside your visitor callbacks or you'll corrupt internal state. so i need to change my testLazy to be less evil and not do that.
        Hide
        Hoss Man added a comment -

        Ok ... made the test less evil so it doesn't trip over the StoredFieldsReader threadlocal, but modified it so it still excercises the possibility that a client might use the LazyDoc to ask for additional LazyField instances after already actualizing other instances

        I'm still hammering on the tests, but i'd appreciate review.

        Show
        Hoss Man added a comment - Ok ... made the test less evil so it doesn't trip over the StoredFieldsReader threadlocal, but modified it so it still excercises the possibility that a client might use the LazyDoc to ask for additional LazyField instances after already actualizing other instances I'm still hammering on the tests, but i'd appreciate review.
        Hide
        Hoss Man added a comment -

        Sigh .. this time with nocommits removed for real.

        Show
        Hoss Man added a comment - Sigh .. this time with nocommits removed for real.
        Hide
        Michael McCandless added a comment -

        Ok ... dug into SegmentReader a bit and confirmed that under the covers SegmentCoreReaders is using ThreadLocal to track a stored StoredFieldsReader instance

        Ahh that's right ... so, the IR.document(int, visitor) is non-reentrant: your visitor cannot go and call IR.document again ... I think we should at least update the javadocs for StoredFieldVisitor to explain this? I'll do that...

        Show
        Michael McCandless added a comment - Ok ... dug into SegmentReader a bit and confirmed that under the covers SegmentCoreReaders is using ThreadLocal to track a stored StoredFieldsReader instance Ahh that's right ... so, the IR.document(int, visitor) is non-reentrant: your visitor cannot go and call IR.document again ... I think we should at least update the javadocs for StoredFieldVisitor to explain this? I'll do that...
        Hide
        Hoss Man added a comment -

        Fixed on trunk, working on backport now...

        Committed revision 1458887.

        Show
        Hoss Man added a comment - Fixed on trunk, working on backport now... Committed revision 1458887.
        Hide
        Commit Tag Bot added a comment -

        [trunk commit] Chris M. Hostetter
        http://svn.apache.org/viewvc?view=revision&revision=1458887

        SOLR-4589: Fixed CPU spikes and poor performance in lazy field loading of multivalued fields

        Show
        Commit Tag Bot added a comment - [trunk commit] Chris M. Hostetter http://svn.apache.org/viewvc?view=revision&revision=1458887 SOLR-4589 : Fixed CPU spikes and poor performance in lazy field loading of multivalued fields
        Hide
        Hoss Man added a comment -

        Committed revision 1458938.
        Committed revision 1458959.
        h

        Show
        Hoss Man added a comment - Committed revision 1458938. Committed revision 1458959. h
        Hide
        Commit Tag Bot added a comment -

        [branch_4x commit] Chris M. Hostetter
        http://svn.apache.org/viewvc?view=revision&revision=1458938

        SOLR-4589: Fixed CPU spikes and poor performance in lazy field loading of multivalued fields (merge r1458887)

        Show
        Commit Tag Bot added a comment - [branch_4x commit] Chris M. Hostetter http://svn.apache.org/viewvc?view=revision&revision=1458938 SOLR-4589 : Fixed CPU spikes and poor performance in lazy field loading of multivalued fields (merge r1458887)
        Hide
        Uwe Schindler added a comment -

        Closed after release.

        Show
        Uwe Schindler added a comment - Closed after release.

          People

          • Assignee:
            Hoss Man
            Reporter:
            Hoss Man
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development