Lucene - Core
LUCENE-1791

Enhance QueryUtils and CheckHits to wrap everything they check in MultiReader/MultiSearcher

    Details

    • Type: Test
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Methods in CheckHits & QueryUtils are in a good position to take any Searcher they are given and not only test it, but also test MultiReader & MultiSearcher constructs built around it.

      1. LUCENE-1791.patch
        15 kB
        Mark Miller
      2. LUCENE-1791.patch
        14 kB
        Hoss Man
      3. LUCENE-1791.patch
        17 kB
        Mark Miller
      4. LUCENE-1791.patch
        18 kB
        Mark Miller
      5. LUCENE-1791.patch
        18 kB
        Mark Miller
      6. LUCENE-1791.patch
        9 kB
        Hoss Man
      7. LUCENE-1791.patch
        7 kB
        Hoss Man
      8. LUCENE-1791.patch
        7 kB
        Mark Miller
      9. LUCENE-1791.patch
        6 kB
        Hoss Man

        Issue Links

          Activity

          Hoss Man added a comment -

          Patch showing what i have in mind.

          The current patch causes 14 failures in TestComplexExplanations, all of which stem from calling checkFirstSkipTo() on an IndexSearcher created by wrapUnderlyingReader().

          The failure is QueryUtils.java:305: "unstable skipTo(0) score! expected:<NaN> but was:<NaN>", which seems like it could easily be a bug introduced by the patch, except...

          • Why does it only cause 14 of the 22 tests in TestComplexExplanations to fail?
          • Why wouldn't the original IndexSearcher have produced a NaN score?
          Hoss Man added a comment -

          FWIW: beyond just fixing the current failures, i'd like to enhance this patch (possibly using some sort of static cache of the wrapped readers it creates) so that it can work well in conjunction with LUCENE-1749.

          At the moment, the wrapping readers go out of scope and are likely to be GC'd well before anything interesting happens with the original IndexSearcher, so sanity-checking the field cache isn't likely to find anything suspicious.

          Hoss Man added a comment -

          I just retried this patch against trunk now that the FieldCacheSanityChecker and some other patches have been committed. In addition to the possible false negatives from TestComplexExplanations (NaN score), this now surfaces FieldCache sanity failures from TestCustomScoreQuery, TestFieldScoreQuery, and TestOrdValues (suggesting that there are code paths where those query types don't correctly use the subreaders to get the FieldCache), as well as checkFirstSkipTo() failures for TestSpansAdvanced2 and an ArrayIndexOutOfBoundsException from TestBoolean2.testRandomQueries (grr... it uses its own Random, so no seed was logged).

          I don't pretend this patch is perfect, but I can't imagine these are all false negatives.

          We should get to the bottom of this before 2.9. I'll start trying to figure it out on the train tonight.

          Hoss Man added a comment -

          (grr... it uses its own Random, so no seed was logged)

          Correction: it does log the seed; I was just looking at stderr when I should have been looking at stdout...

          failed query: +field:w2 field:w3 field:xx field:w4 field:w2
          NOTE: random seed of testcase 'testRandomQueries' was: 5695251427490718890
          
          Mark Miller added a comment - edited

          I'm guessing the NaN failures are not a problem - looks like they fail because NaN != NaN? Haven't looked closer.

          I don't think the FieldCache insanity is multi-reader related - it looks to me like some entries have a parser, and some have null for the parser, even though the default parser is being used in both cases. The FieldSource types grab a FieldCache and may pass null as the parser, which ends up putting null in the cache entry - but if you specifically ask for the default parser, that puts the default parser in the FieldCache entry - same stuff now, doubled entry.

          As for the out of bounds - haven't looked at that one yet - odd one ...

          ... interesting - it alternates between NullPointerExceptions and out-of-bounds exceptions ...
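          As a plain-Java aside (the demo class below is invented for illustration; none of this is Lucene code), the "unstable skipTo(0) score" failure mode discussed in this thread is just IEEE 754 NaN semantics: NaN compares unequal to everything, including itself.

```java
// Plain-Java illustration (hypothetical demo class, not Lucene code) of why an
// equality check on scores fails when a Scorer returns NaN: by IEEE 754 rules,
// NaN is not == to anything, including itself.
public class NanScoreDemo {
    public static void main(String[] args) {
        float score1 = Float.NaN;  // e.g. the result of two calls to score()
        float score2 = Float.NaN;

        System.out.println(score1 == score2);      // NaN != NaN, so this is false
        System.out.println(Float.isNaN(score1));   // the reliable way to detect NaN
        // Float.compare treats NaN as equal to itself, so a check that wanted to
        // tolerate NaN scores could use it instead of ==
        System.out.println(Float.compare(score1, score2) == 0);
    }
}
```

          This is why two calls returning the "same" NaN score still trip a check written as `score1 == score2`.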

          Mark Miller added a comment -

          I don't think the FieldCache insanity is multi-reader related - it looks to me like some entries have a parser, and some have null for the parser, even though the default parser is being used in both cases. The FieldSource types grab a FieldCache and may pass null as the parser, which ends up putting null in the cache entry - but if you specifically ask for the default parser, that puts the default parser in the FieldCache entry - same stuff now, doubled entry.

          Well that explains half the output anyway - even if that's fixed there is still a failure. It's because the test doesn't expand fully into the subreaders - it just needed the top level before - with this test, we need to recursively grab them.

          Hoss Man added a comment -

          I'm guessing the NaN failures are not a problem - looks like they fail because NaN != NaN?

          Right – but why would the scores be NaN when wrapped in a MultiReader? When it's not wrapped in a MultiReader the test passes, so the scores must not be NaN in that case.

          I don't think the fieldcache insanity is multi-reader related [...] same stuff now, doubled entry.

          The sanity checker ignores cases where two CacheEntries differ only by parser (precisely because of the null/default parser issue) and the resulting value object is the same, but it does include all related CacheEntry objects in an Insanity object so that you have them all for debugging.

          Looking at TestCustomScoreQuery.testCustomScoreByte (for example)...

          *** BEGIN org.apache.lucene.search.function.TestCustomScoreQuery.testCustomScoreByte: Insane FieldCache usage(s) ***
          SUBREADER: Found caches for decendents of org.apache.lucene.index.DirectoryReader@88d2ae+iii
          	'org.apache.lucene.index.DirectoryReader@88d2ae'=>'iii',byte,null=>[B#841343 (size =~ 33 bytes)
          	'org.apache.lucene.index.DirectoryReader@88d2ae'=>'iii',byte,org.apache.lucene.search.FieldCache.DEFAULT_BYTE_PARSER=>[B#841343 (size =~ 33 bytes)
          	'org.apache.lucene.index.CompoundFileReader$CSIndexInput@77daaa'=>'iii',byte,org.apache.lucene.search.FieldCache.DEFAULT_BYTE_PARSER=>[B#981898 (size =~ 33 bytes)
          	'org.apache.lucene.index.CompoundFileReader$CSIndexInput@77daaa'=>'iii',byte,null=>[B#981898 (size =~ 33 bytes)
          
          *** END org.apache.lucene.search.function.TestCustomScoreQuery.testCustomScoreByte: Insane FieldCache usage(s) ***
          
          

          The insanity type is "SUBREADER", so it has specifically identified a problem with that type of relationship. There are 4 CacheEntries listed in the error, all for the same field but from two different readers. If you note the value identity hashcodes (just before the size estimate), each reader has only one value cached for that field (with different parsers), which is why there isn't a separate error about multiple values.
          As the first line of the Insanity.toString() states: what it found is that DirectoryReader@88d2ae and at least one of its descendants both have cached entries for the same field.

          Mark Miller added a comment -

          Just fully rolling out to all of the subreaders makes the test pass, I believe.

          Hoss Man added a comment -

          Well that explains half the output anyway - even if that's fixed there is still a failure. It's because the test doesn't expand fully into the subreaders - it just needed the top level before - with this test, we need to recursively grab them.

          You lost me there... are you saying the tests need to be changed? ... why?

          For this patch to trigger an error in an existing test, that test must be using either CheckHits or QueryUtils to execute a query against a searcher and validate that the results are ok ... why would the test be responsible for any subreader expansion in this case?

          Hoss Man added a comment -

          Midair collision (x2) ... I think I see what you mean in your revised patch ... the tests don't need to change; it's just the test utility methods that were trying to recurse the readers that weren't doing the entire job.

          Mark Miller added a comment -

          Okay - so first, the original Parser issue:

          The output looked odd - it showed multiple entries with the same type, field, and reader, but with null and DefaultParser.

          When I quickly switched the code to use DefaultParser instead of null, those extra entries went away, and it just showed the SegmentReader and DirectoryReader that were doubled up. That's what had me thinking it was involved. I'm not sure I understand why that was happening now, though.

          Anyway, all of the issues appear to be because the test code was written expecting all of the readers to be top level.

          The tests, in certain cases, use a Reader to grab from a FieldCache - that reader has to be the right subreader and not the top-level reader -

          the tests were just using getSequentialSubReaders - they need to use gatherSubReaders instead - because you introduced the multi-level reader stuff. They should have used gatherSubReaders from the start, but because it wasn't needed at the time (for the tests to pass), it didn't even occur to me.
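          The getSequentialSubReaders-vs-gatherSubReaders distinction can be sketched with a plain-Java mock (the MockReader class below is invented for illustration; only the two method names are borrowed from Lucene's IndexReader API): one level of children is not enough once MultiReaders nest.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java mock (not Lucene code) showing why a one-level call in the style
// of getSequentialSubReaders() misses readers once MultiReaders nest, while a
// recursive gather in the style of gatherSubReaders() reaches the leaf readers.
public class GatherDemo {
    static class MockReader {
        final String name;
        final List<MockReader> subs = new ArrayList<>();
        MockReader(String name, MockReader... children) {
            this.name = name;
            for (MockReader c : children) subs.add(c);
        }
        // one level only, like getSequentialSubReaders()
        List<MockReader> sequentialSubReaders() { return subs; }
    }

    // recursive, like gatherSubReaders(): collect only the leaf readers
    static void gather(MockReader r, List<String> out) {
        if (r.subs.isEmpty()) { out.add(r.name); return; }
        for (MockReader sub : r.subs) gather(sub, out);
    }

    public static void main(String[] args) {
        // a MultiReader wrapping a MultiReader wrapping two segment readers
        MockReader seg1 = new MockReader("seg1");
        MockReader seg2 = new MockReader("seg2");
        MockReader inner = new MockReader("inner", seg1, seg2);
        MockReader top = new MockReader("top", inner);

        System.out.println(top.sequentialSubReaders().size()); // 1: just "inner"
        List<String> leaves = new ArrayList<>();
        gather(top, leaves);
        System.out.println(leaves);                            // both segments
    }
}
```

          With the deeper wrapped structures this patch introduces, only the recursive form finds every segment-level reader.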

          Hoss Man added a comment -

          FYI: with Mark's updated patch, we're back to just the NaN failures from TestComplexExplanations (and possibly TestBoolean2.testRandomQueries, but I can't confirm that).

          Mark Miller added a comment -

          I only get the NaN issue showing up now.

          I don't know why it's returning NaN as the score at the moment, but as far as the test is concerned, that particular failure appears to be false. It's just checking that two calls to score() return the same thing - but if it returns NaN, they are not considered equal.

          So I guess the question is - does it make sense that the score is NaN?

          Hoss Man added a comment -

          I figured out the problem with TestComplexExplanations ... the test uses a searcher with a custom Similarity, and the new code wasn't setting the same Similarity on the new Searcher & MultiSearcher being created.

          Hoss Man added a comment -

          One other thing I experimented with earlier today was making the "wrapped" MultiReaders and MultiSearchers a little more complicated (deeper structures, and adding some deleted docs to the other subreaders).

          At first I ran into problems because the deleted docs would throw off the docIds of real docs (and some tests expected certain docs to have specific IDs), but even when the deleted docs only appear "after" the real docs, there are still some test failures with this latest patch.

          TestSimpleExplanations and TestComplexExplanations are the ones I've been looking at, but there may be others (my laptop gets too hot if I try to run the full test suite too often).

          My head hurts trying to figure out why the deeper nested structures and deleted docs might be causing these new failures ... I'll try to look at it tomorrow.

          Simon Willnauer added a comment -

          I just looked at the patch - one little minor thing. Would it make sense to randomize the reader / searcher indexes in wrapUnderlyingReader / wrapSearchers to catch possible array-index-related issues? This could also be done for the number of deleted terms and the types of searchers / readers.

          simon
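          The "predictable randomness that could be logged" idea that comes up later in this thread can be sketched in a few lines of plain Java (illustrative only; the "tests.seed" property name and the demo class are invented, and this is not the LuceneTestCase mechanism): pick a seed, log it, and drive every random choice from it so a failing run can be replayed.

```java
import java.util.Random;

// Minimal sketch (plain Java, not Lucene test code) of logged, reproducible
// randomness: choose a seed, print it, and seed Random with it. Re-seeding
// with the logged value replays the exact same sequence of choices.
// The "tests.seed" system property name here is invented for illustration.
public class SeededWrapDemo {
    public static void main(String[] args) {
        // allow overriding the seed to replay a failure; default is arbitrary
        long seed = Long.getLong("tests.seed", 5695251427490718890L);
        System.err.println("NOTE: random seed was: " + seed);

        Random rnd = new Random(seed);
        // e.g. randomize where the "real" reader lands among the wrappers
        int position = rnd.nextInt(5);

        // a fresh Random with the same seed reproduces the same choice
        System.out.println(new Random(seed).nextInt(5) == position);
    }
}
```

          Running with `-Dtests.seed=<logged value>` would then reproduce the same wrapping layout.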

          Mark Miller added a comment -

          Have you started working on the ItemizedFilter at all? It can't use outside doc ids like that. Saying to match doc 3 will match every doc 3 in every reader.

          Mark Miller added a comment -

          I guess the out of bounds exception is still there:

          [junit] Testsuite: org.apache.lucene.search.TestBoolean2
          [junit] Tests run: 11, Failures: 0, Errors: 1, Time elapsed: 0.723 sec
          [junit] ------------- Standard Output ---------------
          [junit] failed query: field:yy field:zzz
          [junit] NOTE: random seed of testcase 'testRandomQueries' was: -2696660952902803926
          [junit] ------------- ---------------- ---------------
          [junit] Testcase: testRandomQueries(org.apache.lucene.search.TestBoolean2): Caused an ERROR
          [junit] -1
          [junit] java.lang.ArrayIndexOutOfBoundsException: -1

          Not sure what to do about the ItemizedFilter - it's not really a legal way to check things ...

          Mark Miller added a comment -

          Okay, I've got all tests passing.

          I rigged up a smarter ItemizedFilter that works for these tests.

          Mark Miller added a comment -

          Ever so slightly cleaned up:

          removed a couple of unused imports
          added a comment about what's going on with the ItemizedFilter

          Hoss Man added a comment -

          Okay, I've got all tests passing.

          Sweet! ... Even the random boolean query test? (I see changes to ReqExclScorer in your patch, but I don't really see how that could solve the problem)

          I rigged up a smarted ItemizedFilter that works for these tests.

          I'm clearly missing something ... why does ItemizedFilter need to change? I know it's an abomination and you would never want to use something like that in real code, but in this test the docIds passed to it on construction shouldn't change – even when we wrap the reader/searcher in other searchers, I specifically only put empty indexes (with no deletions) in front of the original index when building the multireader/searcher, so the docIds will be the same. Why doesn't the existing implementation work?

          Would it make sense to randomize the reader / searcher indexes

          Simon: randomized wrapping would be nice, but i didn't try to do that for two reasons:

          • the utility has no way to get the seed from LuceneTestCase, and I didn't want to introduce randomness unless it was predictable randomness that could be logged.
          • I needed to be sure we didn't add an index with deletions prior to the original index (but based on Mark's comments, it looks like that's not really an issue, since we have to deal with it anyway)
          Mark Miller added a comment -

          Sweet! ... Even the random boolean query test? (I see changes to ReqExclScorer in your patch, but I don't really see how that could solve the problem)

          Yes - the issue was again how it was dealing with the sub readers - my original fix was not fully correct - it just happened to work with the previous tests.

          By the way - any changes to non-test code should have been pulled - sorry - I'll put up another patch.

          I'm clearly missing something ... why does ItemizedFilter need to change? I know it's an abomination and you would never want to use something like that in real code, but in this test the docIds passed to it on construction shouldn't change - even when we wrap the reader/searcher in other searchers, I specifically only put empty indexes (with no deletions) in front of the original index when building the multireader/searcher, so the docIds will be the same. Why doesn't the existing implementation work?

          Because it uses external ids - and that only works if you have a single reader. The filter is applied per sub-reader - so if you use a filter that just sets doc 0, every sub-reader will match its first doc. You could do this in the past because everything just worked on the top-level reader - in that case, external ids match completely against the reader - but with the sub-reader approach, each sub-reader has an id space that starts at 0.
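          The per-segment id-space point can be shown with a few lines of plain Java (illustrative only; the demo class and the segment sizes are made up): an external doc id has to be rebased into a (segment, local id) pair before a per-sub-reader filter can use it.

```java
// Plain-Java illustration (hypothetical demo, not Lucene code) of rebasing an
// external (top-level) doc id into a per-segment local id: each sub-reader's
// id space starts at 0, so the segment's doc base must be subtracted.
public class DocBaseDemo {
    public static void main(String[] args) {
        int[] maxDocs = {10, 5, 8};   // maxDoc of three sub-readers, in order
        int globalDoc = 12;           // external id, as ItemizedFilter stored it

        int base = 0;                 // running doc base of the current segment
        for (int seg = 0; seg < maxDocs.length; seg++) {
            if (globalDoc < base + maxDocs[seg]) {
                // global doc 12 falls in segment 1 as local doc 2
                System.out.println("segment=" + seg + " localDoc=" + (globalDoc - base));
                break;
            }
            base += maxDocs[seg];
        }
    }
}
```

          A filter that sets external doc 12 directly would instead match local doc 12 in every segment large enough to contain it, which is the bug Mark describes.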

          Hoss Man added a comment -

          The filter is applied per sub-reader

          Doh! .. right, that's the part I was forgetting – filters are another thing that happens per reader now.

          Sweet. I'll review the patch for real tomorrow or Saturday, and maybe rip out the ItemizedFilter and replace it with something more sensible (it's a bad example that people might stumble upon, it seems even more confusing now, and I feel responsible since I'm the one that wrote it for those explanation tests anyway).

          Mark Miller added a comment -

          sweet. i'll review the patch for real tomorow or sat, and maybe rip out the ItemizedFilter and replace it with something more sensical (it's a bad example that people might stumble upon, that seems even more confusing now, and i feel responsible since i'm the one that wrote it for those explanation tests anyway)

          Cool - it is very confusing now, and not super clean really - it's kind of a hack - I just wasn't sure how else to handle it.

          Essentially, every time it sees a reader, it assumes it's a reader that's part of a sequence of sub-readers and adds maxDoc to a base to figure out what doc ids it should really set - and then if it sees the top-level reader, or the first sub-reader, it resets that base - so it can start over if the set of sub-readers needs to come through again. It works for these tests, but it makes no sense to do such a thing in the real world.
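The base-tracking hack described above can be sketched like this (a hypothetical plain-Java illustration, not the actual ItemizedFilter): global doc ids are translated into per-segment local ids by accumulating each segment's maxDoc into a running base, and the base resets when the first sub-reader comes through again so the filter survives a second pass:

```java
// Hypothetical sketch of the base-tracking hack: stateful translation of
// GLOBAL doc ids into per-segment LOCAL ids. Real-world filters should not
// rely on call order like this - it only works in a controlled test.
import java.util.Set;
import java.util.TreeSet;

public class BaseTrackingFilter {
    private final Set<Integer> globalIds;
    private int docBase = 0;

    BaseTrackingFilter(Set<Integer> globalIds) {
        this.globalIds = globalIds;
    }

    // Called once per sub-reader, in segment order. isFirstSegment resets
    // the base so the same instance can start over on another pass.
    Set<Integer> localMatches(int maxDoc, boolean isFirstSegment) {
        if (isFirstSegment) {
            docBase = 0;
        }
        Set<Integer> local = new TreeSet<>();
        for (int globalId : globalIds) {
            int localId = globalId - docBase;
            if (localId >= 0 && localId < maxDoc) {
                local.add(localId); // this global id lives in this segment
            }
        }
        docBase += maxDoc; // assume the next call is the next segment
        return local;
    }

    public static void main(String[] args) {
        // global docs {1, 4} over two segments of 3 docs each:
        BaseTrackingFilter f = new BaseTrackingFilter(Set.of(1, 4));
        System.out.println(f.localMatches(3, true));  // segment 1: [1]
        System.out.println(f.localMatches(3, false)); // segment 2: [1] (global 4)
    }
}
```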

          Hoss Man added a comment -

          i put the doc "ids" into a KEY field and refactored ItemizedFilter to be a trivial subclass of FieldCacheTermsFilter.

          I also added more wrap permutations to address some of the possible edge cases Simon pointed out (good catch Simon) but didn't introduce any randomization for the reasons mentioned before (even with the change to not rely on consistent docIds in ItemizedFilter, we can't allow deletions before the wrapped searcher/reader because CheckHits does its magic based on docIds).

          (hmm... i suppose the wrap functions could return some metadata about what offset the old ids have in the new search/reader and CheckHits could use that .... hmmm ... seems kludgy so i'm not going to worry about it)

          I think we're good to go here unless anyone has any objections
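The KEY-field idea above can be illustrated in plain Java (a hedged sketch, not the Lucene FieldCacheTermsFilter itself — the method and field names here are made up for illustration): instead of filtering by doc position, each document carries a stable key, and matching on the key survives wrapping, because the key moves with the document even when its local id changes:

```java
// Hypothetical illustration of key-based filtering: match documents by a
// stored per-doc key rather than by position in the reader.
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class KeyFieldFilterDemo {
    // One segment is modeled as an ordered list of per-doc keys.
    static List<Integer> matchByKey(List<String> segmentKeys, Set<String> wantedKeys) {
        List<Integer> localIds = new ArrayList<>();
        for (int localId = 0; localId < segmentKeys.size(); localId++) {
            if (wantedKeys.contains(segmentKeys.get(localId))) {
                localIds.add(localId); // position is derived, not assumed
            }
        }
        return localIds;
    }

    public static void main(String[] args) {
        // A wrapper prepended an empty segment; key-based matching still
        // finds "k2" wherever it landed, with no docBase bookkeeping.
        System.out.println(matchByKey(List.of(), Set.of("k2")));               // []
        System.out.println(matchByKey(List.of("k0", "k1", "k2"), Set.of("k2"))); // [2]
    }
}
```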

          Mark Miller added a comment -

          One more minor entry from me:

          - minor javadoc/spelling fixes
          - removed unused imports

          Mark Miller added a comment -

          You going to resolve this Hoss? Or was that commit just a tease?

          Hoss Man added a comment -

          Geezz... you expect me to commit and resolve ... i thought this open source stuff was all about volunteering and cooperation, how come i gotta do both ?!?!

          Mark Miller added a comment -

          I would have been all over it! But who knows if you have your reasons for not resolving yet.

          Thanks for all your work on this! Brilliant issue that caught a lot of great stuff.


            People

            • Assignee:
              Hoss Man
              Reporter:
              Hoss Man
            • Votes:
              0
              Watchers:
              0
