Lucene - Core
  1. Lucene - Core
  2. LUCENE-5274

Teach fast FastVectorHighlighter to highlight "child fields" with parent fields

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.6
    • Component/s: core/other
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      I've been messing around with the FastVectorHighlighter and it looks like I can teach it to highlight matches on "child fields". Like this query:
      foo:scissors foo_exact:running
      would highlight foo like this:
      <em>running</em> with <em>scissors</em>

      Where foo is stored WITH_POSITIONS_OFFSETS and foo_plain is an unstored copy of foo a different analyzer and its own WITH_POSITIONS_OFFSETS.

      This would make queries that perform weighted matches against different analyzers much more convenient to highlight.

      I have working code and test cases but they are hacked into Elasticsearch. I'd love to Lucene-ify if you'll take them.

        Activity

        Hide
        Nik Everett added a comment -

        Patch implementing merging highlights on child fields.

        Show
        Nik Everett added a comment - Patch implementing merging highlights on child fields.
        Hide
        Nik Everett added a comment -

        I've uploaded a patch for this. I made the highlighter module depend on the query string parser and analyzer modules for testing. I probably could have gotten away without the query string parser but it made the test cases simpler to write. The analyzer module was required to analyze different fields with different analyzers which is kind of the point of this feature. My ant-foo is too weak for me to be sure I didn't set up some kind of horrible circular dependency that hasn't hit me.

        Show
        Nik Everett added a comment - I've uploaded a patch for this. I made the highlighter module depend on the query string parser and analyzer modules for testing. I probably could have gotten away without the query string parser but it made the test cases simpler to write. The analyzer module was required to analyze different fields with different analyzers which is kind of the point of this feature. My ant-foo is too weak for me to be sure I didn't set up some kind of horrible circular dependency that hasn't hit me.
        Hide
        Adrien Grand added a comment -

        Thanks Nick for working on this issue, I think this is nice to be able to highlight queries that target fields which have the same content but are analyzed differently!

        I made the highlighter module depend on the query string parser and analyzer modules for testing. I probably could have gotten away without the query string parser but it made the test cases simpler to write.

        We tend to avoid doing that in order to not have cross or circular dependencies between modules. This is not an issue at this stage of the patch but we should try replacing the analysis components you are using with MockAnalyzer at some point.

        I only had a quick look at the patch so far and I'm a bit unsure about childFields. Maybe it would be better API-wise to specify the stored field and the index fields separately? Or maybe to retrieve the index fields from the terms of the query? What do you think?

        Show
        Adrien Grand added a comment - Thanks Nick for working on this issue, I think this is nice to be able to highlight queries that target fields which have the same content but are analyzed differently! I made the highlighter module depend on the query string parser and analyzer modules for testing. I probably could have gotten away without the query string parser but it made the test cases simpler to write. We tend to avoid doing that in order to not have cross or circular dependencies between modules. This is not an issue at this stage of the patch but we should try replacing the analysis components you are using with MockAnalyzer at some point. I only had a quick look at the patch so far and I'm a bit unsure about childFields. Maybe it would be better API-wise to specify the stored field and the index fields separately? Or maybe to retrieve the index fields from the terms of the query? What do you think?
        Hide
        Nik Everett added a comment -

        > We tend to avoid doing that in order to not have cross or circular dependencies between modules. This is not an issue at this stage of the patch but we should try replacing the analysis components you are using with MockAnalyzer at some point.

        The PerFieldAnalyzerWrapper was the thing that pulled me there. I'd appreciate some tips on how to work around that. I'll have a look at removing the query parser dependency. I'm also using the EnglishAnalzyer but I'm just using that to have a third analyzer in the mix. I'll see about using MockAnalzyer for that too.

        > I only had a quick look at the patch so far and I'm a bit unsure about childFields. Maybe it would be better API-wise to specify the stored field and the index fields separately? Or maybe to retrieve the index fields from the terms of the query? What do you think?

        I don't like retrieving the indexed fields from the query - what if you don't want them all? how can you make sure that the ones that you take from the query really do share the same stored copy.

        As far as calling out the stored field and the indexed field separately - I think I like the idea. It'd let you load the source from a field that isn't actively being highlighted. I'll have a look at that.

        Show
        Nik Everett added a comment - > We tend to avoid doing that in order to not have cross or circular dependencies between modules. This is not an issue at this stage of the patch but we should try replacing the analysis components you are using with MockAnalyzer at some point. The PerFieldAnalyzerWrapper was the thing that pulled me there. I'd appreciate some tips on how to work around that. I'll have a look at removing the query parser dependency. I'm also using the EnglishAnalzyer but I'm just using that to have a third analyzer in the mix. I'll see about using MockAnalzyer for that too. > I only had a quick look at the patch so far and I'm a bit unsure about childFields. Maybe it would be better API-wise to specify the stored field and the index fields separately? Or maybe to retrieve the index fields from the terms of the query? What do you think? I don't like retrieving the indexed fields from the query - what if you don't want them all? how can you make sure that the ones that you take from the query really do share the same stored copy. As far as calling out the stored field and the indexed field separately - I think I like the idea. It'd let you load the source from a field that isn't actively being highlighted. I'll have a look at that.
        Hide
        Nik Everett added a comment -

        Reworked to remove dependency on query parser and most of the analyzer dependency and to fix errors with phrases. It'll need to lose the rest of the analyzer dependency and have more test cases in addition to any other concerns raised in the review.

        Show
        Nik Everett added a comment - Reworked to remove dependency on query parser and most of the analyzer dependency and to fix errors with phrases. It'll need to lose the rest of the analyzer dependency and have more test cases in addition to any other concerns raised in the review.
        Hide
        Nik Everett added a comment -

        New version of the patch. This one works a lot better with phrases and even works on fields that have the same source but different tokenizers.

        It still makes highlighting depend on the analysis module to pick up PerFieldAnalyzerWrapper.

        I think all the new code this adds to FieldPhraseList deserves a unit test on its own but I'm not in the frame of mind to write one at the moment so I'll have to come back to it later.

        Show
        Nik Everett added a comment - New version of the patch. This one works a lot better with phrases and even works on fields that have the same source but different tokenizers. It still makes highlighting depend on the analysis module to pick up PerFieldAnalyzerWrapper. I think all the new code this adds to FieldPhraseList deserves a unit test on its own but I'm not in the frame of mind to write one at the moment so I'll have to come back to it later.
        Hide
        Robert Muir added a comment -

        Why would a highlighter improvement require mocktokenizer changes?

        Show
        Robert Muir added a comment - Why would a highlighter improvement require mocktokenizer changes?
        Hide
        Nik Everett added a comment -

        Hey, forgot to mention that. MockTokenizer seems to throw away the character after the end of each token even if that character is the valid start to the next token. This comes up because I wanted to tokenize strings in a simplistic way to test that the highlighter can handle different tokenizers and it just wasn't working right. So I "fixed" MockTokenizer but I did it in a pretty brutal way. I'm happy to move the change to another bug and improve it but testing the highlighter change without it is a bit painful.

        Show
        Nik Everett added a comment - Hey, forgot to mention that. MockTokenizer seems to throw away the character after the end of each token even if that character is the valid start to the next token. This comes up because I wanted to tokenize strings in a simplistic way to test that the highlighter can handle different tokenizers and it just wasn't working right. So I "fixed" MockTokenizer but I did it in a pretty brutal way. I'm happy to move the change to another bug and improve it but testing the highlighter change without it is a bit painful.
        Hide
        Robert Muir added a comment -

        if you suspect there is a bug in mocktokenizer, please open a separate issue for that. mocktokenizer is used by like, thousands of tests

        Show
        Robert Muir added a comment - if you suspect there is a bug in mocktokenizer, please open a separate issue for that. mocktokenizer is used by like, thousands of tests
        Hide
        Nik Everett added a comment -

        Filed LUCENE-5278.

        Show
        Nik Everett added a comment - Filed LUCENE-5278 .
        Hide
        Robert Muir added a comment -

        Thanks Nik: I can help with that one!

        Another question: about the MergedIterator

        I can see the possible use case here, but I think it deserves some discussion first (versus just making it public).
        This thing has limitations (its currently only used by indexwriter for buffereddeletes, its basically like a MultiTerms over an Iterator). For example each iterator it consumes should not have duplicate values according to its compareTo(): its not clear to me this WeightedPhraseInfo behaves this way:

        • what if you have a synonym of "dog" sitting on top of "cat" with the same boost factor... its a duplicate according to that compareTo, but the text is different.
        • what if the synonym is just "dog" with posinc=0 stacked ontop of itself (which is totally valid to do)...

        Perhaps highlighting can make use of it, but its unclear to me that its really following the contract. Furthermore the class in question (WeightedPhraseInfo) is public, and adding Comparable to it looks like it will create a situation where its inconsistent with equals()... I think this is a little dangerous.

        If it turns out we can reuse it: great! But i think rather than just slapping public on it, we should move it to .util, ensure it has good javadocs and unit tests, and investigate what exactly happens when these contracts are violated: e.g. can we make an exception happen rather than just broken behavior in a way that won't hurt performance and so on?

        Show
        Robert Muir added a comment - Thanks Nik: I can help with that one! Another question: about the MergedIterator I can see the possible use case here, but I think it deserves some discussion first (versus just making it public). This thing has limitations (its currently only used by indexwriter for buffereddeletes, its basically like a MultiTerms over an Iterator). For example each iterator it consumes should not have duplicate values according to its compareTo(): its not clear to me this WeightedPhraseInfo behaves this way: what if you have a synonym of "dog" sitting on top of "cat" with the same boost factor... its a duplicate according to that compareTo, but the text is different. what if the synonym is just "dog" with posinc=0 stacked ontop of itself (which is totally valid to do)... Perhaps highlighting can make use of it, but its unclear to me that its really following the contract. Furthermore the class in question (WeightedPhraseInfo) is public, and adding Comparable to it looks like it will create a situation where its inconsistent with equals()... I think this is a little dangerous. If it turns out we can reuse it: great! But i think rather than just slapping public on it, we should move it to .util, ensure it has good javadocs and unit tests, and investigate what exactly happens when these contracts are violated: e.g. can we make an exception happen rather than just broken behavior in a way that won't hurt performance and so on?
        Hide
        Nik Everett added a comment -

        I can see the possible use case here, but I think it deserves some discussion first (versus just making it public).

        Sure! I'm more used to Guava's tools so I think I was lulled in to a false sense of recognition. No chance of updating to a modern version of Guava?

        This thing has limitations (its currently only used by indexwriter for buffereddeletes, its basically like a MultiTerms over an Iterator). For example each iterator it consumes should not have duplicate values according to its compareTo(): its not clear to me this WeightedPhraseInfo behaves this way

        Yikes! I didn't catch that but now that you point it out it is right there in the docs and I should have. WeightedPhraseInfo doesn't behave that way and

        Furthermore the class in question (WeightedPhraseInfo) is public, and adding Comparable to it looks like it will create a situation where its inconsistent with equals()... I think this is a little dangerous.

        I agree on the inconsistent with inconsistent with equals. I can either fix that or use a Comparator for sorting both WeightedPhraseInfo and Toffs. That'd require a MergeSorter that can take one but

        If it turns out we can reuse it: great! But i think rather than just slapping public on it, we should move it to .util, ensure it has good javadocs and unit tests, and investigate what exactly happens when these contracts are violated: e.g. can we make an exception happen rather than just broken behavior in a way that won't hurt performance and so on?

        Makes sense to me.

        Show
        Nik Everett added a comment - I can see the possible use case here, but I think it deserves some discussion first (versus just making it public). Sure! I'm more used to Guava's tools so I think I was lulled in to a false sense of recognition. No chance of updating to a modern version of Guava? This thing has limitations (its currently only used by indexwriter for buffereddeletes, its basically like a MultiTerms over an Iterator). For example each iterator it consumes should not have duplicate values according to its compareTo(): its not clear to me this WeightedPhraseInfo behaves this way Yikes! I didn't catch that but now that you point it out it is right there in the docs and I should have. WeightedPhraseInfo doesn't behave that way and Furthermore the class in question (WeightedPhraseInfo) is public, and adding Comparable to it looks like it will create a situation where its inconsistent with equals()... I think this is a little dangerous. I agree on the inconsistent with inconsistent with equals. I can either fix that or use a Comparator for sorting both WeightedPhraseInfo and Toffs. That'd require a MergeSorter that can take one but If it turns out we can reuse it: great! But i think rather than just slapping public on it, we should move it to .util, ensure it has good javadocs and unit tests, and investigate what exactly happens when these contracts are violated: e.g. can we make an exception happen rather than just broken behavior in a way that won't hurt performance and so on? Makes sense to me.
        Hide
        Robert Muir added a comment -

        Sure! I'm more used to Guava's tools so I think I was lulled in to a false sense of recognition. No chance of updating to a modern version of Guava?

        There is no lucene dependency on guava. I don't think we should introduce one, and it wouldnt solve the issues i mentioned anyway (e.g. comparable inconsistent with equals and stuff). It would only add 2.1MB of bloated unnecessary syntactic sugar (sorry, thats just my opinion on it, i think its useless).

        We should keep our third party dependencies minimal and necessary so that any app using lucene can choose for itself what version of this stuff (if any) it wants to use. If we rely upon unnecessary stuff it hurts the end user by forcing them to compatible versions.

        Show
        Robert Muir added a comment - Sure! I'm more used to Guava's tools so I think I was lulled in to a false sense of recognition. No chance of updating to a modern version of Guava? There is no lucene dependency on guava. I don't think we should introduce one, and it wouldnt solve the issues i mentioned anyway (e.g. comparable inconsistent with equals and stuff). It would only add 2.1MB of bloated unnecessary syntactic sugar (sorry, thats just my opinion on it, i think its useless). We should keep our third party dependencies minimal and necessary so that any app using lucene can choose for itself what version of this stuff (if any) it wants to use. If we rely upon unnecessary stuff it hurts the end user by forcing them to compatible versions.
        Hide
        Nik Everett added a comment -

        quote
        There is no lucene dependency on guava. I don't think we should introduce one, and it wouldnt solve the issues i mentioned anyway (e.g. comparable inconsistent with equals and stuff). It would only add 2.1MB of bloated unnecessary syntactic sugar (sorry, thats just my opinion on it, i think its useless).

        We should keep our third party dependencies minimal and necessary so that any app using lucene can choose for itself what version of this stuff (if any) it wants to use. If we rely upon unnecessary stuff it hurts the end user by forcing them to compatible versions.
        quote
        I figured that was the reasoning and I don't intend to argue with it. In this case it would provide a method to merge sorted iterators just like MergedIterator only without the caveats around duplication but I'm happy to work around it. Guava certainly wouldn't fix my forgetting equals and hashcode.

        Show
        Nik Everett added a comment - quote There is no lucene dependency on guava. I don't think we should introduce one, and it wouldnt solve the issues i mentioned anyway (e.g. comparable inconsistent with equals and stuff). It would only add 2.1MB of bloated unnecessary syntactic sugar (sorry, thats just my opinion on it, i think its useless). We should keep our third party dependencies minimal and necessary so that any app using lucene can choose for itself what version of this stuff (if any) it wants to use. If we rely upon unnecessary stuff it hurts the end user by forcing them to compatible versions. quote I figured that was the reasoning and I don't intend to argue with it. In this case it would provide a method to merge sorted iterators just like MergedIterator only without the caveats around duplication but I'm happy to work around it. Guava certainly wouldn't fix my forgetting equals and hashcode.
        Hide
        Robert Muir added a comment -

        Yeah I guess for me, its not a caveat at all, but a feature

        We need to iterate sorted-union for stuff in the index like terms and fields, so they appear as if they exist only once.
        The guava one isn't doing a "union" operation but just simply maintaining compareTo() order...

        Show
        Robert Muir added a comment - Yeah I guess for me, its not a caveat at all, but a feature We need to iterate sorted-union for stuff in the index like terms and fields, so they appear as if they exist only once. The guava one isn't doing a "union" operation but just simply maintaining compareTo() order...
        Hide
        Nik Everett added a comment -

        I'm having a look at what I can do to pull MergedIterator into the util package and give it nice unit tests. Almost done with that and I should be able to spin another version of this patch. I'm not exactly sure of a good way to test the synonym stuff in FastVectorHighlighterTest - I don't see a mock Synonym filter.

        Show
        Nik Everett added a comment - I'm having a look at what I can do to pull MergedIterator into the util package and give it nice unit tests. Almost done with that and I should be able to spin another version of this patch. I'm not exactly sure of a good way to test the synonym stuff in FastVectorHighlighterTest - I don't see a mock Synonym filter.
        Hide
        Adrien Grand added a comment -

        I'm not exactly sure of a good way to test the synonym stuff in FastVectorHighlighterTest

        I usually do that by building the token streams by hand, it is quite easy, see CannedTokenStream and Token for example.

        Show
        Adrien Grand added a comment - I'm not exactly sure of a good way to test the synonym stuff in FastVectorHighlighterTest I usually do that by building the token streams by hand, it is quite easy, see CannedTokenStream and Token for example.
        Hide
        Nik Everett added a comment -

        Not done yet but progress:
        1. Move MergedIterator to util.
        2. Add a mode to it to not remove duplicates (one extra branch per call to next).
        3. Add a unit test for MergedIterator.
        4. Make FieldTermStack.TermInfo, FieldPhraseList.WeighterPhraseInfo, FieldPhraseList.WeightedPhraseInfo.Toffs consistent under equals, hashCode, and compareTo. I don't think any of them would make good hash keys but I fixed up hashCode because I fixed up equals.
        5. Unit tests for point 4.
        7. Use the non-duplicate removing mode of MergedIterator in FieldPhraseList's merge methods.
        6. More tests in FastVectorHighlighterTest - mostly around exact equal matches and how they effect segment sorting.

        At this point this is left:
        1. Unit tests for equal matches in the same FieldPhraseList.
        2. Poke around with corner cases during merges. Test them in FastVectorHighlighterTest if they reflect mockable real world cases. Expand FieldPhraseListTest if they don't.
        4. Remove highlighter dependency on analyzer module. Would it make sense to move PerFieldAnalyzerWrapper into core? Something else?
        3. Anything else from review.

        Show
        Nik Everett added a comment - Not done yet but progress: 1. Move MergedIterator to util. 2. Add a mode to it to not remove duplicates (one extra branch per call to next). 3. Add a unit test for MergedIterator. 4. Make FieldTermStack.TermInfo, FieldPhraseList.WeighterPhraseInfo, FieldPhraseList.WeightedPhraseInfo.Toffs consistent under equals, hashCode, and compareTo. I don't think any of them would make good hash keys but I fixed up hashCode because I fixed up equals. 5. Unit tests for point 4. 7. Use the non-duplicate removing mode of MergedIterator in FieldPhraseList's merge methods. 6. More tests in FastVectorHighlighterTest - mostly around exact equal matches and how they effect segment sorting. At this point this is left: 1. Unit tests for equal matches in the same FieldPhraseList. 2. Poke around with corner cases during merges. Test them in FastVectorHighlighterTest if they reflect mockable real world cases. Expand FieldPhraseListTest if they don't. 4. Remove highlighter dependency on analyzer module. Would it make sense to move PerFieldAnalyzerWrapper into core? Something else? 3. Anything else from review.
        Hide
        Nik Everett added a comment -

        Removed analyzer dependency and added tests covering synonyms.

        Show
        Nik Everett added a comment - Removed analyzer dependency and added tests covering synonyms.
        Hide
        Adrien Grand added a comment -

        I had a problem when applying the patch (it removed MergedIterator.java) but overall the patch looks good to me. There are some things that might be worth checking before committing:

        • even though these are tests, the definition for the 'field_der_red' analyzer in matchedFieldsTestCase looks weird since it declares being an analyzer wrapper but doesn't reuse the underlying analyzer, maybe it should use an anonymous Field impl that would override the tokenStream() method to return a CannedTokenStream instead of defining an analyzer?
        • the documentation and some method names still refer to child fields, I think that 'matched' fields, that you used in the FVH javadoc, is a better description?
        • I think the matchedFields argument of getBestFragments should be a Set<String> instead of an array to ensure uniqueness
        • maybe the assert fieldNames.length > 0; in FVH.getFieldFragList should be replaced by a hard check since it is checking user-provided data?
        • You added WeightedPhraseInfo.merge which mutates in place although the class seems to have been designed to be immutable (it computes the value of 'text' in the constructor based on termInfos, and you update termInfos in merge, so this seems to invalidate the value of 'text'?)
        Show
        Adrien Grand added a comment - I had a problem when applying the patch (it removed MergedIterator.java) but overall the patch looks good to me. There are some things that might be worth checking before committing: even though these are tests, the definition for the 'field_der_red' analyzer in matchedFieldsTestCase looks weird since it declares being an analyzer wrapper but doesn't reuse the underlying analyzer, maybe it should use an anonymous Field impl that would override the tokenStream() method to return a CannedTokenStream instead of defining an analyzer? the documentation and some method names still refer to child fields, I think that 'matched' fields, that you used in the FVH javadoc, is a better description? I think the matchedFields argument of getBestFragments should be a Set<String> instead of an array to ensure uniqueness maybe the assert fieldNames.length > 0; in FVH.getFieldFragList should be replaced by a hard check since it is checking user-provided data? You added WeightedPhraseInfo.merge which mutates in place although the class seems to have been designed to be immutable (it computes the value of 'text' in the constructor based on termInfos, and you update termInfos in merge, so this seems to invalidate the value of 'text'?)
        Hide
        Nik Everett added a comment -

        (it removed MergedIterator.java)

        It was supposed to move it to the util package. I'll figure out what happened there.

        I agree with the other points but it is worth discussing the last one. The others I'll just make the changes you mention.

        I intentionally didn't update text in WeightedPhraseInfo.merge because it is documented as being for debugging so it didn't seem worth the cost. Would it make sense to remove the member entirely and generate it from stored terms when needed?

        It also doesn't update seqnum mostly because I really don't know the right way to update it.

        As for WeightedPhraseInfo's immutability - I didn't see any final members so setting up the state in the constructor and not having setters just looked more like it wanted to encapsulate logic rather than immutability. I'll swap the merge method with a merging constructor.

        Show
        Nik Everett added a comment - (it removed MergedIterator.java) It was supposed to move it to the util package. I'll figure out what happened there. I agree with the other points but it is worth discussing the last one. The others I'll just make the changes you mention. I intentionally didn't update text in WeightedPhraseInfo.merge because it is documented as being for debugging so it didn't seem worth the cost. Would it make sense to remove the member entirely and generate it from stored terms when needed? It also doesn't update seqnum mostly because I really don't know the right way to update it. As for WeightedPhraseInfo's immutability - I didn't see any final members so setting up the state in the constructor and not having setters just looked more like it wanted to encapsulate logic rather than immutability. I'll swap the merge method with a merging constructor.
        Hide
        Adrien Grand added a comment -

        As for WeightedPhraseInfo's immutability - I didn't see any final members so setting up the state in the constructor and not having setters just looked more like it wanted to encapsulate logic rather than immutability.

        I don't know if it was initially designed to be immutable, it's just the fact that 'text' was computed in the constructor and that this class had no method that mutates data that made me think so. But since immutable objects are easier to work with, maybe we should make this class immutable.

        I'll swap the merge method with a merging constructor.

        OK

        Show
        Adrien Grand added a comment - As for WeightedPhraseInfo's immutability - I didn't see any final members so setting up the state in the constructor and not having setters just looked more like it wanted to encapsulate logic rather than immutability. I don't know if it was initially designed to be immutable, it's just the fact that 'text' was computed in the constructor and that this class had no method that mutates data that made me think so. But since immutable objects are easier to work with, maybe we should make this class immutable. I'll swap the merge method with a merging constructor. OK
        Hide
        Nik Everett added a comment -

        Fix all issues exception the text on WeightedPhraseInfo. If we're ok with building it on the fly then I'll get to that in the morning.

        I can't get the patch to apply cleanly - something to do with moving a file and then changing its contents. The closest I can come is:
        svn mv lucene/core/src/java/org/apache/lucene/index/MergedIterator.java lucene/core/src/java/org/apache/lucene/util/
        patch -f -p0 < ~/LUCENE-5274.patch
        svn add lucene/core/src/test/org/apache/lucene/util/TestMergedIterator.java

        I'm sure there is a better way to do this. If you get the chance please let me know.

        Show
        Nik Everett added a comment - Fix all issues exception the text on WeightedPhraseInfo. If we're ok with building it on the fly then I'll get to that in the morning. I can't get the patch to apply cleanly - something to do with moving a file and then changing its contents. The closest I can come is: svn mv lucene/core/src/java/org/apache/lucene/index/MergedIterator.java lucene/core/src/java/org/apache/lucene/util/ patch -f -p0 < ~/ LUCENE-5274 .patch svn add lucene/core/src/test/org/apache/lucene/util/TestMergedIterator.java I'm sure there is a better way to do this. If you get the chance please let me know.
        Hide
        Nik Everett added a comment -

        Attached a version of the patch that applies cleanly but doesn't clearly show the changes to MergedIterator. I built it by svn rm and svn add rather than svn mv + edit.

        Show
        Nik Everett added a comment - Attached a version of the patch that applies cleanly but doesn't clearly show the changes to MergedIterator. I built it by svn rm and svn add rather than svn mv + edit.
        Hide
        Nik Everett added a comment -

        Finally switch text to generated on the fly. No other changes. Patch should apply cleanly but like the last one doesn't clearly show what I changed in MergedIterator.

        Show
        Nik Everett added a comment - Finally switch text to generated on the fly. No other changes. Patch should apply cleanly but like the last one doesn't clearly show what I changed in MergedIterator.
        Hide
        Adrien Grand added a comment -

        Thanks Nik, the patch looks good to me, and all tests passed on my computer. Could we just make TestMergedIterator a bit faster? Maybe we could decrease a bit 'numLists' or 'end - start'?

        I will commit the patch next week when I'm back from traveling unless someone commits it before me.

        Show
        Adrien Grand added a comment - Thanks Nik, the patch looks good to me, and all tests passed on my computer. Could we just make TestMergedIterator a bit faster? Maybe we could decrease a bit 'numLists' or 'end - start'? I will commit the patch next week when I'm back from traveling unless someone commits it before me.
        Hide
        Robert Muir added a comment -

        Can the style of if '( removeDuplicates )' please not be added to the lucene core?

        Thanks

        Show
        Robert Muir added a comment - Can the style of if '( removeDuplicates )' please not be added to the lucene core? Thanks
        Hide
        Nik Everett added a comment -

        Change codestyle on MergedIterator changes and TestMergedIterator to match the style in the rest of core. The FVH changes still use the wide style prevalent in the FVH code.
        Also, sort fewer numbers in TestMergedIterator to make it faster. The only reason I was sorting so many the first time around was to get a good sense of what I was doing to the speed by adding the additional conditional.

        Show
        Nik Everett added a comment - Change codestyle on MergedIterator changes and TestMergedIterator to match the style in the rest of core. The FVH changes still use the wide style prevalent in the FVH code. Also, sort fewer numbers in TestMergedIterator to make it faster. The only reason I was sorting so many the first time around was to get a good sense of what I was doing to the speed by adding the additional conditional.
        Hide
        ASF subversion and git services added a comment -

        Commit 1534281 from Adrien Grand in branch 'dev/trunk'
        [ https://svn.apache.org/r1534281 ]

        LUCENE-5274: FastVectorHighlighter now supports highlighting against several indexed fields

        Show
        ASF subversion and git services added a comment - Commit 1534281 from Adrien Grand in branch 'dev/trunk' [ https://svn.apache.org/r1534281 ] LUCENE-5274 : FastVectorHighlighter now supports highlighting against several indexed fields
        Hide
        ASF subversion and git services added a comment -

        Commit 1534288 from Adrien Grand in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1534288 ]

        LUCENE-5274: FastVectorHighlighter now supports highlighting against several indexed fields

        Show
        ASF subversion and git services added a comment - Commit 1534288 from Adrien Grand in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1534288 ] LUCENE-5274 : FastVectorHighlighter now supports highlighting against several indexed fields
        Hide
        Adrien Grand added a comment -

        Committed, thanks Nik!

        Show
        Adrien Grand added a comment - Committed, thanks Nik!

          People

          • Assignee:
            Adrien Grand
            Reporter:
            Nik Everett
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development