Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.3, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Mutable queries are an issue for automatic filter caching since modifying a query after it has been put into the cache will corrupt the cache. We should make all queries immutable (up to the boost) to avoid this issue.

      1. LUCENE-6531.patch
        147 kB
        Adrien Grand
      2. LUCENE-6531.patch
        165 kB
        Adrien Grand

        Activity

        Hide
        Adrien Grand added a comment -

        Here is a patch. PhraseQuery has a main constructor: PhraseQuery(int slop, String field, List<TermAndPosition> termsAndPositions) and some sugar constructors such as PhraseQuery(String field, String... terms) that will search for terms at consecutive positions in field with a slop of 0.

        Show
        Adrien Grand added a comment - Here is a patch. PhraseQuery has a main constructor: PhraseQuery(int slop, String field, List<TermAndPosition> termsAndPositions) and some sugar constructors such as PhraseQuery(String field, String... terms) that will search for terms at consecutive positions in field with a slop of 0.
        Hide
        Robert Muir added a comment -

        I think TermAndPosition is very awkward. Can we somehow avoid a new class here?

        The only advantage it has, is that you cant screw up by using a different field, but we shouldn't fashion APIs around error/abuse cases, and the disadvantages here (new class, and now position is mandatory but its confusing what it should be, should it start at 0 or 1, etc etc).

        This stuff adds up to make it way harder than today, where its a POJO with simple add() method and position is optional.

        PhraseQuery q = new PhraseQuery();
        q.add(new Term(...));
        q.add(new Term(...));
        
        Show
        Robert Muir added a comment - I think TermAndPosition is very awkward. Can we somehow avoid a new class here? The only advantage it has, is that you cant screw up by using a different field, but we shouldn't fashion APIs around error/abuse cases, and the disadvantages here (new class, and now position is mandatory but its confusing what it should be, should it start at 0 or 1, etc etc). This stuff adds up to make it way harder than today, where its a POJO with simple add() method and position is optional. PhraseQuery q = new PhraseQuery(); q.add( new Term(...)); q.add( new Term(...));
        Hide
        Adrien Grand added a comment -

        I think TermAndPosition is very awkward. Can we somehow avoid a new class here?

        If we don't want to have a wrapper around a term and a position, then the only options I see would be to either require users to provide two arrays/lists, one for terms and one for positions, or to use a builder in order to hide this wrapper: the builder could have a add(BytesRef term, int position) method like PhraseQuery does today. I don't really like APIs that expect users to provide parallel arrays, and even though I don't mind builders in general I think some other committers strongly dislike them so I thought the Term/position wrapper was a good option. It's also similar to how BooleanClause wraps a Query and an Occur?

        should it start at 0 or 1

        Both would work since positions are relative.

        Show
        Adrien Grand added a comment - I think TermAndPosition is very awkward. Can we somehow avoid a new class here? If we don't want to have a wrapper around a term and a position, then the only options I see would be to either require users to provide two arrays/lists, one for terms and one for positions, or to use a builder in order to hide this wrapper: the builder could have a add(BytesRef term, int position) method like PhraseQuery does today. I don't really like APIs that expect users to provide parallel arrays, and even though I don't mind builders in general I think some other committers strongly dislike them so I thought the Term/position wrapper was a good option. It's also similar to how BooleanClause wraps a Query and an Occur? should it start at 0 or 1 Both would work since positions are relative.
        Hide
        Robert Muir added a comment -

        Right, I guess I feel there are even more options in addition to what is here. For example positions could be relative rather than absolute: in other words you provide the gap.

        As far as the builder solution, I think its an interesting option here, because we could have a coherent way of solving this across other queries. As far as "strongly dislike", my major problems are (as witnessed by codebases using them):

        • absolute nonsense like "AbstractBuilder"s going with them
        • impossible to read code due to chaining/indentation
        • loss of standard getXXX() setXXX() notation

        In my opinion, none of these things are really "related" to builders. Builders are an OK tool and if you look in this codebase you will see that they are used. Its the "Builder pattern" which drags along all that above shit that is truly horrible.

        Show
        Robert Muir added a comment - Right, I guess I feel there are even more options in addition to what is here. For example positions could be relative rather than absolute: in other words you provide the gap. As far as the builder solution, I think its an interesting option here, because we could have a coherent way of solving this across other queries. As far as "strongly dislike", my major problems are (as witnessed by codebases using them): absolute nonsense like "AbstractBuilder"s going with them impossible to read code due to chaining/indentation loss of standard getXXX() setXXX() notation In my opinion, none of these things are really "related" to builders. Builders are an OK tool and if you look in this codebase you will see that they are used. Its the "Builder pattern" which drags along all that above shit that is truly horrible.
        Hide
        Adrien Grand added a comment -

        Thanks for the suggestions. So would you be fine with a new PhraseQuery.Builder class that would have the following skeleton?

        PhraseQuery.Builder {
          void add(BytesRef term, int positionIncrement); // positionIncrement is relative to the previously added term
          void setSlop(int slop);
          PhraseQuery build();
        }
        
        Show
        Adrien Grand added a comment - Thanks for the suggestions. So would you be fine with a new PhraseQuery.Builder class that would have the following skeleton? PhraseQuery.Builder { void add(BytesRef term, int positionIncrement); // positionIncrement is relative to the previously added term void setSlop( int slop); PhraseQuery build(); }
        Hide
        Robert Muir added a comment -

        As mentioned above I really think it should have add() without any position stuff to go along with what is there in PhraseQuery today. In general can't we just give it the exact API of phrasequery today? (defer this String field stuff, again i dont think its very important and we have the advantage of making it easier for people to move to the new api)

        Show
        Robert Muir added a comment - As mentioned above I really think it should have add() without any position stuff to go along with what is there in PhraseQuery today. In general can't we just give it the exact API of phrasequery today? (defer this String field stuff, again i dont think its very important and we have the advantage of making it easier for people to move to the new api)
        Hide
        Adrien Grand added a comment -

        OK, I'll work on a patch where the builder has the same API as PhraseQuery today then if nobody has objections.

        Show
        Adrien Grand added a comment - OK, I'll work on a patch where the builder has the same API as PhraseQuery today then if nobody has objections.
        Hide
        Michael McCandless added a comment -

        +1 for a sane builder API.

        Show
        Michael McCandless added a comment - +1 for a sane builder API.
        Hide
        Adrien Grand added a comment -

        Here is a new patch that adds PhraseQuery.Builder with the same API as today's PhraseQuery.

        I kept the "sugar" constructors on PhraseQuery that make it easy to create simple phrase queries, like new PhraseQuery("body_field", "quick", "fox").

        Show
        Adrien Grand added a comment - Here is a new patch that adds PhraseQuery.Builder with the same API as today's PhraseQuery. I kept the "sugar" constructors on PhraseQuery that make it easy to create simple phrase queries, like new PhraseQuery("body_field", "quick", "fox") .
        Hide
        Adrien Grand added a comment -

        I'll commit shortly if there are no objections.

        Show
        Adrien Grand added a comment - I'll commit shortly if there are no objections.
        Hide
        ASF subversion and git services added a comment -

        Commit 1685754 from Adrien Grand in branch 'dev/trunk'
        [ https://svn.apache.org/r1685754 ]

        LUCENE-6531: Make PhraseQuery immutable.

        Show
        ASF subversion and git services added a comment - Commit 1685754 from Adrien Grand in branch 'dev/trunk' [ https://svn.apache.org/r1685754 ] LUCENE-6531 : Make PhraseQuery immutable.
        Hide
        Cao Manh Dat added a comment -

        Adrien Grand : It look like we can modify PhraseQuery api a little bit. For a long time we set PhraseQuery.field base on the field of first term added to PhraseQuery, and it throw exception when terms added to PhraseQuery not have same field. We even allow null for field and in rewrite method we return and empty BooleanQuery with boost (what!!). Since we have a builder today, why not modify it little bit to make PhraseQuery clearer?

        • Change PhraseQuery.Builder() -> PhraseQuery.Builder(String field)
        • Change PhraseQuery.Builder.add(Term term) -> PhraseQuery.Builder.add(String termText)
        • We then deprecated all public constructor of PhraseQuery

        After that PhraseQuery we be much clearer! I will happy to submit a patch.

        Show
        Cao Manh Dat added a comment - Adrien Grand : It look like we can modify PhraseQuery api a little bit. For a long time we set PhraseQuery.field base on the field of first term added to PhraseQuery, and it throw exception when terms added to PhraseQuery not have same field. We even allow null for field and in rewrite method we return and empty BooleanQuery with boost (what!!). Since we have a builder today, why not modify it little bit to make PhraseQuery clearer? Change PhraseQuery.Builder() -> PhraseQuery.Builder(String field) Change PhraseQuery.Builder.add(Term term) -> PhraseQuery.Builder.add(String termText) We then deprecated all public constructor of PhraseQuery After that PhraseQuery we be much clearer! I will happy to submit a patch.
        Hide
        Adrien Grand added a comment -

        I agree with you it would make the API nicer, it has not been done in this issue in order to focus on making PhraseQuery immutable while changing the API as little as possible (the only change is that setters moved the the Builder class). We can open another issue if we want to improve the API, it's indeed a bit weird to expect consumers to provide terms that all have the same field.

        Show
        Adrien Grand added a comment - I agree with you it would make the API nicer, it has not been done in this issue in order to focus on making PhraseQuery immutable while changing the API as little as possible (the only change is that setters moved the the Builder class). We can open another issue if we want to improve the API, it's indeed a bit weird to expect consumers to provide terms that all have the same field.
        Hide
        Cao Manh Dat added a comment -

        I agree that it will be nice if Lucene is backward compatible. I will open an issue where we can discuss more about PhraseQuery modification.

        Show
        Cao Manh Dat added a comment - I agree that it will be nice if Lucene is backward compatible. I will open an issue where we can discuss more about PhraseQuery modification.
        Hide
        ASF subversion and git services added a comment -

        Commit 1693059 from Adrien Grand in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1693059 ]

        LUCENE-6531: Make PhraseQuery immutable.

        Show
        ASF subversion and git services added a comment - Commit 1693059 from Adrien Grand in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1693059 ] LUCENE-6531 : Make PhraseQuery immutable.
        Hide
        ASF subversion and git services added a comment -

        Commit 1693060 from Adrien Grand in branch 'dev/trunk'
        [ https://svn.apache.org/r1693060 ]

        LUCENE-6531: backported to 5.3.

        Show
        ASF subversion and git services added a comment - Commit 1693060 from Adrien Grand in branch 'dev/trunk' [ https://svn.apache.org/r1693060 ] LUCENE-6531 : backported to 5.3.
        Hide
        Terry Smith added a comment -

        Adrien Grand The PhraseQuery.Builder setter methods are all void, where as the ones for BooleanQuery and BlendedTermQuery return the Builder itself.

        Can the set/add methods on PhraseQuery.Builder return this to make the various Query builders consistent with each other?

        Show
        Terry Smith added a comment - Adrien Grand The PhraseQuery.Builder setter methods are all void, where as the ones for BooleanQuery and BlendedTermQuery return the Builder itself. Can the set/add methods on PhraseQuery.Builder return this to make the various Query builders consistent with each other?
        Hide
        Adrien Grand added a comment -

        Thanks Terry for catching this. This issue was discussed for BooleanQuery after PhraseQuery had already been checked in, and I didn't think of changing PhraseQuery.Builder methods to return this afterwards. I'll do it now.

        Show
        Adrien Grand added a comment - Thanks Terry for catching this. This issue was discussed for BooleanQuery after PhraseQuery had already been checked in, and I didn't think of changing PhraseQuery.Builder methods to return this afterwards. I'll do it now.
        Hide
        Terry Smith added a comment -

        Awesome, you rock!

        Show
        Terry Smith added a comment - Awesome, you rock!
        Hide
        Adrien Grand added a comment -

        I fixed it through LUCENE-6707.

        Show
        Adrien Grand added a comment - I fixed it through LUCENE-6707 .
        Hide
        Shalin Shekhar Mangar added a comment -

        Bulk close for 5.3.0 release

        Show
        Shalin Shekhar Mangar added a comment - Bulk close for 5.3.0 release

          People

          • Assignee:
            Adrien Grand
            Reporter:
            Adrien Grand
          • Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development