Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Schema and Analysis, search
    • Labels: None

      Description

      Solr should use NGramPhraseQuery when searching an n-gram field with the default slop.
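      For context, the saving NGramPhraseQuery offers can be sketched without Lucene on the classpath. The class and method names below are hypothetical, and the rule shown (keep every n-th gram plus the last one) is only meant to mirror the idea behind the optimization. It assumes an exact phrase (slop 0) whose n-grams sit at consecutive positions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Lucene-free sketch of which n-gram positions an exact phrase must check.
public class NGramPhrasePruneSketch {

    /** Splits text into overlapping n-grams, one gram per start position. */
    static List<String> ngrams(String text, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= text.length(); i++) {
            grams.add(text.substring(i, i + n));
        }
        return grams;
    }

    /**
     * Positions that still need to be checked for an exact phrase of
     * termCount consecutive n-grams. Every n-th gram plus the final gram
     * together cover all characters, so the grams in between are redundant.
     */
    static List<Integer> requiredPositions(int termCount, int n) {
        List<Integer> kept = new ArrayList<>();
        for (int pos = 0; pos < termCount; pos++) {
            if (pos % n == 0 || pos == termCount - 1) {
                kept.add(pos);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> grams = ngrams("ABCD", 2);
        System.out.println(grams);                               // the bigrams of "ABCD"
        System.out.println(requiredPositions(grams.size(), 2));  // positions that must be checked
    }
}
```

      For the bigrams of "ABCD" (AB, BC, CD), only positions 0 and 2 are kept, since AB followed two positions later by CD already implies BC.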

      1. SOLR-3055.patch
        5 kB
        Koji Sekiguchi

        Activity

        Koji Sekiguchi added a comment -

        How about introducing something like GramSizeAttribute?

        I attached a draft-level patch, just to sketch the idea.

        Robert Muir added a comment -

        Hi Koji: as far as attribute+QP goes, I think it might not be the best way to go.

        For example, another way to customize the phrase query is on SOLR-2660.
        In that patch I added factory methods to QueryParser so you can override this,
        along with hooks into Solr's fieldtype.

        But with the attribute approach, what happens if I omit positions AND use n-grams?
        This is a totally reasonable thing to do: since positions are redundantly encoded
        in the n-gram term text, it makes sense that I might not index any positions at all
        and approximate my phrase queries with boolean AND.
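
        A minimal, Lucene-free sketch of that approximation follows. The names are hypothetical and the "index" is just a set of bigram terms with no positions. It shows both that a conjunction of grams finds real matches and why it is only an approximation.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch: phrase matching approximated by requiring all bigrams (boolean AND).
public class GramConjunctionSketch {

    /** Overlapping bigrams of the text, with positions discarded. */
    static Set<String> bigrams(String text) {
        Set<String> grams = new LinkedHashSet<>();
        for (int i = 0; i + 2 <= text.length(); i++) {
            grams.add(text.substring(i, i + 2));
        }
        return grams;
    }

    /** True when every bigram of the query occurs somewhere in the document terms. */
    static boolean matchesAllGrams(Set<String> docTerms, String query) {
        return docTerms.containsAll(bigrams(query));
    }

    public static void main(String[] args) {
        // "BCD" really occurs in "ABCD": both BC and CD are indexed.
        System.out.println(matchesAllGrams(bigrams("ABCD"), "BCD"));
        // "ABD" does not occur: the gram BD is missing.
        System.out.println(matchesAllGrams(bigrams("ABCD"), "ABD"));
        // False positive: "BCAB" contains AB and BC, but never the string "ABC".
        System.out.println(matchesAllGrams(bigrams("BCAB"), "ABC"));
    }
}
```

        The last case is the cost of dropping positions: the grams overlap enough to rule out most non-matches, but not all of them.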

        I think subclassing is a better approach, because otherwise how do we
        determine which would run first in the case of multiple conflicting attributes?

        In this case the consumer (e.g. Solr) is forced to decide, and it's more consistent
        with the way other queries are generated: getXXXQuery() etc.
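
        Roughly, the subclassing idea could look like the sketch below. Everything here is hypothetical: SimpleParser stands in for a query parser, newPhraseQuery is an illustrative hook name rather than the method from SOLR-2660, and queries are represented as plain strings so the example runs without Lucene.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a factory-method hook that a consumer overrides per field type.
public class PhraseFactorySketch {

    /** Stand-in for a query parser; not Lucene's QueryParser. */
    static class SimpleParser {
        /** Factory hook: subclasses override this to choose the phrase implementation. */
        protected String newPhraseQuery(String field, List<String> terms, int slop) {
            return "PhraseQuery(" + field + ", " + terms + ", slop=" + slop + ")";
        }

        /** Fixed parsing entry point that delegates construction to the hook. */
        public final String parsePhrase(String field, List<String> terms, int slop) {
            return newPhraseQuery(field, terms, slop);
        }
    }

    /** The consumer decides: a parser for a field known to hold n-grams of one size. */
    static class NGramFieldParser extends SimpleParser {
        private final int gramSize;

        NGramFieldParser(int gramSize) {
            this.gramSize = gramSize;
        }

        @Override
        protected String newPhraseQuery(String field, List<String> terms, int slop) {
            if (slop == 0) {
                // Exact phrases on an n-gram field can use the specialized query.
                return "NGramPhraseQuery(n=" + gramSize + ", " + field + ", " + terms + ")";
            }
            return super.newPhraseQuery(field, terms, slop);
        }
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("AB", "BC", "CD");
        System.out.println(new SimpleParser().parsePhrase("body", terms, 0));
        System.out.println(new NGramFieldParser(2).parsePhrase("body", terms, 0));
        System.out.println(new NGramFieldParser(2).parsePhrase("body", terms, 1));
    }
}
```

        Because exactly one override runs, there is no ordering question of the kind that conflicting attributes would raise.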

        Robert Muir added a comment -

        But an advantage of this patch's approach is that it would work when not all text is n-grammed, right?
        E.g. the case of CJKAnalyzer, where English does not form n-grams. I think this is important.
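
        To make the mixed case concrete, here is a simplified, hypothetical tokenizer (not CJKAnalyzer itself) that bigrams runs of CJK characters and keeps other words whole. It assumes BMP characters only and uses Unicode escapes for three Han characters followed by the word "Solr".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one field whose tokens are a mix of bigrams and whole words.
public class MixedGramSketch {

    /** True for characters in the scripts this sketch treats as CJK. */
    static boolean isCjk(char c) {
        Character.UnicodeScript s = Character.UnicodeScript.of(c);
        return s == Character.UnicodeScript.HAN
            || s == Character.UnicodeScript.HIRAGANA
            || s == Character.UnicodeScript.KATAKANA
            || s == Character.UnicodeScript.HANGUL;
    }

    /** Emits overlapping bigrams for CJK runs and whole tokens for other letter/digit runs. */
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (isCjk(c)) {
                int end = i;
                while (end < text.length() && isCjk(text.charAt(end))) end++;
                if (end - i == 1) {
                    out.add(text.substring(i, end)); // a lone CJK character stays a unigram
                } else {
                    for (int j = i; j + 2 <= end; j++) out.add(text.substring(j, j + 2));
                }
                i = end;
            } else if (Character.isLetterOrDigit(c)) {
                int end = i;
                while (end < text.length()
                        && Character.isLetterOrDigit(text.charAt(end))
                        && !isCjk(text.charAt(end))) end++;
                out.add(text.substring(i, end)); // non-CJK words are not n-grammed
                i = end;
            } else {
                i++; // skip whitespace and punctuation
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Three Han characters, a space, then an English word.
        System.out.println(tokenize("\u6771\u4eac\u90fd Solr"));
    }
}
```

        The three Han characters yield two bigrams while "Solr" stays a single token, so a gram size fixed for the whole field would be wrong for part of the phrase, whereas a per-token signal could tell the two apart.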

        Maybe there is some way to have the best of both...


          People

          • Assignee: Unassigned
          • Reporter: Koji Sekiguchi
          • Votes: 1
          • Watchers: 3
