SOLR-2113

Create TermsQParser that deals with toInternal() conversion of external terms

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.2, 4.0-ALPHA
    • Component/s: search
    • Labels:
      None

      Description

      For converting facet.field response constraints into filter queries, it would be helpful to have a QParser that generates a TermQuery using the toInternal() converted result of the raw "q" param.

      1. SOLR-2113.patch
        5 kB
        Yonik Seeley

        Activity

        Hoss Man added a comment -

        Currently, the best choices for dealing with this type of situation (generating filter queries from facet selections) are either "RawQParser" or "FieldQParser"

        • raw - works well for strings and text, but because it expects the input to be the completely raw term value, it doesn't work for FieldTypes that use encoding in their toInternal/toExternal methods (e.g. SortableIntField, TrieIntField, etc...)
        • field - handles the toInternal problem (by delegating to FieldType.getFieldQuery()) but in the case of TextField this results in analysis being used on the input first – if the analyzer configured isn't idempotent (love learning new words from yonik) this can also cause a problem (this situation may not be common, but it can be easy to get into w/o knowing it depending on how charfilters / stemmers are used)

        Hence the desire for a new QParser ("term" seems like an appropriate name) that can be used like the "field" or "raw" QParsers (take a field name as a localParam) and is essentially implemented as...

        return new TermQuery(new Term(fieldName, getSchema().getField(fieldName).getType().toInternal(q)))
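
        A minimal sketch of why the conversion matters, assuming a toy model rather than real Solr classes: TermLookupSketch, its zero-padded encoding, and the Set standing in for the index are all hypothetical and are not how SortableIntField or TrieIntField actually encode values. It only shows that a "raw" style lookup misses an encoded term while a "term" style lookup, which converts first, finds it.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.UnaryOperator;

// Toy model only: nothing here is a Solr or Lucene class.
class TermLookupSketch {

    // Hypothetical encoded-int field: external "10" is indexed as a zero-padded term.
    static String intToInternal(String external) {
        return String.format("%010d", Integer.parseInt(external));
    }

    // "raw" style: the input is treated as the indexed term verbatim.
    static boolean rawMatches(Set<String> indexedTerms, String input) {
        return indexedTerms.contains(input);
    }

    // "term" style: convert the external value to its internal form, then look it up.
    static boolean termMatches(Set<String> indexedTerms,
                               UnaryOperator<String> toInternal,
                               String input) {
        return indexedTerms.contains(toInternal.apply(input));
    }

    public static void main(String[] args) {
        Set<String> index = new HashSet<>();
        index.add(intToInternal("10")); // the index holds the internal form only

        System.out.println(rawMatches(index, "10"));                                   // false
        System.out.println(termMatches(index, TermLookupSketch::intToInternal, "10")); // true
    }
}
```

        For a plain string field the conversion would be the identity function, so both lookups would agree; the difference only shows up for encoded types.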
        
        Yonik Seeley added a comment -

        Here's a patch w/ tests and updates to javadoc that implements the proposal and registers it as "term".

        example:

        {!term f=id}10
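
        For the facet use case in the description, a client could assemble such a filter from a facet field name and one of its returned constraint values. The following is a rough sketch only: TermFilterSketch and its method names are made up for illustration, and it assumes the field name contains no whitespace or braces.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical client-side helper; not part of Solr or SolrJ.
class TermFilterSketch {

    // Builds the local-params form shown in the comment above, e.g. {!term f=id}10.
    static String termFilter(String field, String externalValue) {
        return "{!term f=" + field + "}" + externalValue;
    }

    // Wraps the filter as a form-encoded fq request parameter.
    static String asFqParam(String field, String externalValue) {
        return "fq=" + URLEncoder.encode(termFilter(field, externalValue), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(termFilter("id", "10")); // {!term f=id}10
        System.out.println(asFqParam("id", "10"));  // fq=%7B%21term+f%3Did%7D10
    }
}
```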

        David Smiley added a comment -

        I was looking at Hoss's description, and I don't understand why an Analyzer would not be idempotent. I thought they all are; if not it's a bug. No?

        It's a shame this very isolated feature never got back-ported to 3.1.

        Robert Muir added a comment -

        most analyzers are not idempotent.

        this wouldn't be a valuable property to have (useless for a search engine).
        it's also not practical nor worth the trouble.

        one thing to also keep in mind is that analyzers these days take Reader and ultimately return byte[], for example at the extreme a collation analyzer returns a binary sort key as a term... this isn't reversible back to a String at all in any way.

        David Smiley added a comment -

        Sorry, I'm still confused. Can you please give a simple example as to how an analyzer would give different results on a subsequent invocation for the same input?

        Robert Muir added a comment -

        the easiest example is a synonyms filter: analyze(analyze(x)) will be different than analyze(x)

        Yonik Seeley added a comment -

        porter stemming is not idempotent.

        stem(hellosing) -> hellos
        stem(stem(hellosing)) -> hello
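
        The same effect can be reproduced with a toy stemmer. This is a sketch under loud assumptions: ToyStemmer is hypothetical and is NOT the Porter algorithm; it just strips at most one suffix per call, which is enough to show that a second pass can change the result of the first.

```java
// Toy suffix stripper for illustration only; not the Porter stemmer.
class ToyStemmer {

    // Removes at most one suffix per call, so leftovers may be stripped on the next call.
    static String stem(String word) {
        if (word.endsWith("ing") && word.length() > 5) {
            return word.substring(0, word.length() - 3);
        }
        if (word.endsWith("s") && word.length() > 3) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    public static void main(String[] args) {
        String once = stem("hellosing");
        String twice = stem(once);
        System.out.println(once);  // hellos
        System.out.println(twice); // hello
    }
}
```

        If the output of analysis is fed back through the same analyzer, as happens when a facet constraint from a TextField is passed to the "field" parser, the second pass may produce a term that was never indexed.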

        David Smiley added a comment -

        It appears what I understood to be "idempotent" is different than the meaning here. I looked this word up in wikipedia and it appears in the context of computer science that it has two separate meanings. One meaning has more to do with side-effects, which is the meaning I've always attached to the word. It comes up a lot when talking about thread-safe code. The other meaning associated with functional programming is the meaning intended by Yonik & Rob here – a meaning I don't think I would ever put to use. It's unfortunate that this word is ambiguous... since it's very useful to use it to say that a method on a class always has the same result for the same input, without saying you can give the output back to the input again and also get the same result.
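
        The two properties being contrasted can be written as two separate checks. This is only an illustrative sketch; IdempotenceCheck and its method names are hypothetical, and checking one input does not prove either property in general.

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical helper contrasting the two meanings discussed above.
class IdempotenceCheck {

    // "Same result for the same input": calling f twice on x gives equal results.
    static boolean sameResultEachCall(UnaryOperator<String> f, String x) {
        return f.apply(x).equals(f.apply(x));
    }

    // Idempotent in the functional sense: f(f(x)) equals f(x).
    static boolean idempotentOn(UnaryOperator<String> f, String x) {
        String once = f.apply(x);
        return f.apply(once).equals(once);
    }

    public static void main(String[] args) {
        UnaryOperator<String> lower = s -> s.toLowerCase(Locale.ROOT);
        UnaryOperator<String> addS = s -> s + "s";

        System.out.println(sameResultEachCall(lower, "Solr") + " " + idempotentOn(lower, "Solr")); // true true
        System.out.println(sameResultEachCall(addS, "term") + " " + idempotentOn(addS, "term"));   // true false
    }
}
```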

        Hoss Man added a comment -

        Committed revision 1102922. - 3x backport

        Robert Muir added a comment -

        Bulk close for 3.2

        David Smiley added a comment -

        Does {!terms} render {!raw} obsolete? If not, what practical uses does it have? If it is obsolete then it should be deprecated and removed. {!field} still appears useful to basically do a phrase query.

        Yonik Seeley added a comment -

        {!raw} is great for debugging since it can produce any term query regardless of field type (i.e. no validation, transformation, etc).


          People

          • Assignee:
            Hoss Man
            Reporter:
            Hoss Man
          • Votes:
            0
            Watchers:
            1
