Solr / SOLR-777

Backward match search, for domain search etc.

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.3
    • Fix Version/s: 1.4
    • Component/s: search
    • Labels:
      None

      Description

      There is a need to search domains with backward matching. For example, with "apache.org" as the query string, www.apache.org and lucene.apache.org would be returned.

        Activity

        Walter Underwood added a comment -

        You don't need backwards matching for this, and it doesn't really do the right thing.

        Split the string on ".", reverse the list, and join successive sublists with ".". Don't index the length-one list, since that is just ".com", ".net", etc. Do the same processing at query time.

        This is a special analyzer.
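Walter's suggestion can be sketched in plain Java as below. This is a minimal illustration, not the analyzer he had in mind: the class and method names are hypothetical, it works on a single lowercase host name rather than a Lucene token stream, and it assumes the name has no empty components.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class DomainPrefixTokens {
    // Hypothetical helper: split a host name on ".", reverse the components,
    // and emit the joined prefixes of length 2..n. The length-one prefix
    // (just the top-level domain) is skipped, as Walter suggests.
    static List<String> tokens(String host) {
        List<String> parts = new ArrayList<>(Arrays.asList(host.split("\\.")));
        Collections.reverse(parts);
        List<String> out = new ArrayList<>();
        for (int n = 2; n <= parts.size(); n++) {
            out.add(String.join(".", parts.subList(0, n)));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("www.apache.org"));    // [org.apache, org.apache.www]
        System.out.println(tokens("lucene.apache.org")); // [org.apache, org.apache.lucene]
        System.out.println(tokens("apache.org"));        // [org.apache]
    }
}
```

Applied to the query string apache.org, the same function yields only org.apache, which is a token of both www.apache.org and lucene.apache.org. One caveat of this sketch: a longer query such as lucene.apache.org also emits org.apache, so with ORed query terms it would match www.apache.org too; keeping only the longest query-side token is one way around that.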

        Koji Sekiguchi added a comment -

        Here are ReverseStringFilter and its factory class for reversing the token string. To use them, define the following in schema.xml:

        <fieldType name="reverseString" class="solr.TextField">
          <analyzer>
            <!-- keep the whole field value as a single token -->
            <tokenizer class="solr.KeywordTokenizerFactory"/>
            <!-- reverse the characters of that token -->
            <filter class="solr.ReverseStringFilterFactory"/>
          </analyzer>
        </fieldType>
        <!-- ... -->
        <field name="domain" type="reverseString" indexed="true" stored="true" />

        Then use analysis.jsp to see what happens.

        TODO:

        • Consider the possibility of generating a PrefixQuery. For instance, q=domain:apache.org => PrefixQuery(Term("domain", "gro.ehcapa"))
        • JUnit test code for this TokenFilter
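The PrefixQuery idea from the TODO can be emulated without Lucene to see what it would match. This sketch assumes each domain value is indexed as one reversed token (as KeywordTokenizerFactory followed by ReverseStringFilterFactory would produce) and replaces the term-dictionary lookup with a linear startsWith scan; ReversedPrefixMatch and backwardMatch are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReversedPrefixMatch {
    // Reverse a token's characters; StringBuilder.reverse() keeps
    // surrogate pairs together.
    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    // Emulates q=domain:apache.org => PrefixQuery(Term("domain", "gro.ehcapa")):
    // stored values are compared in reversed form against the reversed query.
    static List<String> backwardMatch(List<String> values, String query) {
        String prefix = reverse(query);
        List<String> hits = new ArrayList<>();
        for (String v : values) {
            if (reverse(v).startsWith(prefix)) {
                hits.add(v);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> domains = Arrays.asList("www.apache.org", "lucene.apache.org",
                "www.notreallyapache.org", "www.example.com");
        System.out.println(reverse("apache.org")); // gro.ehcapa
        System.out.println(backwardMatch(domains, "apache.org"));
        // [www.apache.org, lucene.apache.org, www.notreallyapache.org]
    }
}
```

The third hit shows the weakness of a plain reversed character prefix: gro.ehcapa is also a prefix of gro.ehcapayllaerton.www.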
        Koji Sekiguchi added a comment -

        Walter,

        The domain example is just an example for explanation. I think there is a need for backward matching in general. I'm thinking of something like a ReverseStrField, which reverses the token string at both index and query time. And, if possible, q=domain:apache.org would produce PrefixQuery(Term("domain", "gro.ehcapa")). What do you think?

        Otis Gospodnetic added a comment -

        Shouldn't ReverseStringFilter really go to Lucene?

        Mike Klaas added a comment -

        As Walter mentioned, you don't really want a reversed string for matching domains. The best way to store domains is with the components reversed (www.google.com -> com.google.www); not ordering them that way is one of the admitted failures of the designers of DNS.

        Storing that in a string field, you can search for revdomain:(com.google com.google.*) to correctly match a domain and its subdomains (note: your prefix query isn't correct, as it would match www.notreallyapache.org).
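The reverse-component scheme can be sketched the same way. RevDomainMatch and its methods are hypothetical; matches() stands in for the two clauses of revdomain:(com.google com.google.*), namely an exact term match plus a prefix match that includes the trailing dot. It assumes well-formed, lowercase host names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class RevDomainMatch {
    // www.google.com -> com.google.www
    static String reverseComponents(String host) {
        List<String> parts = new ArrayList<>(Arrays.asList(host.split("\\.")));
        Collections.reverse(parts);
        return String.join(".", parts);
    }

    // Mirrors revdomain:(com.google com.google.*): the exact reversed domain,
    // or any value that starts with it followed by ".".
    static boolean matches(String indexedRev, String queryRev) {
        return indexedRev.equals(queryRev) || indexedRev.startsWith(queryRev + ".");
    }

    static List<String> search(List<String> hosts, String domain) {
        String q = reverseComponents(domain);
        List<String> hits = new ArrayList<>();
        for (String h : hosts) {
            if (matches(reverseComponents(h), q)) {
                hits.add(h);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList("apache.org", "www.apache.org",
                "lucene.apache.org", "www.notreallyapache.org");
        System.out.println(reverseComponents("www.google.com")); // com.google.www
        System.out.println(search(hosts, "apache.org"));
        // [apache.org, www.apache.org, lucene.apache.org]
    }
}
```

Because the prefix clause ends with ".", www.notreallyapache.org (org.notreallyapache.www) no longer matches, while the domain itself is still caught by the exact clause.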

        Koji Sekiguchi added a comment -

        "Shouldn't ReverseStringFilter really go to Lucene?"

        I'd like to do so. Is it OK to put it in the core analysis package, or should it be part of a contrib package?

        Koji Sekiguchi added a comment -

        The domain example is just an example. I think there is a general need for searches like "*day", "*teen", and so on. But I'll consider the "reverse component" approach when searching domains, etc.

        "(Note: your prefix query isn't correct, as it would match www.notreallyapache.org)"

        Right.

        I'll post ReverseStringFilter to Lucene. After it is committed, I'll commit the factory in Solr.
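The general suffix case ("*day", "*teen") is where a reversed field pays off: a leading-wildcard query becomes a prefix scan over reversed terms. The sketch below is a simplified stand-in for a Lucene index, under stated assumptions: a TreeSet of reversed terms plays the role of the sorted term dictionary, and SuffixSearch, buildReversedIndex and suffixSearch are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class SuffixSearch {
    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    // Hypothetical "index": the sorted set of reversed terms, like the term
    // dictionary of a field analyzed with a reverse-string filter.
    static TreeSet<String> buildReversedIndex(List<String> terms) {
        TreeSet<String> index = new TreeSet<>();
        for (String t : terms) {
            index.add(reverse(t));
        }
        return index;
    }

    // Turns a leading-wildcard pattern such as "*day" into the prefix "yad"
    // and walks the sorted reversed terms starting at that prefix.
    static List<String> suffixSearch(TreeSet<String> index, String pattern) {
        if (!pattern.startsWith("*")) {
            throw new IllegalArgumentException("expected a leading '*': " + pattern);
        }
        String prefix = reverse(pattern.substring(1));
        List<String> hits = new ArrayList<>();
        for (String rev : index.tailSet(prefix, true)) {
            if (!rev.startsWith(prefix)) {
                break; // sorted order: every later term is past the prefix range
            }
            hits.add(reverse(rev)); // un-reverse for display
        }
        return hits;
    }

    public static void main(String[] args) {
        TreeSet<String> index = buildReversedIndex(Arrays.asList(
                "monday", "today", "daylight", "fifteen", "teenager", "sixteen"));
        System.out.println(suffixSearch(index, "*day"));  // [monday, today]
        System.out.println(suffixSearch(index, "*teen")); // [fifteen, sixteen]
    }
}
```

Seeking to the prefix and stopping at the first term without it roughly mirrors how a prefix query can jump into sorted terms instead of visiting every term, which is what makes the reversed field cheaper than a true leading wildcard.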

        Otis Gospodnetic added a comment -

        Koji, I'd stick it in contrib.

        Koji Sekiguchi added a comment -

        "Koji, I'd stick it in contrib."

        Oops. I didn't notice your reply and opened LUCENE-1398, which adds it to core.

        Koji Sekiguchi added a comment -

        Committed revision 764291.

        Grant Ingersoll added a comment -

        Bulk close for Solr 1.4


          People

          • Assignee:
            Unassigned
            Reporter:
            Koji Sekiguchi
          • Votes:
            0
            Watchers:
            0
