Solr / SOLR-1387

Add more search options for filtering field facets.

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 4.9, 5.0
    • Component/s: search
    • Labels: None

      Description

      Currently, to filter the facets we have to use facet.prefix (which uses String.startsWith() in Java).
      We could add some parameters like

      • facet.iPrefix : this would act like a case-insensitive prefix search. (or ---> facet.prefix=a&facet.caseinsense=on)
      • facet.regex : this is a pure regular expression search (which obviously would be expensive if issued).

      Moreover, allowing multiple filters for the same field would be great, like
      facet.prefix=a OR facet.prefix=A ... something like this.

      All of the above concepts would be equally applicable to TermsComponent.

          Activity

          Avlesh Singh added a comment -

          > facet.iPrefix : this would act like case-insensitive search. (or ---> facet.prefix=a&facet.caseinsense=on)

          I don't see a reason why the case filter should be there. You can always apply a lower case filter to your field while indexing and searching.

          > facet.regex : this is pure regular expression search (which obviously would be expensive if issued).

          You mean wildcards, right?

          > Moreover, allowing multiple filtering for same field would be great like facet.prefix=a OR facet.prefix=A ... sth like this.

          This was recently discussed on the dev mailing list here - http://www.lucidimagination.com/search/document/f954dbb323746ed1/multiple_facet_prefix
          The syntax that was agreed upon was local params in this manner - facet.field={!prefix=foo prefix=bar}myfield

          Anil Khadka added a comment -

          > I don't see a reason why the case filter should be there. You can always apply a lower case filter to your field while indexing and searching.
          Suppose I indexed a field called "placename" with names like California, Nevada, San Jose...
          If I use LowerCaseFilterFactory, it will be stored in lower case, and when retrieved as a facet (or via TermsComponent) it is also in lower case. --> (california, nevada, san jose)
          And this will mess things up (at least for me). I know there are others who want this too.

          > You mean wildcards. Right?
          Yes, it would be the first step towards it... [again, I don't mean A* or abc*; I would rather want *a or a*bc]

          > This has been recently discussed on the dev mailing list here - http://www.lucidimagination.com/search/document/f954dbb323746ed1/multiple_facet_prefix
          > The syntax that was agreed upon was local params in this manner - facet.field={!prefix=foo prefix=bar}myfield
          Yes, this is what I'm talking about. Having an option to get both the individual lists and a merged list for each query (here 'foo' and 'bar') would be better.

          Anil Khadka added a comment -

          I've come up with the following code. Any suggestions?
          [This is just a code snippet]

          Extension of SimpleFacets.java
          /*** SEARCHING ***/
          // HashSet is chosen to avoid duplicate entries
              HashSet<String> termsDump = new HashSet<String>();
                for (String term : terms) { // <--- terms[] from FieldCache.DEFAULT ... StringIndex.lookup
                  if (term == null) continue;
                  for (String p : iprefixList) { // <--- list of prefixes to be searched case-insensitively
                    // doing iprefix
                    if (term.toUpperCase().startsWith(p.toUpperCase())) { // <--- Is this the best way to do it??
                      termsDump.add(term);
                    }
                  }
                  for (String re : regexList) { // <--- list of regular expressions
                    if (term.matches(re)) {
                      // equivalent to Pattern.compile(re).matcher(term).matches()
                      termsDump.add(term);
                    }
                  }
                }
               // Just add the list of input terms without searching :)
               termsDump.addAll(inputTermsList);

          /*** COUNTING ***/ // <-- this counting method is different from regular prefix (finding spectrum in an array)
              FieldType ft = searcher.getSchema().getFieldType(field);
              NamedList<Integer> res = new NamedList<Integer>();
              Term t = new Term(field);
              for (String term : termList) { // <--- termList = termsDump from above
                String internal = ft.toInternal(term);
                int count = searcher.numDocs(new TermQuery(t.createTerm(internal)), base); // <--- Do we lose performance on this??
                res.add(term, count);
              }

          /*** SORTING ***/ // <-- regular CountPair<String,Integer> thing.
              for (int i = 0, n = res.size(); i < n; i++) {
                queue.add(new CountPair<String,Integer>(res.getName(i), res.getVal(i)));
              }

          The syntax would look like this (localParams style):

            &facet.field={!XFilter=on prefix=A,B,C iPrefix=a,b,c,d termsList=e,f,g,h regex=^a[a-z0-9]+g$,z*}field_name

          XFilter: I call this the eXtended Filter for facets!

          Hoss Man added a comment -

          linking issues so we ensure they are considered in conjunction

          Hoss Man added a comment -

          Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email...

          http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

          Selection criteria were "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. Email notifications were suppressed.

          A unique token for finding these 240 issues in the future: hossversioncleanup20100527

          Robert Muir added a comment -

          Bulk move 3.2 -> 3.3

          Robert Muir added a comment -

          3.4 -> 3.5

          Nicholas Jakobsen added a comment -

          I like the idea, as I haven't found any solutions to this problem that are compatible with Sunspot (Ruby Solr interface). Just looking at your code, you may want to move some of the loop-invariant work out of the loops. E.g. the upper-casing of each prefix gives the same result every iteration, but you redo it each time through. The same goes for term.toUpperCase(): you could move it out one loop, as it doesn't change within the prefix loop.
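
The suggestion above could be sketched roughly as follows. This is a standalone illustration, not Solr code: the class name FacetTermFilter and its filter method are hypothetical, the term list is passed in directly instead of coming from the FieldCache, and Locale.ROOT is an assumption about how case folding should behave. Each prefix is upper-cased once, each regular expression is compiled once, and each term is upper-cased once before the prefix loop.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Solr.
class FacetTermFilter {

    /** Returns the terms that start with any prefix (ignoring case) or fully match any regex. */
    static Set<String> filter(List<String> terms, List<String> iPrefixes, List<String> regexes) {
        // Loop-invariant work done once: upper-case the prefixes, compile the patterns.
        List<String> upperPrefixes = new ArrayList<>();
        for (String p : iPrefixes) {
            upperPrefixes.add(p.toUpperCase(Locale.ROOT));
        }
        List<Pattern> patterns = new ArrayList<>();
        for (String re : regexes) {
            patterns.add(Pattern.compile(re));
        }

        Set<String> matched = new LinkedHashSet<>(); // keeps input order, drops duplicates
        for (String term : terms) {
            if (term == null) continue;
            String upperTerm = term.toUpperCase(Locale.ROOT); // once per term, not once per prefix
            boolean hit = false;
            for (String p : upperPrefixes) {
                if (upperTerm.startsWith(p)) { hit = true; break; }
            }
            if (!hit) {
                for (Pattern re : patterns) {
                    if (re.matcher(term).matches()) { hit = true; break; }
                }
            }
            if (hit) matched.add(term);
        }
        return matched;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("California", "Nevada", "San Jose", "canada");
        // prints [California, San Jose, canada]
        System.out.println(filter(terms, List.of("ca"), List.of("^S.*e$")));
    }
}
```

The original (correctly cased) term is what ends up in the result set, so the displayed facet values keep their case even though matching ignores it.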

          Hoss Man added a comment -

          Bulk of fixVersion=3.6 -> fixVersion=4.0 for issues that have no assignee and have not been updated recently.

          Email notification suppressed to prevent mass-spam.
          Pseudo-unique token identifying these issues: hoss20120321nofix36

          Joe Osowski added a comment -

          Is there any way to advocate for this feature more?

          Otis Gospodnetic added a comment -

          Joe Osowski - vote for it if you haven't.

          > If I use LowerCaseFilterFactory it will be stored in lowered case and when retrieving as FACET (or TermsComponent) it is also in lowered case. --> (california, nevada, san jose)

          Is this really true? Won't the original string be preserved if stored="true"? Or is the indexed/lowercased value used?

          Steve Rowe added a comment -

          Bulk move 4.4 issues to 4.5 and 5.0

          Linas Juškevičius added a comment - edited

          > I don't see a reason as to why the case filter be there. you can always apply a lower case filter to you field while indexing and searching.

          The reason is very well explained in SolrFacetingOverview: faceting is performed on the indexed values, and those are the values returned. I can't show lowercased values to my users.

          A use case: we facet a multivalued field after an "fq" and get thousands of values. The user gets an infinitely scrollable list of the values, but we also want to let them search. Ideally a search for "states" should match "United States", which is not supported for two reasons:

          • the term is not at the beginning of the indexed string,
          • the cases of the term and the indexed string do not match, so the prefix filter does not help.

          A wildcard search (*states*) would help a lot. Regexp may be more powerful but less performant. Any other ideas?
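
To make the wildcard idea concrete, here is a minimal sketch of how a pattern such as *states* could be applied to a list of facet values on the application side. It is not an existing Solr feature or API: the class WildcardFacetFilter and both of its methods are hypothetical, and it assumes '*' is the only wildcard character and that every other character should be matched literally, ignoring case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Solr.
class WildcardFacetFilter {

    /** Turns a wildcard expression ('*' = any run of characters) into a case-insensitive Pattern. */
    static Pattern toPattern(String wildcard) {
        String[] literals = wildcard.split("\\*", -1); // limit -1 keeps leading/trailing empty parts
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) regex.append(".*");
            regex.append(Pattern.quote(literals[i])); // everything except '*' is matched literally
        }
        return Pattern.compile(regex.toString(),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE | Pattern.DOTALL);
    }

    /** Returns the values that match the wildcard as a whole, in their original case and order. */
    static List<String> filter(List<String> values, String wildcard) {
        Pattern pattern = toPattern(wildcard); // compiled once, reused for every value
        List<String> hits = new ArrayList<>();
        for (String value : values) {
            if (pattern.matcher(value).matches()) hits.add(value);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> values = List.of("United States", "United Kingdom",
                "Federated States of Micronesia", "Estonia");
        // prints [United States, Federated States of Micronesia]
        System.out.println(filter(values, "*states*"));
    }
}
```

Every value has to be tested against the pattern, which is the performance cost mentioned in this thread; a plain prefix can instead be located in the sorted term list.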

          Nicholas Jakobsen added a comment -

          Linas Juškevičius, we had the same problems you describe: we can't show users downcased results, and the "prefix" must match any word, not just one anchored at the beginning of the token. What we ended up doing is encoding more information in the tokenized values than just the result. We included the downcased search term and the term to display, delimited with a tab (written \t here), e.g. "star wars\tStar Wars". Then in our app we grabbed the last half and showed it to the user. As for getting a matching prefix on different words, e.g. "wars", we created multiple tokens where we chomped a word off each time, e.g. "star wars\tStar Wars", "wars\tStar Wars". Each has the same "display portion", but we now have full control over the "matching portion".
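
The encoding described above might look something like this. It is only a sketch of the idea as stated in the comment, not the code that was actually used: the class FacetTokenEncoder, its method names, and the whitespace-based word splitting are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper illustrating the "matching portion <TAB> display portion" encoding.
class FacetTokenEncoder {
    static final char DELIM = '\t';

    /** Builds one token per word offset: downcased remainder of the phrase, a tab, then the display value. */
    static List<String> encode(String display) {
        String[] words = display.trim().split("\\s+");
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            // Chomp the first i words off, then downcase what is left for matching.
            String matching = String.join(" ", Arrays.copyOfRange(words, i, words.length))
                    .toLowerCase(Locale.ROOT);
            tokens.add(matching + DELIM + display);
        }
        return tokens;
    }

    /** Extracts the part after the tab, i.e. what is shown to the user. */
    static String displayPart(String token) {
        return token.substring(token.indexOf(DELIM) + 1);
    }

    public static void main(String[] args) {
        for (String token : encode("Star Wars")) {
            System.out.println(token.replace("\t", "\\t") + "  ->  " + displayPart(token));
        }
    }
}
```

With tokens like these, a lowercased facet.prefix such as "wars" would match the second token while the application still displays "Star Wars". Because one value produces several tokens, a single prefix may return more than one token with the same display portion, so the application would probably need to de-duplicate on the display portion.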

          Uwe Schindler added a comment -

          Move issue to Solr 4.9.


            People

            • Assignee: Unassigned
            • Reporter: Anil Khadka
            • Votes: 15
            • Watchers: 14
