SOLR-1387

Add more search options for filtering field facets.

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.1, Trunk
    • Component/s: search
    • Labels: None

      Description

      Currently, to filter facets we have to use prefix (which uses String.startsWith() in Java).
      We could add parameters such as:

      • facet.iPrefix : a case-insensitive prefix search (or alternatively facet.prefix=a&facet.caseinsense=on)
      • facet.regex : a pure regular-expression search (which would obviously be expensive if issued).

      Moreover, allowing multiple filters for the same field would be great, e.g.
      facet.prefix=a OR facet.prefix=A, or something like this.

      All above concepts could be equally applicable to TermsComponent.
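
      To make the three filter styles concrete, here is a minimal sketch over an in-memory list of facet values. FacetFilterSketch and its method names are hypothetical and are not Solr code; only String.startsWith() comes from the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Solr: illustrates the filter styles
// from the description on a plain list of facet values.
class FacetFilterSketch {

    // Existing behaviour: case-sensitive prefix match via String.startsWith().
    static List<String> byPrefix(List<String> values, String prefix) {
        List<String> out = new ArrayList<>();
        for (String v : values) {
            if (v.startsWith(prefix)) out.add(v);
        }
        return out;
    }

    // Proposed facet.iPrefix: prefix match that ignores case.
    static List<String> byIPrefix(List<String> values, String prefix) {
        List<String> out = new ArrayList<>();
        for (String v : values) {
            if (v.regionMatches(true, 0, prefix, 0, prefix.length())) out.add(v);
        }
        return out;
    }

    // Proposed facet.regex: the whole value must match the pattern.
    static List<String> byRegex(List<String> values, String regex) {
        Pattern p = Pattern.compile(regex); // compile once, outside the loop
        List<String> out = new ArrayList<>();
        for (String v : values) {
            if (p.matcher(v).matches()) out.add(v);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("Alabama", "alaska", "Arizona", "Nevada");
        System.out.println(byPrefix(values, "A"));
        System.out.println(byIPrefix(values, "a"));
        System.out.println(byRegex(values, "[Aa]l.*"));
    }
}
```

      Every style except the plain prefix has to look at each value individually, which is why the description expects the regex variant to be expensive.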

      1. SOLR-1387-contains.patch
        14 kB
        Alan Woodward
      2. SOLR-1387-contains.patch
        10 kB
        Alan Woodward
      3. SOLR-1387.patch
        41 kB
        Tom Winch

        Issue Links

          Activity

          Transition            Time In Source Status  Execution Times  Last Executer       Last Execution Date
          Open -> Resolved      2037d 19h 41m          1                Alan Woodward       26/Mar/15 13:46
          Resolved -> Reopened  1d 2h 25m              1                Michael McCandless  27/Mar/15 16:12
          Reopened -> Resolved  11d 23h 55m            1                Alan Woodward       08/Apr/15 17:07
          Resolved -> Closed    6d 8h 23m              1                Timothy Potter      15/Apr/15 01:30
          Timothy Potter made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Timothy Potter added a comment -

          Bulk close after 5.1 release

          Alan Woodward made changes -
          Status Reopened [ 4 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          ASF subversion and git services added a comment -

          Commit 1672113 from Alan Woodward in branch 'dev/branches/lucene_solr_5_1'
          [ https://svn.apache.org/r1672113 ]

          SOLR-1387: Move contains() method to SimpleFacets

          ASF subversion and git services added a comment -

          Commit 1672112 from Alan Woodward in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1672112 ]

          SOLR-1387: Move contains() method to SimpleFacets

          ASF subversion and git services added a comment -

          Commit 1672106 from Alan Woodward in branch 'dev/trunk'
          [ https://svn.apache.org/r1672106 ]

          SOLR-1387: Move contains() method to SimpleFacets

          Alan Woodward made changes -
          Attachment SOLR-1387-contains.patch [ 12723917 ]
          Alan Woodward added a comment -

          Oops, yes, that's exactly what I did. Here's the correct version...

          Michael McCandless added a comment -

          Thank you Alan Woodward

          But the patch doesn't seem to actually remove the method from StringHelper?

          Or maybe you ran "svn diff" from inside solr subdir, so the patch is missing the lucene/ changes?

          Alan Woodward made changes -
          Attachment SOLR-1387-contains.patch [ 12723888 ]
          Alan Woodward added a comment -

          Patch moving the 'contains' method to SimpleFacets, and refactoring it a bit to just use Strings. I'll commit this later today.

          Alan Woodward added a comment -

          > Can we move this method out for now, e.g. not put it in the shared StringHelper utility class?

          Sure, we could move it into the Solr faceting code. I'm away from an svn-accessible machine for 10 days or so now; I can do it when I get back, or feel free to move it yourself.

          Michael McCandless made changes -
          Resolution Fixed [ 1 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          Michael McCandless added a comment -

          I'm concerned about the StringHelper.contains that was added for this issue:

          • Its signature implies it operates on BytesRef, but under the hood it secretly assumes the bytes are valid UTF-8 (only for the ignoreCase=true case)
          • It also secretly assumes Locale.ENGLISH for downcasing but the incoming UTF-8 bytes may not be English
          • It has potentially poor performance compared to known algos e.g. http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm

          Can we move this method out for now, e.g. not put it in the shared StringHelper utility class?
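
          The locale concern can be illustrated with plain java.lang.String. This is only a sketch of the general problem, not the StringHelper.contains code itself; LocaleCaseSketch and containsIgnoreCase are hypothetical names.

```java
import java.util.Locale;

// Hypothetical illustration: the result of a case-insensitive substring
// test depends on which locale is used for lower-casing.
class LocaleCaseSketch {

    // Lower-cases both sides with the given locale, then does a substring test.
    static boolean containsIgnoreCase(String value, String needle, Locale locale) {
        return value.toLowerCase(locale).contains(needle.toLowerCase(locale));
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        // English rules map 'I' to 'i'; Turkish rules map 'I' to dotless 'ı' (U+0131).
        System.out.println(containsIgnoreCase("TITLE", "title", Locale.ENGLISH));
        System.out.println(containsIgnoreCase("TITLE", "title", turkish));
    }
}
```

          With English rules the match succeeds; with Turkish rules the upper-case I becomes a dotless ı and the same comparison fails, so a fixed locale hidden inside a shared utility is a silent assumption about the data.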
          Anil Khadka added a comment -

          Thanks for finally pushing it into the code.
          'contains' and 'contains.ignoreCase' will cover most of the use cases.
          As I remember, the code I wrote at the time performed acceptably (not terribly) for the regular-expression case. But it was mostly used for auto-completion, which didn't use regex, and that worked pretty well.

          Using an FSA (or FST) directly, as in Lucene, would be great for regex (and an interesting project!)

          Thanks again guys.

          Tom Winch added a comment -

          Thanks Alan!

          Alan Woodward made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Alan Woodward added a comment -

          Closing this for now - I have some ideas for extending faceting using Automata which would mean we could add support for filtering facets by arbitrary regexes, but that can go in a separate issue, I think.

          ASF subversion and git services added a comment -

          Commit 1669336 from Alan Woodward in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1669336 ]

          SOLR-1387: Add facet.contains and facet.contains.ignoreCase

          Tomjon added a comment -

          Excellent, thanks for your work Will

          Alan Woodward made changes -
          Fix Version/s 5.1 [ 12329284 ]
          Fix Version/s 4.9 [ 12326731 ]
          ASF subversion and git services added a comment -

          Commit 1669335 from Alan Woodward in branch 'dev/trunk'
          [ https://svn.apache.org/r1669335 ]

          SOLR-1387: Add facet.contains and facet.contains.ignoreCase

          Alan Woodward added a comment -

          I'll try and get it in for 5.0 (which is approaching rapidly).

          Will Butler added a comment -

          Initial testing using this patch against the 5.x branch looks pretty promising. Using facet.contains and facet.contains.ignoreCase on a multi-value field with tens of millions of unique values in an index of roughly 100 million documents isn't super fast (~3s), but is usable. Our other attempted solution was to pull back all facet values for filtering in the client, but that caused the cluster to hang. Other than vote this issue up, is there anything else we can do to help move this issue along?

          Tomjon added a comment -

          Indeed, using facet.contains without facet.prefix means examining every value for a facet, and using ignoreCase in addition makes it even worse.
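
          A rough way to see the difference, using an in-memory sorted set as a stand-in for the sorted term dictionary (an assumption for illustration; this is not how Solr stores or walks terms, and PrefixVersusContainsSketch is a hypothetical name):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration: a prefix maps to one contiguous range of a
// sorted set, while a substring match has to visit every value.
class PrefixVersusContainsSketch {

    // Values starting with the prefix sit between prefix and prefix + '\uffff'.
    // (Sketch only: a value containing '\uffff' right after the prefix is missed.)
    static SortedSet<String> byPrefix(TreeSet<String> sorted, String prefix) {
        return sorted.subSet(prefix, prefix + Character.MAX_VALUE);
    }

    // No ordering helps here: each value is examined in turn.
    static List<String> byContains(TreeSet<String> sorted, String needle) {
        List<String> out = new ArrayList<>();
        for (String v : sorted) {
            if (v.contains(needle)) out.add(v);
        }
        return out;
    }

    public static void main(String[] args) {
        TreeSet<String> sorted = new TreeSet<>(Arrays.asList(
                "Estados Unidos", "Unicode", "United Kingdom", "United States"));
        System.out.println(byPrefix(sorted, "United"));
        System.out.println(byContains(sorted, "States"));
    }
}
```

          Adding ignoreCase on top of the full scan means case-folding every examined value as well.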

          Will Butler added a comment -

          facet.contains would be great to have. Any general comments on the worst case performance? Does it approach the cost of reading all possible facet values for a field?

          Tom Winch added a comment -

          As the name suggests, CharacterUtils works on a char[] whereas we have a BytesRef (essentially a byte[]). But I think CharacterUtils.toLowerCase() is doing essentially the same as I'm doing in StringHelper.contains() in that it converts using Unicode case mapping information (via Character.toLowerCase(int)).

          Yes, sadly making ignoreCase more general would spoil the efficiency of facet.prefix, so I thought it safest to leave it as a sub-parameter of facet.contains, which spoils that efficiency already.

          Alan Woodward added a comment -

          This looks great.

          Rather than using BytesRef.utf8ToString() in StringHelper.contains() (which can be expensive), can we use CharacterUtils.toLowerCase() instead? Have a look at LowerCaseFilterFactory to see how that works.

          It would be nice to make ignoreCase more general, rather than only applying to facet.contains, but I guess it won't really apply cleanly to things like facet.prefix.

          Alan Woodward made changes -
          Assignee Alan Woodward [ romseygeek ]
          Tom Winch made changes -
          Attachment SOLR-1387.patch [ 12680789 ]
          Tom Winch added a comment -

          I've been looking at this issue from the use-case of autocompletion, and in this case it's very desirable to include completions from the middle of a word. I've developed a patch which adds the following faceting parameters:

          • facet.contains - similar to facet.prefix, but the string supplied may appear anywhere in the term
          • facet.contains.ignoreCase - a Boolean value; if true, the comparison is case insensitive

          facet.contains has been implemented for the enum, fc, fcs and grouped faceting methods. Memory usage and performance are likely to be as 'bad' as for the same query without the facet.contains restriction (you lose the advantage of sorted values that facet.prefix can leverage).

          The ignore-case is implemented in terms of UTF-8 case insensitivity so is also potentially computationally expensive.
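
          For illustration, a request using the two parameters might be assembled as below. This is a sketch only: FacetContainsQuerySketch, the field name "country" and the search text are made up, and the resulting string would be appended to whatever select URL your installation exposes.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds the query string for a facet request that
// filters facet values with facet.contains / facet.contains.ignoreCase.
class FacetContainsQuerySketch {

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    static String facetQuery(String field, String contains, boolean ignoreCase) {
        return "q=" + enc("*:*")
                + "&rows=0&facet=true"
                + "&facet.field=" + enc(field)
                + "&facet.contains=" + enc(contains)
                + "&facet.contains.ignoreCase=" + ignoreCase;
    }

    public static void main(String[] args) {
        // Field name and search text are example values.
        System.out.println(facetQuery("country", "states", true));
    }
}
```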

          Uwe Schindler made changes -
          Fix Version/s 4.9 [ 12326731 ]
          Fix Version/s 5.0 [ 12321664 ]
          Fix Version/s 4.8 [ 12326254 ]
          Uwe Schindler added a comment -

          Move issue to Solr 4.9.

          Nicholas Jakobsen added a comment -

          Linas Juškevičius, we had the same problems you describe: we can't show users downcased results, and the "prefix" must match any word, not just be anchored at the beginning of the token. What we ended up doing is encoding more information in the tokenized values than just the result. We included the downcased search term and the term to display, delimited by a tab, e.g. "star wars\tStar Wars" (\t being the tab). Then in our app we grabbed the last half and showed it to the user. As for matching a prefix on different words, e.g. "wars", we created multiple tokens, chomping a word off each time, e.g. "star wars\tStar Wars", "wars\tStar Wars". Each has the same "display portion", but we now have full control over the "matching portion".
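
          A minimal sketch of that encoding, assuming whitespace-separated words and a tab delimiter as described; SuffixTokenSketch and its method names are hypothetical, and the original was presumably implemented on the Ruby side:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of the workaround: one token per word suffix,
// each carrying a lower-cased matching portion and the display value.
class SuffixTokenSketch {

    // "Star Wars" -> ["star wars\tStar Wars", "wars\tStar Wars"]
    static List<String> tokens(String display) {
        String[] words = display.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            // Drop the first i words, lower-case the rest for matching.
            String match = String.join(" ", Arrays.copyOfRange(words, i, words.length))
                    .toLowerCase(Locale.ROOT);
            out.add(match + "\t" + display);
        }
        return out;
    }

    // The client shows only what follows the tab.
    static String displayPart(String token) {
        return token.substring(token.indexOf('\t') + 1);
    }

    public static void main(String[] args) {
        for (String t : tokens("Star Wars")) {
            System.out.println(t + " => " + displayPart(t));
        }
    }
}
```

          A plain facet.prefix on the lower-cased user input then matches any word of the value, at the cost of indexing one token per word.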

          David Smiley made changes -
          Fix Version/s 4.8 [ 12326254 ]
          Fix Version/s 4.7 [ 12325573 ]
          Uwe Schindler made changes -
          Fix Version/s 4.7 [ 12325573 ]
          Fix Version/s 4.6 [ 12325000 ]
          Adrien Grand made changes -
          Fix Version/s 4.6 [ 12325000 ]
          Fix Version/s 5.0 [ 12321664 ]
          Fix Version/s 4.5 [ 12324743 ]
          Linas Juškevičius added a comment - - edited

          > I don't see a reason as to why the case filter be there. You can always apply a lower case filter to your field while indexing and searching.

          The reason is very well explained in SolrFacetingOverview - faceting is performed on the indexed values and they are returned. I can't show lowercased values to my users.

          A use case - we facet a multivalued field after an "fq" and get thousands of values. The user gets an infinitely scrollable list of the values, but we also want to let them search. Ideally a search for "states" should match "United States", which is not supported for two reasons:

          • term is not at the beginning of the indexed string,
          • term and indexed string cases do not match thus prefix filter does not help.

          A wildcard search (*states*) would help a lot. Regexp may be better but less performant. Any other ideas?

          Steve Rowe made changes -
          Fix Version/s 5.0 [ 12321664 ]
          Fix Version/s 4.5 [ 12324743 ]
          Fix Version/s 4.4 [ 12324324 ]
          Steve Rowe added a comment -

          Bulk move 4.4 issues to 4.5 and 5.0

          Otis Gospodnetic added a comment -

          Joe Osowski - vote for it if you haven't.

          > If I use LowerCaseFilterFactory it will be stored in lowered case and when retrieving as FACET (or TermsComponent) it is also in lowered case. --> (california, nevada, san jose)

          Is this really true? Won't the original string be preserved if stored="true"? Or is the indexed/lowercased value used?

          Uwe Schindler made changes -
          Fix Version/s 4.4 [ 12324324 ]
          Fix Version/s 4.3 [ 12324128 ]
          Joe Osowski added a comment -

          Is there any way to advocate for this feature more?

          Hoss Man made changes -
          Link This issue is related to SOLR-4717 [ SOLR-4717 ]
          Robert Muir made changes -
          Fix Version/s 4.3 [ 12324128 ]
          Fix Version/s 5.0 [ 12321664 ]
          Fix Version/s 4.2 [ 12323893 ]
          Mark Miller made changes -
          Fix Version/s 4.2 [ 12323893 ]
          Fix Version/s 5.0 [ 12321664 ]
          Fix Version/s 4.1 [ 12321141 ]
          Robert Muir made changes -
          Fix Version/s 4.1 [ 12321141 ]
          Fix Version/s 4.0 [ 12314992 ]
          Hoss Man made changes -
          Fix Version/s 3.6 [ 12319065 ]
          Hoss Man added a comment -

          Bulk of fixVersion=3.6 -> fixVersion=4.0 for issues that have no assignee and have not been updated recently.

          email notification suppressed to prevent mass-spam
          pseudo-unique token identifying these issues: hoss20120321nofix36

          Nicholas Jakobsen added a comment -

          I like the idea, as I haven't found any solutions to this problem that are compatible with Sunspot (a Ruby Solr interface). Just looking at your code, you may want to move some of the loop-invariant work out of the loops. E.g. the case conversion of the prefixes is the same on every iteration, but you redo it each time through. The same goes for term.toUpperCase(): you could move it out one loop, as it doesn't change within the prefix loop.

          Simon Willnauer made changes -
          Fix Version/s 3.6 [ 12319065 ]
          Fix Version/s 3.5 [ 12317876 ]
          Robert Muir made changes -
          Fix Version/s 3.5 [ 12317876 ]
          Fix Version/s 3.4 [ 12316683 ]
          Robert Muir added a comment -

          3.4 -> 3.5

          Robert Muir made changes -
          Fix Version/s 3.4 [ 12316683 ]
          Fix Version/s 4.0 [ 12314992 ]
          Fix Version/s 3.3 [ 12316471 ]
          Robert Muir made changes -
          Fix Version/s 3.3 [ 12316471 ]
          Fix Version/s 3.2 [ 12316172 ]
          Robert Muir added a comment -

          Bulk move 3.2 -> 3.3

          Hoss Man made changes -
          Fix Version/s 3.2 [ 12316172 ]
          Fix Version/s Next [ 12315093 ]
          Hoss Man made changes -
          Fix Version/s Next [ 12315093 ]
          Fix Version/s 1.5 [ 12313566 ]
          Hoss Man added a comment -

          Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email...

          http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

          Selection criteria was "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. email notifications were suppressed.

          A unique token for finding these 240 issues in the future: hossversioncleanup20100527

          Anil Khadka made changes -
          Affects Version/s 1.4 [ 12313351 ]
          Fix Version/s 1.5 [ 12313566 ]
          Hoss Man added a comment -

          linking issues so we ensure they are considered in conjunction

          Hoss Man made changes -
          Link This issue relates to SOLR-1351 [ SOLR-1351 ]
          Hoss Man made changes -
          Field: Summary
          Original Value: Add more search options for filtering facets.
          New Value: Add more search options for filtering field facets.
          Anil Khadka added a comment -

          I've come up with the following code. Any suggestions?
          [This is just a code snippet]

          Extension of SimpleFacets.java
          /*** SEARCHING ***/
          // Assumes java.util.* and java.util.regex.Pattern are imported; terms, iprefixList,
          // regexList and inputTermsList come from the surrounding (omitted) method.
          // A HashSet is chosen to avoid duplicate entries.
              Set<String> termsDump = new HashSet<String>();
              // Loop invariants: upper-case each prefix and compile each regex only once.
              List<String> upperPrefixes = new ArrayList<String>();
              for (String p : iprefixList) upperPrefixes.add(p.toUpperCase(Locale.ROOT));
              List<Pattern> patterns = new ArrayList<Pattern>();
              for (String re : regexList) patterns.add(Pattern.compile(re));

              for (String term : terms) { // <-- terms[] from FieldCache.DEFAULT ... StringIndex.lookup
                if (term == null) continue;
                String upperTerm = term.toUpperCase(Locale.ROOT);
                for (String p : upperPrefixes) { // <-- prefixes to match case-insensitively
                  if (upperTerm.startsWith(p)) {
                    termsDump.add(term);
                    break;
                  }
                }
                for (Pattern re : patterns) { // <-- regular expressions; the whole term must match
                  if (re.matcher(term).matches()) {
                    termsDump.add(term);
                    break;
                  }
                }
              }
              // Just add the list of input terms without searching :)
              termsDump.addAll(inputTermsList);

          /*** COUNTING ***/ // <-- this counting method differs from the regular prefix one (finding a range in an array)
              FieldType ft = searcher.getSchema().getFieldType(field);
              NamedList<Integer> res = new NamedList<Integer>();
              Term t = new Term(field);
              for (String term : termsDump) {
                String internal = ft.toInternal(term);
                int count = searcher.numDocs(new TermQuery(t.createTerm(internal)), base); // <-- Do we lose performance on this?
                res.add(term, count);
              }

          /*** SORTING ***/ // <-- regular CountPair<String,Integer> thing.
              for (int i = 0, n = res.size(); i < n; i++) {
                queue.add(new CountPair<String,Integer>(res.getName(i), res.getVal(i)));
              }

          The syntax would look like this (localParams style):

            &facet.field={!XFilter=on prefix=A,B,C iPrefix=a,b,c,d termsList=e,f,g,h regex=^a[a-z0-9]+g$,z*}field_name

          XFilter: I called this the eXtended Filter for facets!

          Anil Khadka added a comment -

          > I don't see a reason as to why the case filter be there. you can always apply a lower case filter to you field while indexing and searching.
          Suppose I indexed a field called "placename" with values like California, Nevada, San Jose...
          If I use LowerCaseFilterFactory, the terms are indexed in lower case, so when they are retrieved as facets (or through TermsComponent) they also come back in lower case --> (california, nevada, san jose)
          That messes things up (at least for me). I know there are others who want this too.

          > You mean wildcards. Right?
          Yes, that would be the first step towards it... [again, I don't mean only A* or abc*; I would rather want *a or a*bc]

          > This has been recently discussed on the dev mailing list here - http://www.lucidimagination.com/search/document/f954dbb323746ed1/multiple_facet_prefix
          > The syntax that was agreed upon was local params in this manner - facet.field={!prefix=foo prefix=bar}myfield
          Yes, this is what I'm talking about. Having an option to get both the individual list for each prefix (here 'foo' and 'bar') and the merged list would be even better.
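
          To illustrate what returning both the individual and the merged lists could look like, here is a minimal sketch. It assumes the field's terms are available as a collection of non-null strings. The class name MultiPrefixFacets, the method group, and the "_merged" key are hypothetical and are not existing Solr response structures.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper, not a Solr class.
public class MultiPrefixFacets {

    // Made-up key under which the merged list is returned.
    public static final String MERGED_KEY = "_merged";

    /**
     * Groups terms by each requested prefix and also returns the
     * de-duplicated union of all matches under MERGED_KEY.
     */
    public static Map<String, List<String>> group(Collection<String> terms,
                                                  List<String> prefixes) {
        // Natural ordering keeps all terms sharing a prefix contiguous.
        TreeSet<String> sorted = new TreeSet<>(terms);
        TreeSet<String> merged = new TreeSet<>();
        Map<String, List<String>> result = new LinkedHashMap<>();

        for (String prefix : prefixes) {
            List<String> hits = new ArrayList<>();
            // tailSet jumps to the first term >= prefix; stop at the
            // first term that no longer starts with the prefix.
            for (String term : sorted.tailSet(prefix)) {
                if (!term.startsWith(prefix)) {
                    break;
                }
                hits.add(term);
            }
            result.put(prefix, hits);
            merged.addAll(hits);
        }
        result.put(MERGED_KEY, new ArrayList<>(merged));
        return result;
    }
}
```

          This only shows the grouping of terms; counting each term against the base doc set would still happen as in the regular facet code.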

          Avlesh Singh added a comment -

          > facet.iPrefix : this would act like case-insensitive search. (or ---> facet.prefix=a&facet.caseinsense=on)

          I don't see a reason for the case filter to be there. You can always apply a lower case filter to your field while indexing and searching.

          > facet.regex : this is pure regular expression search (which obviously would be expensive if issued).

          You mean wildcards, right?

          > Moreover, allowing multiple filtering for same field would be great like facet.prefix=a OR facet.prefix=A ... sth like this.

          This was recently discussed on the dev mailing list here - http://www.lucidimagination.com/search/document/f954dbb323746ed1/multiple_facet_prefix
          The syntax that was agreed upon was local params in this manner - facet.field={!prefix=foo prefix=bar}myfield
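
          On the wildcard question: patterns such as *a or a*bc can be handled by translating them into a regular expression, so wildcards become a restricted form of the proposed facet.regex. The following is a sketch under that assumption only. WildcardToRegex is a made-up name, and only '*' is treated as a wildcard; everything else is matched literally.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not a Solr class.
public class WildcardToRegex {

    /**
     * Compiles a wildcard expression where '*' matches any run of
     * characters and all other characters are matched literally.
     */
    public static Pattern compile(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                // Flush the pending literal run, quoted so that
                // characters like '.' or '[' lose their regex meaning.
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(".*");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString());
    }
}
```

          A term is accepted when compile(expr).matcher(term).matches() returns true, which requires the whole term to match rather than a substring.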

          Anil Khadka created issue -

            People

            • Assignee:
              Alan Woodward
              Reporter:
              Anil Khadka
            • Votes:
              13
              Watchers:
              22
