Solr
  1. Solr
  2. SOLR-8466

Add support for UnInvertedField based faceting to FacetComponent

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 5.0, 5.1, 5.2, 5.2.1, 5.3, 5.3.1, 5.4
    • Fix Version/s: 5.5, 6.0
    • Component/s: Facet Module, faceting
    • Labels:
      None

      Description

      The new JSON Faceting API supports building an UnInvertedField to do faceting which would be beneficial to add to Solr FacetComponent. This would provide additional options to implementors to choose the appropriate method of faceting for their particular use case.

      Summary

      added facet.method=uif which invokes UnInvertedField (top level data structure). facet.prefix and facet.contains are not supported. there are some jiras cover this gap.
      Note: field should be multivalued=true to get handled with uif, otherwise it silently goes to DV. Watch fieldValueCache stats while evaluating!

      1. 8466.diff
        51 kB
        Jamie Johnson
      2. SOLR-8466.patch
        36 kB
        Mikhail Khludnev
      3. SOLR-8466.patch
        35 kB
        Mikhail Khludnev
      4. SOLR-8466.patch
        36 kB
        Mikhail Khludnev
      5. SOLR-8466.patch
        36 kB
        Mikhail Khludnev
      6. SOLR-8466.patch
        26 kB
        Mikhail Khludnev
      7. SOLR-8466.patch
        8 kB
        Mikhail Khludnev
      8. SOLR-8466-failure.txt
        6 kB
        Mikhail Khludnev

        Issue Links

          Activity

          Hide
          Jamie Johnson added a comment -

          Initial implementation to add support for UnInvertedField based faceting to FacetComponent

          Show
          Jamie Johnson added a comment - Initial implementation to add support for UnInvertedField based faceting to FacetComponent
          Hide
          Bill Bell added a comment -

          This looks good, it appears to add facet.method=uif which it the way it was before.

          Are we ready to commit?

          Show
          Bill Bell added a comment - This looks good, it appears to add facet.method=uif which it the way it was before. Are we ready to commit?
          Hide
          Jamie Johnson added a comment -

          Correct, the patch adds facet.method=uif but I'd really like someone who is familiar with the JSON Faceting API to give this a go and make sure things look right.

          Show
          Jamie Johnson added a comment - Correct, the patch adds facet.method=uif but I'd really like someone who is familiar with the JSON Faceting API to give this a go and make sure things look right.
          Hide
          Mikhail Khludnev added a comment -

          Reviewing patch: SimpleFacets changes make sense, there is an test coverage (I suppose it's effective), but also there are many unnecessary changes like file moves. Jamie Johnson are you sure they can't be stripped off the patch? Also, pls name with jira ticket in according to tradition.

          Show
          Mikhail Khludnev added a comment - Reviewing patch: SimpleFacets changes make sense, there is an test coverage (I suppose it's effective), but also there are many unnecessary changes like file moves. Jamie Johnson are you sure they can't be stripped off the patch? Also, pls name with jira ticket in according to tradition.
          Hide
          Jamie Johnson added a comment - - edited

          Mikhail Khludnev, I'm assuming you're referring to splitting out the FacetContext, FacetParser, FacetTopParser, FacetQueryParser, FacetFieldParser and FacetRangeParser? If so FacetContext, FacetParser, FacetTopParser all are used by SimpleFacets. It felt a little strange to leave the others (FacetQueryParser, FacetFieldParser and FacetRangeParser) as protected within FacetRequest after moving the others out but I could move them back if that makes more sense.

          Show
          Jamie Johnson added a comment - - edited Mikhail Khludnev , I'm assuming you're referring to splitting out the FacetContext, FacetParser, FacetTopParser, FacetQueryParser, FacetFieldParser and FacetRangeParser? If so FacetContext, FacetParser, FacetTopParser all are used by SimpleFacets. It felt a little strange to leave the others (FacetQueryParser, FacetFieldParser and FacetRangeParser) as protected within FacetRequest after moving the others out but I could move them back if that makes more sense.
          Hide
          Jamie Johnson added a comment -

          updated patch name

          Show
          Jamie Johnson added a comment - updated patch name
          Hide
          Mikhail Khludnev added a comment -

          ok. Attaching SOLR-8466.patch. I understood an encapsulation challenge. Instead of ripping json facet framework I introduce factory method o.a.s.s.facet.FacetProcessor.createProcessor(SolrQueryRequest, Map<String, Object>, DocSet) that minimizes impact on codebase. Yonik Seeley what's your perception about it?
          I also removed exclude-aware docset calculation, because it's already handled at ..SimpleFacets.parseParams(String, String). So, far it lacks of test coverage for distributed faceting and fq exclusion.

          Opinions?

          Show
          Mikhail Khludnev added a comment - ok. Attaching SOLR-8466.patch . I understood an encapsulation challenge. Instead of ripping json facet framework I introduce factory method o.a.s.s.facet.FacetProcessor.createProcessor(SolrQueryRequest, Map<String, Object>, DocSet) that minimizes impact on codebase. Yonik Seeley what's your perception about it? I also removed exclude-aware docset calculation, because it's already handled at ..SimpleFacets.parseParams(String, String) . So, far it lacks of test coverage for distributed faceting and fq exclusion. Opinions?
          Hide
          Jamie Johnson added a comment -

          I like the approach as it's much less invasive. In regards to testing isn't this just a matter of piggy backing on the existing distributed faceting/exclusion tests to add uif as a method? I'm assuming those already exist?

          Show
          Jamie Johnson added a comment - I like the approach as it's much less invasive. In regards to testing isn't this just a matter of piggy backing on the existing distributed faceting/exclusion tests to add uif as a method? I'm assuming those already exist?
          Hide
          Mikhail Khludnev added a comment -

          Here is desired coverage, I'm still tweaking cases in SimpleFacetsTest.java. work in progress..

          Show
          Mikhail Khludnev added a comment - Here is desired coverage, I'm still tweaking cases in SimpleFacetsTest.java. work in progress..
          Hide
          Yonik Seeley added a comment -

          +1, this approach looks fine.

          Show
          Yonik Seeley added a comment - +1, this approach looks fine.
          Hide
          Mikhail Khludnev added a comment -

          while thoroughly testing the SOLR-8466.patch I've found something interesting.
          Jamie Johnson, can't you get the same performance boost if you just specify facet.version=2 in addition to old-style Solr facet parameters?
          Yonik Seeley wdyt?
          Note: current patch incorrectly handles facet.sort=true/false but this finding makes all these redundant.

          Show
          Mikhail Khludnev added a comment - while thoroughly testing the SOLR-8466.patch I've found something interesting . Jamie Johnson , can't you get the same performance boost if you just specify facet.version=2 in addition to old-style Solr facet parameters? Yonik Seeley wdyt? Note: current patch incorrectly handles facet.sort=true/false but this finding makes all these redundant.
          Hide
          Mikhail Khludnev added a comment -

          Hi,

          Ok. I looked on it once again. if facet.version=2 is specified, it adds one more response elements from JSON facets. Thus, workaround shortens:

          1. to get old good UnInvertedField perfomance you don't need to change parameters to JSON, but just specify facet.version=2
          2. and remove FacetComponent from handler's components, leaving FacetModule there.
          3. However, response format is different and an app should be changed to consume one that makes the issue still sensible
          facet_counts={facet_queries={},facet_fields={key1={foo=1},key2={bar=1}},facet_dates={},facet_ranges={},facet_intervals={},facet_heatmaps={}},
          

          vs

          facets={count=2,key1={buckets=[{val=foo,count=1}]},key2={buckets=[{val=bar,count=1}]}}
          
          Show
          Mikhail Khludnev added a comment - Hi, Ok. I looked on it once again. if facet.version=2 is specified, it adds one more response elements from JSON facets. Thus, workaround shortens: to get old good UnInvertedField perfomance you don't need to change parameters to JSON, but just specify facet.version=2 and remove FacetComponent from handler's components, leaving FacetModule there. However, response format is different and an app should be changed to consume one that makes the issue still sensible facet_counts={facet_queries={},facet_fields={key1={foo=1},key2={bar=1}},facet_dates={},facet_ranges={},facet_intervals={},facet_heatmaps={}}, vs facets={count=2,key1={buckets=[{val=foo,count=1}]},key2={buckets=[{val=bar,count=1}]}}
          Hide
          Mikhail Khludnev added a comment -

          work in progress.. fixed legacy sorting handling. Remaining issue is empty facets on certain trie field.

          Show
          Mikhail Khludnev added a comment - work in progress.. fixed legacy sorting handling. Remaining issue is empty facets on certain trie field.
          Hide
          Mikhail Khludnev added a comment -

          It seems like wasting a plenty of time on miserable thing..
          Here is the fix for testTrieFields failure. In fact it failed on empty result. I just enabling handling empty values..

                      if(mincount==0){
                        topLevel.put("processEmpty", true);
                      }
          ...
          ..
                        //if((Integer)som.get("count") > 0) {
                          SimpleOrderedMap<Object> asdf = (SimpleOrderedMap<Object>) som.get(field);
          
                          List<SimpleOrderedMap<Object>> buckets = (List<SimpleOrderedMap<Object>>)asdf.get("buckets");
                          for(SimpleOrderedMap<Object> b : buckets) {
                            counts.add(b.get("val").toString(), (Integer)b.get("count"));
                          }
                          if(missing) {
                            SimpleOrderedMap<Object> missingCounts = (SimpleOrderedMap<Object>) asdf.get("missing");
                            counts.add(null, (Integer)missingCounts.get("count"));
                          }
                       // } else if(missing) {
                       //   counts.add(null, 0);
                       // }
          
                      }
          
          Show
          Mikhail Khludnev added a comment - It seems like wasting a plenty of time on miserable thing.. Here is the fix for testTrieFields failure. In fact it failed on empty result. I just enabling handling empty values.. if (mincount==0){ topLevel.put( "processEmpty" , true ); } ... .. // if (( Integer )som.get( "count" ) > 0) { SimpleOrderedMap< Object > asdf = (SimpleOrderedMap< Object >) som.get(field); List<SimpleOrderedMap< Object >> buckets = (List<SimpleOrderedMap< Object >>)asdf.get( "buckets" ); for (SimpleOrderedMap< Object > b : buckets) { counts.add(b.get( "val" ).toString(), ( Integer )b.get( "count" )); } if (missing) { SimpleOrderedMap< Object > missingCounts = (SimpleOrderedMap< Object >) asdf.get( "missing" ); counts.add( null , ( Integer )missingCounts.get( "count" )); } // } else if (missing) { // counts.add( null , 0); // } }
          Hide
          Mikhail Khludnev added a comment -

          Attaching SOLR-8466.patch it never bypass empty processing.
          remaining failure is
          TestFaceting.testSimpleFacetCountsWithMultipleConfigurationsForSameField()
          the failures is really weird. for

          q=id:[42+TO+47]&facet.field={!key%3Dfoo+facet.prefix%3DToo+}trait_ss&facet.field={!key%3Dbar+facet.limit%3D2+facet.sort%3Dfalse+}trait_ss&facet.method=uif&fq=id:[42+TO+45]&facet=true&facet.zeros=false&wt=xml
          

          result is

          <result name="response" numFound="4" start="0"><doc><float name="id">42.0</float><arr name="trait_ss"><str>Tool</str><str>Obnoxious</str></arr><str name="name_s">Zapp Brannigan</str><long name="_version_">1525551790847688704</long></doc><doc><float name="id">43.0</float><str name="title_s">Democratic Order of Planets</str><long name="_version_">1525551790851883008</long></doc><doc><float name="id">44.0</float><arr name="trait_ss"><str>Tool</str></arr><str name="name_s">The Zapper</str><long name="_version_">1525551790878097408</long></doc><doc><float name="id">45.0</float><arr name="trait_ss"><str>Chauvinist</str></arr><str name="title_s">25 star General</str><long name="_version_">1525551790881243136</long></doc></result>
          

          we definitely have Tool twice but facet has only once! why??

          <lst name="facet_fields"><lst name="foo"><int name="Tool">1</int></lst><lst name="bar"><int name="Chauvinist">1</int><int name="Obnoxious">1</int></lst></lst>
          

          reasonably fails the assert

          //lst[@name='foo']/int[@name='Tool'][.='2']
          

          Summary

          It seems like a minor bug somewhere. If anyone from committer give his vote for it, I can extract these failures into ignored test, like it's done with SimpleFacetsTest.testFacetContainsUif() in the patch, otherwise it seems like no-commit for me.

          Show
          Mikhail Khludnev added a comment - Attaching SOLR-8466.patch it never bypass empty processing. remaining failure is TestFaceting.testSimpleFacetCountsWithMultipleConfigurationsForSameField() the failures is really weird. for q=id:[42+TO+47]&facet.field={!key%3Dfoo+facet.prefix%3DToo+}trait_ss&facet.field={!key%3Dbar+facet.limit%3D2+facet.sort%3Dfalse+}trait_ss&facet.method=uif&fq=id:[42+TO+45]&facet= true &facet.zeros= false &wt=xml result is <result name= "response" numFound= "4" start= "0" ><doc>< float name= "id" >42.0</ float ><arr name= "trait_ss" ><str>Tool</str><str>Obnoxious</str></arr><str name= "name_s" >Zapp Brannigan</str>< long name= "_version_" >1525551790847688704</ long ></doc><doc>< float name= "id" >43.0</ float ><str name= "title_s" >Democratic Order of Planets</str>< long name= "_version_" >1525551790851883008</ long ></doc><doc>< float name= "id" >44.0</ float ><arr name= "trait_ss" ><str>Tool</str></arr><str name= "name_s" >The Zapper</str>< long name= "_version_" >1525551790878097408</ long ></doc><doc>< float name= "id" >45.0</ float ><arr name= "trait_ss" ><str>Chauvinist</str></arr><str name= "title_s" >25 star General</str>< long name= "_version_" >1525551790881243136</ long ></doc></result> we definitely have Tool twice but facet has only once! why?? <lst name= "facet_fields" ><lst name= "foo" >< int name= "Tool" >1</ int ></lst><lst name= "bar" >< int name= "Chauvinist" >1</ int >< int name= "Obnoxious" >1</ int ></lst></lst> reasonably fails the assert //lst[@name='foo']/ int [@name='Tool'][.='2'] Summary It seems like a minor bug somewhere. If anyone from committer give his vote for it, I can extract these failures into ignored test, like it's done with SimpleFacetsTest.testFacetContainsUif() in the patch, otherwise it seems like no-commit for me.
          Hide
          Mikhail Khludnev added a comment -

          aha... here are TODO! the problem is omitting a multivalued occurrence when prefix is provided. it's worse than with contains which is just not supported.

          I'm going to throw error when facet.prefix is requested along side with facet.mehod=uif

          Show
          Mikhail Khludnev added a comment - aha... here are TODO ! the problem is omitting a multivalued occurrence when prefix is provided. it's worse than with contains which is just not supported. I'm going to throw error when facet.prefix is requested along side with facet.mehod=uif
          Hide
          David Smiley added a comment -

          The loss of support for facet.prefix seems to be in SOLR-7214 (JSON Facet API), May 16th 2015.

          Show
          David Smiley added a comment - The loss of support for facet.prefix seems to be in SOLR-7214 (JSON Facet API), May 16th 2015.
          Hide
          Yonik Seeley added a comment -

          Ummm, no... all use of UnInvertedField had been removed by LUCENE-5666.

          UnInvertedField in the facet package (i.e. the one that is part of the JSON Facet API) shares a common ancestry with the previous one, but it's different.
          When I committed the JSON Facet API, I removed the other unused UnInvertedField (which was dead and non-working code at that point), and would have just caused more confusion. There was already an issue open to remove that dead code.

          Show
          Yonik Seeley added a comment - Ummm, no... all use of UnInvertedField had been removed by LUCENE-5666 . UnInvertedField in the facet package (i.e. the one that is part of the JSON Facet API) shares a common ancestry with the previous one, but it's different. When I committed the JSON Facet API, I removed the other unused UnInvertedField (which was dead and non-working code at that point), and would have just caused more confusion. There was already an issue open to remove that dead code.
          Hide
          David Smiley added a comment -

          Thanks for the clarification Yonik. I see it's different; but nonetheless shares a ton in common. It even seems to have the same class javadocs and same superclass.

          Show
          David Smiley added a comment - Thanks for the clarification Yonik. I see it's different; but nonetheless shares a ton in common. It even seems to have the same class javadocs and same superclass.
          Hide
          Yonik Seeley added a comment -

          Like I say... common ancestry - it forked off from the original one.

          I don't remember details around the facet.prefix comment though... and facet prefixes do seem to work in the JSON Facet API (there are some prefix tests in TestJsonFacets)

          Show
          Yonik Seeley added a comment - Like I say... common ancestry - it forked off from the original one. I don't remember details around the facet.prefix comment though... and facet prefixes do seem to work in the JSON Facet API (there are some prefix tests in TestJsonFacets)
          Hide
          Mikhail Khludnev added a comment -

          ok. Here is SOLR-8466.patch it prohibits prefix for uif, and tests ugly bypass this case. I'm going to commit it tomorrow, unless you stop me.

          Show
          Mikhail Khludnev added a comment - ok. Here is SOLR-8466.patch it prohibits prefix for uif, and tests ugly bypass this case. I'm going to commit it tomorrow, unless you stop me.
          Hide
          ASF subversion and git services added a comment -

          Commit eac3bb9b32a45e5fc9faa54b372f89e25606a976 in lucene-solr's branch refs/heads/master from Mikhail Khludnev
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=eac3bb9 ]

          SOLR-8466: facet.method=uif for UnInvertedField faceting, like it was with 'fc' earlier.

          Show
          ASF subversion and git services added a comment - Commit eac3bb9b32a45e5fc9faa54b372f89e25606a976 in lucene-solr's branch refs/heads/master from Mikhail Khludnev [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=eac3bb9 ] SOLR-8466 : facet.method=uif for UnInvertedField faceting, like it was with 'fc' earlier.
          Hide
          ASF subversion and git services added a comment -

          Commit 93750292c258ca9361a9e3271fb2087be40557ec in lucene-solr's branch refs/heads/master from Mikhail Khludnev
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9375029 ]

          SOLR-8466: fixing CHANGES.txt moving to 5.5.0 Features.

          Show
          ASF subversion and git services added a comment - Commit 93750292c258ca9361a9e3271fb2087be40557ec in lucene-solr's branch refs/heads/master from Mikhail Khludnev [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9375029 ] SOLR-8466 : fixing CHANGES.txt moving to 5.5.0 Features.
          Hide
          ASF subversion and git services added a comment -

          Commit c3cdb84dc60afb265441ca67714685e6bd16ca22 in lucene-solr's branch refs/heads/branch_5x from Mikhail Khludnev
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c3cdb84 ]

          SOLR-8466: facet.method=uif for UnInvertedField faceting, like it was with 'fc' earlier.

          Show
          ASF subversion and git services added a comment - Commit c3cdb84dc60afb265441ca67714685e6bd16ca22 in lucene-solr's branch refs/heads/branch_5x from Mikhail Khludnev [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c3cdb84 ] SOLR-8466 : facet.method=uif for UnInvertedField faceting, like it was with 'fc' earlier.
          Hide
          Mikhail Khludnev added a comment -

          committed in both. will check CI in a few hours.

          Show
          Mikhail Khludnev added a comment - committed in both. will check CI in a few hours.
          Hide
          Mikhail Khludnev added a comment -

          Thanks, Jamie Johnson!

          Colleagues, feedback is quite appreciated! Don't hesitate to check a nightly build!

          Show
          Mikhail Khludnev added a comment - Thanks, Jamie Johnson ! Colleagues, feedback is quite appreciated! Don't hesitate to check a nightly build!
          Hide
          Mikhail Khludnev added a comment -

          perhaps it's SOLR-8155 ?

          Show
          Mikhail Khludnev added a comment - perhaps it's SOLR-8155 ?

            People

            • Assignee:
              Mikhail Khludnev
              Reporter:
              Jamie Johnson
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development