Lucene - Core
  1. Lucene - Core
  2. LUCENE-4461

Multiple FacetRequest with the same path creates inconsistent results

    Details

    • Lucene Fields:
      New, Patch Available

      Description

      Multiple FacetRequest are getting merged into one creating wrong results in this case:

      FacetSearchParams facetSearchParams = new FacetSearchParams();
      facetSearchParams.addFacetRequest(new CountFacetRequest(new CategoryPath("author"), 10));
      facetSearchParams.addFacetRequest(new CountFacetRequest(new CategoryPath("author"), 10));

      Problem can be fixed by defining hashcode and equals in certain way that Lucene recognize we are talking about different requests.

      Attached test case.

      1. LUCENE-4461.patch
        5 kB
        Gilad Barkai
      2. LuceneFacetTest.java
        3 kB
        Rodrigo Vega

        Activity

        Hide
        Gilad Barkai added a comment -

        Nice catch!

        Took a while to pinpoint the reason - lines 173-181 of StandardFacetsAccumulator.
        In the mentioned lines, a 'merge' is performed over categories which matched the request, but reside on different partitions.

        Partitions are an optimization which limit the RAM requirements per query to a constant, rather than linear to the taxonomy size (could be millions of categories). The taxonomy is virtually "splitted" into partitions of constant size, a top-k is heaped from each partition, and all those top-k results are being merged to a global top-k list

        The proposed solution of changing the hashCode and equals so that the same request will have two hashCodes and will not be equal to itself is very likely to break other parts of the code.

        Perhaps such cases could be prevented all together? e.g throwing an exception when the (exact) same request is added twice.
        Is that a reasonable solution? Are there cases where it is necessary to request the same path twice?
        Please note that a different count, depth, path etc - makes a different request, so requesting "author" with count 10 and count 11 makes different requests - which are handled simultaneously correctly in current versions.

        Show
        Gilad Barkai added a comment - Nice catch! Took a while to pinpoint the reason - lines 173-181 of StandardFacetsAccumulator. In the mentioned lines, a 'merge' is performed over categories which matched the request, but reside on different partitions. Partitions are an optimization which limit the RAM requirements per query to a constant, rather than linear to the taxonomy size (could be millions of categories). The taxonomy is virtually "splitted" into partitions of constant size, a top-k is heaped from each partition, and all those top-k results are being merged to a global top-k list The proposed solution of changing the hashCode and equals so that the same request will have two hashCodes and will not be equal to itself is very likely to break other parts of the code. Perhaps such cases could be prevented all together? e.g throwing an exception when the (exact) same request is added twice. Is that a reasonable solution? Are there cases where it is necessary to request the same path twice? Please note that a different count, depth, path etc - makes a different request, so requesting "author" with count 10 and count 11 makes different requests - which are handled simultaneously correctly in current versions.
        Hide
        Rodrigo Vega added a comment -

        In my case the final user can specify his/her default query to show once he/she is logged into the system. This issue came up when somebody define the same facet request using one as filter and the other only as a pure count request.

        I did not find a clean way to fix it looking through the code, but i'm trying the current "ugly" solution where "counter" is the request index.

        public class CustomCountFacetRequest extends CountFacetRequest {
        
        	private int hashCode;
        
        	public CustomCountFacetRequest(CategoryPath path, int num) {
        		this(path, num, 0);
        	}
        
        	public CustomCountFacetRequest(CategoryPath path, int num, int counter) {
        		super(path, num);
        		hashCode = super.hashCode() * counter;
        	}
        
        	@Override
        	public int hashCode() {
        		return hashCode;
        	}
        
        	@Override
        	public boolean equals(Object o) {
        		if (o instanceof CustomCountFacetRequest) {
        			CustomCountFacetRequest that = (CustomCountFacetRequest) o;
        			return that.hashCode == this.hashCode && super.equals(o);
        		}
        		return false;
        	}
        }
        

        I didn't find collateral effects on this solution yet, however i'm worried with your comments about breaking other parts of the code.

        I'm not sure if throwing an exception is the best solution, but at least the response will be consistent.

        Show
        Rodrigo Vega added a comment - In my case the final user can specify his/her default query to show once he/she is logged into the system. This issue came up when somebody define the same facet request using one as filter and the other only as a pure count request. I did not find a clean way to fix it looking through the code, but i'm trying the current "ugly" solution where "counter" is the request index. public class CustomCountFacetRequest extends CountFacetRequest { private int hashCode; public CustomCountFacetRequest(CategoryPath path, int num) { this (path, num, 0); } public CustomCountFacetRequest(CategoryPath path, int num, int counter) { super (path, num); hashCode = super .hashCode() * counter; } @Override public int hashCode() { return hashCode; } @Override public boolean equals( Object o) { if (o instanceof CustomCountFacetRequest) { CustomCountFacetRequest that = (CustomCountFacetRequest) o; return that.hashCode == this .hashCode && super .equals(o); } return false ; } } I didn't find collateral effects on this solution yet, however i'm worried with your comments about breaking other parts of the code. I'm not sure if throwing an exception is the best solution, but at least the response will be consistent.
        Hide
        Gilad Barkai added a comment - - edited

        The same category can be set as a filter and as a request without them colliding - a filter is not correlated or dependent on a facet request.
        facets filters are done at the query level which affects the result set, while the facetRequest defines which categories to retrieve out of the result set.
        I probably miss something here

        Show
        Gilad Barkai added a comment - - edited The same category can be set as a filter and as a request without them colliding - a filter is not correlated or dependent on a facet request. facets filters are done at the query level which affects the result set, while the facetRequest defines which categories to retrieve out of the result set. I probably miss something here
        Hide
        Rodrigo Vega added a comment -

        Yes, I know that but i need a generic way to handle the facet requests during search preparation stage. For that reason I have a FacetParams object with the following properties: int limit, String[] path, boolean filter

        Then I can have some peace of code like this:

        // Query is a bean that wraps a lucene query just to make easy the integration with our UI. REal lucene query is // handle by "q"
        if (query.isFaceted())
        	for (FacetParams facet : query.getFacet()) {
        
        		CategoryPath c = new CategoryPath(facet.getPath());
        		facetSearchParams.addFacetRequest(new CountFacetRequest(c, facet.getLimit()));
        
        		if (facet.isFilter())
        			q = DrillDown.query(q, c);
        
        	}
        ... preapre collectors and finally search
        

        This is because the user can create its own query using some kind of wizard. So I have two options:

        • Support this feature
        • Avoid configurations with this kind of cases.

        It is not a really big problem it is mostly about how to keep my code as neat as I can . I can't ask you to solve my code issues

        Show
        Rodrigo Vega added a comment - Yes, I know that but i need a generic way to handle the facet requests during search preparation stage. For that reason I have a FacetParams object with the following properties: int limit, String[] path, boolean filter Then I can have some peace of code like this: // Query is a bean that wraps a lucene query just to make easy the integration with our UI. REal lucene query is // handle by "q" if (query.isFaceted()) for (FacetParams facet : query.getFacet()) { CategoryPath c = new CategoryPath(facet.getPath()); facetSearchParams.addFacetRequest( new CountFacetRequest(c, facet.getLimit())); if (facet.isFilter()) q = DrillDown.query(q, c); } ... preapre collectors and finally search This is because the user can create its own query using some kind of wizard. So I have two options: Support this feature Avoid configurations with this kind of cases. It is not a really big problem it is mostly about how to keep my code as neat as I can . I can't ask you to solve my code issues
        Hide
        Gilad Barkai added a comment -

        Well solve you code issues no But the current code is indeed broken by the issue you raised - and that should be fixed.
        I re examined the code, and I think the different hashcode you presented will work - though please note it will consume some extra CPU, as the same request will be handled twice (that's the heap to figure out the top-k of the request) to create separate FacetResults for each request.

        Show
        Gilad Barkai added a comment - Well solve you code issues no But the current code is indeed broken by the issue you raised - and that should be fixed. I re examined the code, and I think the different hashcode you presented will work - though please note it will consume some extra CPU, as the same request will be handled twice (that's the heap to figure out the top-k of the request) to create separate FacetResults for each request.
        Hide
        Rodrigo Vega added a comment -

        I though there is any kind of cache to avoid doing same work twice or at least some part of it. However I think that is a different issue/feature/discussion. Thanks for your time.

        Show
        Rodrigo Vega added a comment - I though there is any kind of cache to avoid doing same work twice or at least some part of it. However I think that is a different issue/feature/discussion. Thanks for your time.
        Hide
        Gilad Barkai added a comment -

        Proposed fix - In StandardFacetsAccumulator, guard against handling and merging the same request more than once.
        Also a matching test is introduced, inspired by previous patch.

        Thanks Rodrigo!

        Show
        Gilad Barkai added a comment - Proposed fix - In StandardFacetsAccumulator , guard against handling and merging the same request more than once. Also a matching test is introduced, inspired by previous patch. Thanks Rodrigo!
        Hide
        Shai Erera added a comment -

        Thanks for the fix Gilad. I did the following:

        • Fixed a couple of typos.
        • Updated the patch following the recent changes to FacetSearchParams (no addRequest anymore).
        • Made the test assert that the results.toString() match, rather than just numValidDecendants.
        • Added a CHANGES entry.

        Committed to trunk and 4x.

        Show
        Shai Erera added a comment - Thanks for the fix Gilad. I did the following: Fixed a couple of typos. Updated the patch following the recent changes to FacetSearchParams (no addRequest anymore). Made the test assert that the results.toString() match, rather than just numValidDecendants. Added a CHANGES entry. Committed to trunk and 4x.
        Hide
        Commit Tag Bot added a comment -

        [branch_4x commit] Shai Erera
        http://svn.apache.org/viewvc?view=revision&revision=1422671

        LUCENE-4461: adding same FacetRequest more than once yielded inconsistent results

        Show
        Commit Tag Bot added a comment - [branch_4x commit] Shai Erera http://svn.apache.org/viewvc?view=revision&revision=1422671 LUCENE-4461 : adding same FacetRequest more than once yielded inconsistent results
        Hide
        Commit Tag Bot added a comment -

        [trunk commit] Shai Erera
        http://svn.apache.org/viewvc?view=revision&revision=1422670

        LUCENE-4461: adding same FacetRequest more than once yielded inconsistent results

        Show
        Commit Tag Bot added a comment - [trunk commit] Shai Erera http://svn.apache.org/viewvc?view=revision&revision=1422670 LUCENE-4461 : adding same FacetRequest more than once yielded inconsistent results

          People

          • Assignee:
            Shai Erera
            Reporter:
            Rodrigo Vega
          • Votes:
            1 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development