Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 4.3
    • Fix Version/s: 4.4, 6.0
    • Component/s: modules/facet
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      When sampling FacetResults, the TopKInEachNodeHandler is used to get the FacetResults.

      This is my case:
      A FacetResult is returned (which matches a FacetRequest) from the StandardFacetAccumulator. The facet has 0 results. The labelling of the root-node seems incorrect. I know, from the StandardFacetAccumulator, that the rootnode has a label, so I can use that one.

      Currently the recursivelyLabel method uses the taxonomyReader.getPath() to retrieve the label. I think we can skip that for the rootNode when there are no children (and gain a little performance on the way too?)

      1. LUCENE-5016.patch
        11 kB
        Shai Erera
      2. test-labels.zip
        4 kB
        Rob Audenaerde

        Activity

        Hide
        Shai Erera added a comment -

        Can you attach a simple testcase exposing the problem? Not sure that I follow what's wrong. About not labeling, I doubt it will gain us anything. Labeling is not very expensive, and labels are LRU-cached. Also, considering all the work that's done during search processing, the labeling part is less than marginal.

        Show
        Shai Erera added a comment - Can you attach a simple testcase exposing the problem? Not sure that I follow what's wrong. About not labeling, I doubt it will gain us anything. Labeling is not very expensive, and labels are LRU-cached. Also, considering all the work that's done during search processing, the labeling part is less than marginal.
        Hide
        Rob Audenaerde added a comment -

        Now that I wrote the tests, I realise that maybe the behaviour of the StandardFacetAccumulator is incorrect, as it labels a FacetResult of a Facet that does not exist in the taxonomy...

        The behaviour of the SamplingAccumulator and the Standard differ.

        For my use case, it is very helpful if all the FacetRequests return a FacetResult with the same label as the request, but I can imagine that this is not desired.

        Show
        Rob Audenaerde added a comment - Now that I wrote the tests, I realise that maybe the behaviour of the StandardFacetAccumulator is incorrect, as it labels a FacetResult of a Facet that does not exist in the taxonomy... The behaviour of the SamplingAccumulator and the Standard differ. For my use case, it is very helpful if all the FacetRequests return a FacetResult with the same label as the request, but I can imagine that this is not desired.
        Hide
        Shai Erera added a comment -

        I am not near the code and actually read the test in Notepad . It looks like you're indexing 100K docs with categories A/docnum and then ask to count the categories "A" and "B". If I understand correctly, the assert in the end fails?

        Basically, the FacetRestult that you get back should have the same label as the request. If it's not like that (and I will validate that when I'm near the code), then it's probably a bug in SamplingAccumulator.

        BTW the test actually indexed 200K docs while passing 'j' which is 0 for the first 100K and 1 for the second. But 'j' seems to be unused in addDocument. This shouldn't affect the test behavior but just FYI.

        Thanks for reporting this, I'll take a deeper look later.

        Show
        Shai Erera added a comment - I am not near the code and actually read the test in Notepad . It looks like you're indexing 100K docs with categories A/docnum and then ask to count the categories "A" and "B". If I understand correctly, the assert in the end fails? Basically, the FacetRestult that you get back should have the same label as the request. If it's not like that (and I will validate that when I'm near the code), then it's probably a bug in SamplingAccumulator. BTW the test actually indexed 200K docs while passing 'j' which is 0 for the first 100K and 1 for the second. But 'j' seems to be unused in addDocument. This shouldn't affect the test behavior but just FYI. Thanks for reporting this, I'll take a deeper look later.
        Hide
        Shai Erera added a comment -

        I checked that and indeed there is inconsistency here. StandardFacetsAccumulator and FacetsAccumulator return an empty result with the root node labeled, while the sampling accumulators return the root node not labeled. There isn't anything technically wrong here, because the category does not exist, but I think we should be consistent.

        I was able to reproduce this behavior with an even simpler test Rob: index a single document with category "A" and ask to count category "B". The problem is as follows:

        • SamplingAccumulator delegates to SFA.
        • SFA detects this category does not exist and creates an empty FacetResult, which sets the label of the root node to the request's CategoryPath.
        • SamplingAccumulator receives the results, and potentially runs SampleFixer. Then it labels the result, which then sets the label to null, after not finding it in the taxonomy.

        Perhaps at some point of the code lifecycle this additional labeling was needed, I'm not sure . But I think we should either remove the call to label the results in SamplingAccumulator, or at least not call taxoReader.getPath if the node.label is not null. For instance, if you ask to count "A" (which does exist), then labeling happens twice, once by SFA.accumulate and second time by SamplingAccumulator, which is just a waste.

        I'll attach later a short test which reproduces this and checks all existing accumulators.

        Show
        Shai Erera added a comment - I checked that and indeed there is inconsistency here. StandardFacetsAccumulator and FacetsAccumulator return an empty result with the root node labeled, while the sampling accumulators return the root node not labeled. There isn't anything technically wrong here, because the category does not exist, but I think we should be consistent. I was able to reproduce this behavior with an even simpler test Rob: index a single document with category "A" and ask to count category "B". The problem is as follows: SamplingAccumulator delegates to SFA. SFA detects this category does not exist and creates an empty FacetResult, which sets the label of the root node to the request's CategoryPath. SamplingAccumulator receives the results, and potentially runs SampleFixer. Then it labels the result, which then sets the label to null, after not finding it in the taxonomy. Perhaps at some point of the code lifecycle this additional labeling was needed, I'm not sure . But I think we should either remove the call to label the results in SamplingAccumulator, or at least not call taxoReader.getPath if the node.label is not null. For instance, if you ask to count "A" (which does exist), then labeling happens twice, once by SFA.accumulate and second time by SamplingAccumulator, which is just a waste. I'll attach later a short test which reproduces this and checks all existing accumulators.
        Hide
        Shai Erera added a comment -

        Patch fixes the following:

        • OverSampledFacetRequest sets numLabel=0, as otherwise it will label categories that will be thrown away (over-sample). This was introduced incorrectly in LUCENE-4411.
        • SamplingAccumulator and SamplingWrapper now return an emptyResult() when the ordinal of the root node is INVALID, which makes them consistent with the other accumulators.
        • Added TestFacetsCollector.testLabeling which ensures all current accumulators behave the same.
          • This uncovered a bug in RangeAccumulator when some readers did not have the requested numeric DV field. I fixed that too.

        I think it's ready - I intend to commit it tomorrow.

        Show
        Shai Erera added a comment - Patch fixes the following: OverSampledFacetRequest sets numLabel=0, as otherwise it will label categories that will be thrown away (over-sample). This was introduced incorrectly in LUCENE-4411 . SamplingAccumulator and SamplingWrapper now return an emptyResult() when the ordinal of the root node is INVALID, which makes them consistent with the other accumulators. Added TestFacetsCollector.testLabeling which ensures all current accumulators behave the same. This uncovered a bug in RangeAccumulator when some readers did not have the requested numeric DV field. I fixed that too. I think it's ready - I intend to commit it tomorrow.
        Hide
        Gilad Barkai added a comment -

        Patch looks good.
        +1 for commit

        Show
        Gilad Barkai added a comment - Patch looks good. +1 for commit
        Hide
        Shai Erera added a comment -

        Committed to trunk and 4x. Thanks Rob for reporting this!

        Show
        Shai Erera added a comment - Committed to trunk and 4x. Thanks Rob for reporting this!
        Hide
        Steve Rowe added a comment -

        Bulk close resolved 4.4 issues

        Show
        Steve Rowe added a comment - Bulk close resolved 4.4 issues

          People

          • Assignee:
            Shai Erera
            Reporter:
            Rob Audenaerde
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development