Lucene - Core
  1. Lucene - Core
  2. LUCENE-4893

Facet counts in FacetsAccumulator.facetArrays are multiplied as many times as FacetsCollector.getFacetResults is called.

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 4.2
    • Fix Version/s: 4.3, 5.0
    • Component/s: modules/facet
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      In lucene 4.1, only StandardFacetsAccumulator could be instantiated.
      And as of lucene 4.2, it became possible to instantiate FacetsAccumulator.

      I invoked FacetsCollector.getFacetResults twice, and I saw doubled facet counts.
      If I invoke it three times, I see facet counts multiplied three times.
      It all happens in FacetsAccumulator.accumulate.

      StandardFacetsAccumulator doesn't have this bug since it frees facetArrays whenever StandardFacetsAccumulator.accumulate is called.

      Is it a feature or a bug?

      1. LUCENE-4893.patch
        6 kB
        Shai Erera
      2. LUCENE-4893.patch
        7 kB
        Shai Erera
      3. LUCENE-4893.patch
        10 kB
        Shai Erera
      4. LUCENE-4893.patch
        7 kB
        Shai Erera

        Activity

        Hide
        Shai Erera added a comment -

        I don't think it's a bug. FacetsCollector is stateless in the sense that it doesn't *remember* if it computed the results already or not. Why do you call it multiple times? Usually the usage is to compute FacetResult once, then use it wherever you need it. We could make FacetsCollector behave e.g. like TopDocsCollector, where after you call topDocs() once, it drains its PQ and the subsequent calls fail (or returns an empty TopDocs). In either case, the Collector doesn't remember the returned result. Therefore I suggest that you modify your code to call it only once.

        Show
        Shai Erera added a comment - I don't think it's a bug. FacetsCollector is stateless in the sense that it doesn't *remember* if it computed the results already or not. Why do you call it multiple times? Usually the usage is to compute FacetResult once, then use it wherever you need it. We could make FacetsCollector behave e.g. like TopDocsCollector, where after you call topDocs() once, it drains its PQ and the subsequent calls fail (or returns an empty TopDocs). In either case, the Collector doesn't remember the returned result. Therefore I suggest that you modify your code to call it only once.
        Hide
        Michael McCandless added a comment -

        I think we should throw an exception if you call it again? It's trappy that it silently returns double-counts I think.

        Show
        Michael McCandless added a comment - I think we should throw an exception if you call it again? It's trappy that it silently returns double-counts I think.
        Hide
        crocket added a comment - - edited

        I assumed FacetsCollector.getFacetResults was a simple getter method, so I didn't optimize my code to call it once.

        After discovering LUCENE-4893, I modified my code to call it once.

        It may be a good idea to throw an exception or return an empty list if one calls it again.
        There will be only so many lucene beginners with no knowledge of its side effects.

        Show
        crocket added a comment - - edited I assumed FacetsCollector.getFacetResults was a simple getter method, so I didn't optimize my code to call it once. After discovering LUCENE-4893 , I modified my code to call it once. It may be a good idea to throw an exception or return an empty list if one calls it again. There will be only so many lucene beginners with no knowledge of its side effects.
        Hide
        Shai Erera added a comment -

        I didn't want to make FacetsCollector remember the FacetResult it returned, but I agree it should be consistent with other collectors. I'll check what TopDocsCollector does - exception or empty result, and fix FC accordingly.

        Show
        Shai Erera added a comment - I didn't want to make FacetsCollector remember the FacetResult it returned, but I agree it should be consistent with other collectors. I'll check what TopDocsCollector does - exception or empty result, and fix FC accordingly.
        Hide
        Shai Erera added a comment -

        Patch addresses the following:

        • FacetsCollector.getFacetResults() clears the matchingDocs list after accumulator.compute returned.
        • FacetsAccumulator creates an empty FacetResult for each FacetRequest if matchingDocs.isEmpty(). I had to do it because FacetArrays still contained the original counts, and so calling .getFacetResults twice yielded the same results as before. But this is inconsistent with how StandardFacetsAccumulator works, so I preferre\ed to return an empty FacetResult in both cases.
        • Added a test case to TestFacetsCollector.

        I think it's ready.

        Show
        Shai Erera added a comment - Patch addresses the following: FacetsCollector.getFacetResults() clears the matchingDocs list after accumulator.compute returned. FacetsAccumulator creates an empty FacetResult for each FacetRequest if matchingDocs.isEmpty(). I had to do it because FacetArrays still contained the original counts, and so calling .getFacetResults twice yielded the same results as before. But this is inconsistent with how StandardFacetsAccumulator works, so I preferre\ed to return an empty FacetResult in both cases. Added a test case to TestFacetsCollector. I think it's ready.
        Hide
        Shai Erera added a comment -

        I chose not to throw an exception because TopDocsCollector returns an empty TopDocs if called twice. Also, it's harder to distinguish a truly empty result (b/c e.g. no results were found) to being called twice.

        Show
        Shai Erera added a comment - I chose not to throw an exception because TopDocsCollector returns an empty TopDocs if called twice. Also, it's harder to distinguish a truly empty result (b/c e.g. no results were found) to being called twice.
        Hide
        crocket added a comment -

        LUCENE-4893.patch has some typos in comments.

        Show
        crocket added a comment - LUCENE-4893 .patch has some typos in comments.
        Hide
        Shai Erera added a comment -

        Thanks crocket. I found a typo in the test's comment, so if you meant another one, please specify which file has the typo. I also improved FacetsCollector.getFacetResults documentation.

        Show
        Shai Erera added a comment - Thanks crocket. I found a typo in the test's comment, so if you meant another one, please specify which file has the typo. I also improved FacetsCollector.getFacetResults documentation.
        Hide
        Michael McCandless added a comment -

        I chose not to throw an exception because TopDocsCollector returns an empty TopDocs if called twice.

        Actually I think this is bad for TopDocsCollector to do: it's trappy. I think users don't hit this because typically it's IndexSearcher.search that calls this method and returns the TopDocs.

        I'd rather fix both of these classes to throw an exception if you call their "getter" methods more than once, than silently pretending the 2nd time there were no results?

        Show
        Michael McCandless added a comment - I chose not to throw an exception because TopDocsCollector returns an empty TopDocs if called twice. Actually I think this is bad for TopDocsCollector to do: it's trappy. I think users don't hit this because typically it's IndexSearcher.search that calls this method and returns the TopDocs. I'd rather fix both of these classes to throw an exception if you call their "getter" methods more than once, than silently pretending the 2nd time there were no results?
        Hide
        Shai Erera added a comment -

        That would mean FacetsCollector will need to track whether getFacetResults was called or not, and distinguish that from "no results were found". I guess it can be done by having matchingDocs set to null by getFacetResults(), and initialized in setNextReader, so getFacetResults can check if matchingDocs is null, and throw IllegalStateException indicating no search has been performed yet (since or not the last call to getFacetResults). TopDocsCollector can be fixed like that too, but I prefer in a separate issue.

        Show
        Shai Erera added a comment - That would mean FacetsCollector will need to track whether getFacetResults was called or not, and distinguish that from "no results were found". I guess it can be done by having matchingDocs set to null by getFacetResults(), and initialized in setNextReader, so getFacetResults can check if matchingDocs is null, and throw IllegalStateException indicating no search has been performed yet (since or not the last call to getFacetResults). TopDocsCollector can be fixed like that too, but I prefer in a separate issue.
        Hide
        Shai Erera added a comment -

        I still didn't fix jdocs, this patch throws IllegalStateException if getFacetResults is called more than once, or no search was executed. But this gets TestDrillSideways.testBasic to fail, because DrillSideways (line 168) assumes it can call getFacetResult() even if the scorer it got was null.

        I wonder what's the best course of action - track in FacetsCollector only the case where getFacetResult was called more than once, or simply caching the List<FacetResult> and return it in .get() if it isn't null. An exception now seems too obtrusive to me ...

        Show
        Shai Erera added a comment - I still didn't fix jdocs, this patch throws IllegalStateException if getFacetResults is called more than once, or no search was executed. But this gets TestDrillSideways.testBasic to fail, because DrillSideways (line 168) assumes it can call getFacetResult() even if the scorer it got was null. I wonder what's the best course of action - track in FacetsCollector only the case where getFacetResult was called more than once, or simply caching the List<FacetResult> and return it in .get() if it isn't null. An exception now seems too obtrusive to me ...
        Hide
        Michael McCandless added a comment -

        I think caching the result (so .getXXX acts like a normal getter) is good?

        Show
        Michael McCandless added a comment - I think caching the result (so .getXXX acts like a normal getter) is good?
        Hide
        Shai Erera added a comment -

        Patch makes FacetsCollector cache the facet results, so .get is now a getter. reset() clears the cached results. Added additional test for reset().

        Show
        Shai Erera added a comment - Patch makes FacetsCollector cache the facet results, so .get is now a getter. reset() clears the cached results. Added additional test for reset().
        Hide
        Michael McCandless added a comment -

        +1, thanks Shai!

        Show
        Michael McCandless added a comment - +1, thanks Shai!
        Hide
        Shai Erera added a comment -

        Committed to trunk and 4x. I added defensive code to prevent an app tripping itself, if it called getFacetResults before doing search, without calling reset. setNextReader now clears the cached results.

        Thanks crocket for reporting this!

        Show
        Shai Erera added a comment - Committed to trunk and 4x. I added defensive code to prevent an app tripping itself, if it called getFacetResults before doing search, without calling reset. setNextReader now clears the cached results. Thanks crocket for reporting this!
        Hide
        Uwe Schindler added a comment -

        Closed after release.

        Show
        Uwe Schindler added a comment - Closed after release.

          People

          • Assignee:
            Shai Erera
            Reporter:
            crocket
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development