Lucene - Core
  1. Lucene - Core
  2. LUCENE-4615

Remove Int/FloatArrayAllocator from facet module?

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.1, 6.0
    • Component/s: modules/facet
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      Spinoff from LUCENE-4600.

      It makes me nervous to have allocation tied to our public APIs ... and the ability for Int/FloatArrayAllocator to hold onto N arrays indefinitely makes me even more nervous. I think we should just trust java/GC to do their job here and free the storage as soon as faceting is done.

        Activity

        Hide
        Shai Erera added a comment -

        This was required when we worked with a team that managed taxonomies with 5M+ nodes. Allocating those arrays (could be 20MB+) over and over for every search was expensive, and I think that it's expensive with today's JVMs too.

        I prefer that we keep these somewhere, and would like to propose the following:

        • Don't make the allocators so visible on the API. I.e. today StandardFacetsAccumulator takes them in the constructor. Perhaps we can move it to a protected method?
          • Then, only extensions would be exposed to it, and whoever extends SFA, or writes his own Accumulator, IMO should be allowed to reuse arrays.
        • By default, don't cache arrays. Today by default we reuse one array, and I think that for the common case, this is not needed.

        So I do like to keep the option for extensions to reuse arrays, just because we've seen that it's needed in some cases. But I am +1 for 'hiding' that option somewhere, so that only experts that need it, will find it .

        Show
        Shai Erera added a comment - This was required when we worked with a team that managed taxonomies with 5M+ nodes. Allocating those arrays (could be 20MB+) over and over for every search was expensive, and I think that it's expensive with today's JVMs too. I prefer that we keep these somewhere, and would like to propose the following: Don't make the allocators so visible on the API. I.e. today StandardFacetsAccumulator takes them in the constructor. Perhaps we can move it to a protected method? Then, only extensions would be exposed to it, and whoever extends SFA, or writes his own Accumulator, IMO should be allowed to reuse arrays. By default, don't cache arrays. Today by default we reuse one array, and I think that for the common case, this is not needed. So I do like to keep the option for extensions to reuse arrays, just because we've seen that it's needed in some cases. But I am +1 for 'hiding' that option somewhere, so that only experts that need it, will find it .
        Hide
        Shai Erera added a comment -

        Took a look in the code, the allocators are used mostly by FacetArrays, so perhaps that can be the abstraction level? I.e. we make FacetArrays expose API similar to acquire/release of int[]/float[]. The default impl would always allocate new, and discard in release. A ReusingFacetArrays would use the current allocators logic?

        Show
        Shai Erera added a comment - Took a look in the code, the allocators are used mostly by FacetArrays, so perhaps that can be the abstraction level? I.e. we make FacetArrays expose API similar to acquire/release of int[]/float[]. The default impl would always allocate new, and discard in release. A ReusingFacetArrays would use the current allocators logic?
        Hide
        Michael McCandless added a comment -

        This was required when we worked with a team that managed taxonomies with 5M+ nodes. Allocating those arrays (could be 20MB+) over and over for every search was expensive, and I think that it's expensive with today's JVMs too.

        OK. Spooky! I wonder if for such cases we should use native int[] hash map ... until the number of unique ords collected is "big enough" to warrant the non-sparse array.

        +1 for moving this abstraction to FacetArrays and making an optional/expert/not-the-default ReusingFacetArrays.

        Show
        Michael McCandless added a comment - This was required when we worked with a team that managed taxonomies with 5M+ nodes. Allocating those arrays (could be 20MB+) over and over for every search was expensive, and I think that it's expensive with today's JVMs too. OK. Spooky! I wonder if for such cases we should use native int[] hash map ... until the number of unique ords collected is "big enough" to warrant the non-sparse array. +1 for moving this abstraction to FacetArrays and making an optional/expert/not-the-default ReusingFacetArrays.
        Hide
        Shai Erera added a comment -

        Patch replaces Int/FloatArrayAllocator by ArraysPool and introduces a new ReusingFacetArrays. StandardFacetsAccumulator takes FacetArrays (or none) instead of the allocators.

        Thanks Gilad for helping to write this !

        All tests pass. I think it's ready to commit

        Show
        Shai Erera added a comment - Patch replaces Int/FloatArrayAllocator by ArraysPool and introduces a new ReusingFacetArrays. StandardFacetsAccumulator takes FacetArrays (or none) instead of the allocators. Thanks Gilad for helping to write this ! All tests pass. I think it's ready to commit
        Hide
        Michael McCandless added a comment -

        +1

        Wow, that was fast Thanks guys!

        Show
        Michael McCandless added a comment - +1 Wow, that was fast Thanks guys!
        Hide
        Commit Tag Bot added a comment -

        [trunk commit] Shai Erera
        http://svn.apache.org/viewvc?view=revision&revision=1420159

        LUCENE-4615: do not reuse facet aggregation arrays by default; added ReusingFacetArrays

        Show
        Commit Tag Bot added a comment - [trunk commit] Shai Erera http://svn.apache.org/viewvc?view=revision&revision=1420159 LUCENE-4615 : do not reuse facet aggregation arrays by default; added ReusingFacetArrays
        Hide
        Commit Tag Bot added a comment -

        [trunk commit] Shai Erera
        http://svn.apache.org/viewvc?view=revision&revision=1420162

        LUCENE-4615: ReusingFacetArrays was left out by mistake

        Show
        Commit Tag Bot added a comment - [trunk commit] Shai Erera http://svn.apache.org/viewvc?view=revision&revision=1420162 LUCENE-4615 : ReusingFacetArrays was left out by mistake
        Hide
        Shai Erera added a comment -

        Thanks for the review Mike. Committed to trunk and 4x.

        Show
        Shai Erera added a comment - Thanks for the review Mike. Committed to trunk and 4x.
        Hide
        Commit Tag Bot added a comment -

        [branch_4x commit] Shai Erera
        http://svn.apache.org/viewvc?view=revision&revision=1420163

        LUCENE-4615: do not reuse facet aggregation arrays by default; added ReusingFacetArrays

        Show
        Commit Tag Bot added a comment - [branch_4x commit] Shai Erera http://svn.apache.org/viewvc?view=revision&revision=1420163 LUCENE-4615 : do not reuse facet aggregation arrays by default; added ReusingFacetArrays

          People

          • Assignee:
            Shai Erera
            Reporter:
            Michael McCandless
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development