Lucene - Core
  1. Lucene - Core
  2. LUCENE-4234

Exception when FacetsCollector is used with ScoreFacetRequest

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 3.6, 4.0-ALPHA
    • Fix Version/s: 4.0-BETA, 3.6.2, 6.0
    • Component/s: modules/facet
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      Aggregating facets with Lucene's Score using FacetsCollector results in an Exception (assertion when enabled).

      1. LUCENE-4234.patch
        8 kB
        Shai Erera
      2. LUCENE-4234.patch
        6 kB
        Gilad Barkai

        Activity

        Hide
        Gilad Barkai added a comment -

        A test revealing the bug.

        Show
        Gilad Barkai added a comment - A test revealing the bug.
        Hide
        Gilad Barkai added a comment -

        Proposed fix, the code was optimized for a capacity of 1000 documents, but BitSet was initizlized with the same value of 1000, causing .fastSet() to fail.

        Initializing the scores array to size of 64, and initializing the bitset to .maxDoc() as in the non-score-keeping version.

        Show
        Gilad Barkai added a comment - Proposed fix, the code was optimized for a capacity of 1000 documents, but BitSet was initizlized with the same value of 1000, causing .fastSet() to fail. Initializing the scores array to size of 64, and initializing the bitset to .maxDoc() as in the non-score-keeping version.
        Hide
        Shai Erera added a comment -

        Good catch !

        I have few comments about the patch:

        1. Can you merge them into one file?
        2. Can we use FixedBitSet? It's an improvement over OpenBitSet, without needing to use fastXYZ
        3. I think that we can initialize the scores array to maxDoc, and drop the resizing in collect – collect is a hot method, we should do minimal work there. If someone want to keep scores for maxDoc documents, we should initialize the array up front.
        Show
        Shai Erera added a comment - Good catch ! I have few comments about the patch: Can you merge them into one file? Can we use FixedBitSet? It's an improvement over OpenBitSet, without needing to use fastXYZ I think that we can initialize the scores array to maxDoc, and drop the resizing in collect – collect is a hot method, we should do minimal work there. If someone want to keep scores for maxDoc documents, we should initialize the array up front.
        Hide
        Gilad Barkai added a comment -

        One patch (to rule them all) instead of the two. No changes were made thus far.

        Show
        Gilad Barkai added a comment - One patch (to rule them all) instead of the two. No changes were made thus far.
        Hide
        Gilad Barkai added a comment -

        Hi Shai, thank you for reviewing.

        1. Merged
        2. Using FBS is possible, though it's a larger change than the fix, perhaps handle this in a seperate issue?
        3. IMHO that's a waste for a large index. While allocating a large array is faster, it takes a lot of memory - probably for no reason. Each query will take longer, but concurrent queries will not hurt the GC that much. If the index is large concurrent queries might hit the GC hard, perhaps even OOM ? I must admit I never measured the reallocation penalty.

        Show
        Gilad Barkai added a comment - Hi Shai, thank you for reviewing. 1. Merged 2. Using FBS is possible, though it's a larger change than the fix, perhaps handle this in a seperate issue? 3. IMHO that's a waste for a large index. While allocating a large array is faster, it takes a lot of memory - probably for no reason. Each query will take longer, but concurrent queries will not hurt the GC that much. If the index is large concurrent queries might hit the GC hard, perhaps even OOM ? I must admit I never measured the reallocation penalty.
        Hide
        Shai Erera added a comment -

        Ok I understand the code better now – the scores array keeps scores only for docs that were visited by Scorer. Therefore it doesn't need to be of size maxDoc, but rather close to how many docs were visited.

        Makes sense I guess for queries that hit a low percentage of the index. I'm not worried about the re-allocation work at all ... I didn't dig deep into the code first, and thought that we're keeping an entry per document in the index, whether it's scored or not.

        The current code is good then.

        I'll review the new patch now.

        Show
        Shai Erera added a comment - Ok I understand the code better now – the scores array keeps scores only for docs that were visited by Scorer. Therefore it doesn't need to be of size maxDoc, but rather close to how many docs were visited. Makes sense I guess for queries that hit a low percentage of the index. I'm not worried about the re-allocation work at all ... I didn't dig deep into the code first, and thought that we're keeping an entry per document in the index, whether it's scored or not. The current code is good then. I'll review the new patch now.
        Hide
        Shai Erera added a comment -

        Updated patch file:

        • Use FixedBitSet - the change was trivial
        • Add a comment to new float[64] explaining why we do it (I know it got me confused once ...)
        • Added CHANGES entry

        All tests pass. I'll commit it by tomorrow.

        Show
        Shai Erera added a comment - Updated patch file: Use FixedBitSet - the change was trivial Add a comment to new float [64] explaining why we do it (I know it got me confused once ...) Added CHANGES entry All tests pass. I'll commit it by tomorrow.
        Hide
        Gilad Barkai added a comment -

        Thank you for looking into this and making the appropriate changes.
        Patch looks good.

        Show
        Gilad Barkai added a comment - Thank you for looking into this and making the appropriate changes. Patch looks good.
        Hide
        Shai Erera added a comment -

        Committed to trunk rev 1364576. Porting now to 4x an 3.6.2.

        Show
        Shai Erera added a comment - Committed to trunk rev 1364576. Porting now to 4x an 3.6.2.
        Hide
        Shai Erera added a comment -

        Committed r1364582 (4x) and 1364586 (3.6.2).

        Thanks Gilad !

        Show
        Shai Erera added a comment - Committed r1364582 (4x) and 1364586 (3.6.2). Thanks Gilad !
        Hide
        Uwe Schindler added a comment -

        Closed after release.

        Show
        Uwe Schindler added a comment - Closed after release.

          People

          • Assignee:
            Shai Erera
            Reporter:
            Gilad Barkai
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development