Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.2, 6.0
    • Component/s: modules/facet
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      The residue is the combined count of all categories that did not make it into the top K. But this is a senseless statistic. Take the following case: two documents with categories [A/1, A/2, A/3] and [A/1, A/4, A/5]. If you ask for the top-1 category of "A", you'll get A (count=2) and A/1 (count=2), but A's residue will be 4, twice the number of documents that matched!
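
      To make the case above concrete, here is a minimal sketch in plain Java. It is not Lucene code: the class ResidueExample and its methods are made up for illustration, and it assumes the residue is the sum of the counts of the children outside the top K.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Lucene.
public class ResidueExample {

    /** Counts, per child category, how many documents carry it. */
    static Map<String, Integer> countChildren(List<List<String>> docs) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> doc : docs) {
            for (String category : doc) {
                counts.merge(category, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Assumed definition: sum of the counts of all children outside the top K. */
    static int residue(Map<String, Integer> counts, int k) {
        List<Integer> sorted = new ArrayList<>(counts.values());
        sorted.sort(Collections.reverseOrder());
        int sum = 0;
        for (int i = k; i < sorted.size(); i++) {
            sum += sorted.get(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("A/1", "A/2", "A/3"),
                Arrays.asList("A/1", "A/4", "A/5"));
        Map<String, Integer> counts = countChildren(docs);

        // Every document has at least one category under A, so A's count is the document count.
        int rootCount = docs.size();
        System.out.println("A count   = " + rootCount);          // prints "A count   = 2"
        System.out.println("A/1 count = " + counts.get("A/1"));  // prints "A/1 count = 2"
        System.out.println("residue   = " + residue(counts, 1)); // prints "residue   = 4"
    }
}
```

      The residue (4) exceeds the root count (2) because each document contributes several categories to the same dimension, so the number says nothing about how many documents fall outside the top K.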

      As a user, that number doesn't tell you anything, except maybe when you index only one category per document for a given dimension. But in that case, the residue is root.value - sum(topK.value), which the application can compute if it needs to.
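
      For the one-category-per-document case, an application that still wants the number could derive it itself. The sketch below assumes a simplified result node with a value and a list of top-K children; the Node class and the residue method are hypothetical stand-ins, not the actual FacetResultNode API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; Node is a stand-in, not Lucene's FacetResultNode.
public class AppResidue {

    static final class Node {
        final String label;
        final double value;     // count (or other aggregated value) of this category
        final List<Node> topK;  // only the top-K children that were returned

        Node(String label, double value, List<Node> topK) {
            this.label = label;
            this.value = value;
            this.topK = topK;
        }
    }

    /** root.value - sum(topK.value); meaningful only with one category per document per dimension. */
    static double residue(Node root) {
        double sum = 0;
        for (Node child : root.topK) {
            sum += child.value;
        }
        return root.value - sum;
    }

    public static void main(String[] args) {
        // Five documents, each with exactly one category under A: A/1 x3, A/2 x1, A/3 x1.
        // Only the top-1 child (A/1) was requested.
        Node a1 = new Node("A/1", 3, Collections.<Node>emptyList());
        Node a = new Node("A", 5, Arrays.asList(a1));
        System.out.println("residue = " + residue(a)); // prints "residue = 2.0"
    }
}
```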

      In short, we're just wasting CPU cycles for that statistic, so I'm going to remove it.

        Activity

        Shai Erera added a comment -

        BTW, some supporting evidence that we should nuke it is the following benchmark results (thanks Mike!). Base is trunk; comp is trunk + no residue computation:

                            Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                         Respell      111.64      (3.2%)      110.49      (3.2%)   -1.0% (  -7% -    5%)
                      OrHighHigh        4.33      (2.8%)        4.30      (3.0%)   -0.7% (  -6% -    5%)
                    HighSpanNear        2.98      (2.3%)        2.97      (2.0%)   -0.4% (  -4% -    3%)
                HighSloppyPhrase        0.89      (8.9%)        0.89      (8.2%)   -0.3% ( -15% -   18%)
                        HighTerm        7.95      (2.3%)        7.93      (2.4%)   -0.2% (  -4% -    4%)
                       OrHighLow        7.57      (2.2%)        7.55      (2.3%)   -0.2% (  -4% -    4%)
                       OrHighMed        7.51      (2.7%)        7.51      (2.8%)    0.1% (  -5% -    5%)
                        Wildcard       74.46      (3.6%)       74.54      (2.0%)    0.1% (  -5% -    5%)
                        PKLookup      247.56      (2.1%)      247.85      (2.8%)    0.1% (  -4% -    5%)
                     LowSpanNear        7.54      (4.6%)        7.59      (3.6%)    0.7% (  -7% -    9%)
                     AndHighHigh       12.56      (0.9%)       12.68      (1.0%)    0.9% (  -1% -    2%)
                     MedSpanNear       19.88      (1.5%)       20.08      (2.2%)    1.0% (  -2% -    4%)
                 MedSloppyPhrase       18.45      (2.1%)       18.64      (2.1%)    1.0% (  -3% -    5%)
                 LowSloppyPhrase       17.52      (3.7%)       17.71      (3.8%)    1.1% (  -6% -    8%)
                         Prefix3       45.70      (5.6%)       46.25      (2.7%)    1.2% (  -6% -   10%)
                       LowPhrase       16.86      (3.4%)       17.07      (3.1%)    1.2% (  -5% -    8%)
                         MedTerm       23.00      (1.4%)       23.33      (1.8%)    1.4% (  -1% -    4%)
                          IntNRQ       17.97      (7.8%)       18.26      (4.7%)    1.6% ( -10% -   15%)
                      HighPhrase       15.71      (7.0%)       15.98      (5.2%)    1.7% (  -9% -   15%)
                          Fuzzy1       33.30      (1.8%)       33.90      (1.3%)    1.8% (  -1% -    5%)
                          Fuzzy2       41.46      (2.2%)       42.26      (2.0%)    1.9% (  -2% -    6%)
                         LowTerm       40.47      (1.1%)       41.45      (1.7%)    2.4% (   0% -    5%)
                      AndHighMed       49.38      (0.9%)       51.08      (1.3%)    3.4% (   1% -    5%)
                       MedPhrase       55.65      (2.7%)       57.79      (2.5%)    3.8% (  -1% -    9%)
                      AndHighLow       98.02      (1.5%)      104.36      (2.9%)    6.5% (   2% -   10%)
        
        Shai Erera added a comment -

        Patch removes residue. I'll commit it shortly.

        Commit Tag Bot added a comment -

        [trunk commit] Shai Erera
        http://svn.apache.org/viewvc?view=revision&revision=1437345

        LUCENE-4709: remove FacetResultNode.residue

        Shai Erera added a comment -

        Committed to trunk and 4x.

        Commit Tag Bot added a comment -

        [branch_4x commit] Shai Erera
        http://svn.apache.org/viewvc?view=revision&revision=1437350

        LUCENE-4709: remove FacetResultNode.residue

        Uwe Schindler added a comment -

        Closed after release.


          People

          • Assignee:
            Shai Erera
            Reporter:
            Shai Erera
          • Votes:
            0
            Watchers:
            0