Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.2, 6.0
    • Component/s: modules/facet
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      The residue is the sum of the counts of all categories that did not make it into the top K. However, this statistic is meaningless once documents carry more than one category per dimension. Take, for example, two documents with categories [A/1, A/2, A/3] and [A/1, A/4, A/5]. If you ask for the top-1 category of "A", you get A (count=2) and A/1 (count=2), but A's residue is 4, which is twice the number of matching documents!

      As a user, that number doesn't tell you anything, except maybe when you index only one category per document for a given dimension. But in that case, the residue is root.value - sum(topK.value), which the application can compute if it needs to.

      In short, we're just wasting CPU cycles for that statistic, so I'm going to remove it.
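
      The example above can be reproduced with a small standalone sketch. It does not use the Lucene facet API; the class and method names (ResidueSketch, countCategories, residue, rootMinusTopK) are hypothetical and exist only for illustration. It assumes a flat, one-level dimension and counts the root once per document that has any category under it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the Lucene facet API.
public class ResidueSketch {

    // Counts documents per category, plus one count for the root per matching document.
    static Map<String, Integer> countCategories(List<List<String>> docs, String root) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> doc : docs) {
            boolean underRoot = false;
            for (String cat : doc) {
                counts.merge(cat, 1, Integer::sum);
                if (cat.startsWith(root + "/")) underRoot = true;
            }
            if (underRoot) counts.merge(root, 1, Integer::sum);
        }
        return counts;
    }

    // Child counts of the root, sorted from highest to lowest.
    static List<Integer> sortedChildCounts(Map<String, Integer> counts, String root) {
        List<Integer> child = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getKey().startsWith(root + "/")) child.add(e.getValue());
        }
        child.sort(Collections.reverseOrder());
        return child;
    }

    // Residue as described in this issue: sum of counts of children outside the top K.
    static int residue(Map<String, Integer> counts, String root, int k) {
        List<Integer> child = sortedChildCounts(counts, root);
        int sum = 0;
        for (int i = k; i < child.size(); i++) sum += child.get(i);
        return sum;
    }

    // What an application can compute itself: root.value - sum(topK.value).
    static int rootMinusTopK(Map<String, Integer> counts, String root, int k) {
        List<Integer> child = sortedChildCounts(counts, root);
        int top = 0;
        for (int i = 0; i < Math.min(k, child.size()); i++) top += child.get(i);
        return counts.getOrDefault(root, 0) - top;
    }

    public static void main(String[] args) {
        // Multi-valued case from the description: residue exceeds the document count.
        List<List<String>> multi = Arrays.asList(
            Arrays.asList("A/1", "A/2", "A/3"),
            Arrays.asList("A/1", "A/4", "A/5"));
        Map<String, Integer> m = countCategories(multi, "A");
        System.out.println(residue(m, "A", 1));        // prints 4
        System.out.println(rootMinusTopK(m, "A", 1));  // prints 0

        // One category per document: both computations agree.
        List<List<String>> single = Arrays.asList(
            Arrays.asList("A/1"), Arrays.asList("A/1"),
            Arrays.asList("A/2"), Arrays.asList("A/3"));
        Map<String, Integer> s = countCategories(single, "A");
        System.out.println(residue(s, "A", 1));        // prints 2
        System.out.println(rootMinusTopK(s, "A", 1));  // prints 2
    }
}
```

      In the multi-valued case the two numbers diverge (4 versus 0), and neither says how many documents fall outside the top K. In the single-valued case they coincide, so the application can derive the value from the root and top-K counts alone.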

        Activity

        thetaphi Uwe Schindler added a comment -

        Closed after release.

        commit-tag-bot Commit Tag Bot added a comment -

        [branch_4x commit] Shai Erera
        http://svn.apache.org/viewvc?view=revision&revision=1437350

        LUCENE-4709: remove FacetResultNode.residue

        shaie Shai Erera added a comment -

        Committed to trunk and 4x

        commit-tag-bot Commit Tag Bot added a comment -

        [trunk commit] Shai Erera
        http://svn.apache.org/viewvc?view=revision&revision=1437345

        LUCENE-4709: remove FacetResultNode.residue

        shaie Shai Erera added a comment -

        Patch removes residue. I'll commit it shortly.

        shaie Shai Erera added a comment -

        BTW, some supporting evidence that we should nuke it comes from the following benchmark results (thanks Mike!). Base is trunk; comp is trunk without the residue computation:

                            Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                         Respell      111.64      (3.2%)      110.49      (3.2%)   -1.0% (  -7% -    5%)
                      OrHighHigh        4.33      (2.8%)        4.30      (3.0%)   -0.7% (  -6% -    5%)
                    HighSpanNear        2.98      (2.3%)        2.97      (2.0%)   -0.4% (  -4% -    3%)
                HighSloppyPhrase        0.89      (8.9%)        0.89      (8.2%)   -0.3% ( -15% -   18%)
                        HighTerm        7.95      (2.3%)        7.93      (2.4%)   -0.2% (  -4% -    4%)
                       OrHighLow        7.57      (2.2%)        7.55      (2.3%)   -0.2% (  -4% -    4%)
                       OrHighMed        7.51      (2.7%)        7.51      (2.8%)    0.1% (  -5% -    5%)
                        Wildcard       74.46      (3.6%)       74.54      (2.0%)    0.1% (  -5% -    5%)
                        PKLookup      247.56      (2.1%)      247.85      (2.8%)    0.1% (  -4% -    5%)
                     LowSpanNear        7.54      (4.6%)        7.59      (3.6%)    0.7% (  -7% -    9%)
                     AndHighHigh       12.56      (0.9%)       12.68      (1.0%)    0.9% (  -1% -    2%)
                     MedSpanNear       19.88      (1.5%)       20.08      (2.2%)    1.0% (  -2% -    4%)
                 MedSloppyPhrase       18.45      (2.1%)       18.64      (2.1%)    1.0% (  -3% -    5%)
                 LowSloppyPhrase       17.52      (3.7%)       17.71      (3.8%)    1.1% (  -6% -    8%)
                         Prefix3       45.70      (5.6%)       46.25      (2.7%)    1.2% (  -6% -   10%)
                       LowPhrase       16.86      (3.4%)       17.07      (3.1%)    1.2% (  -5% -    8%)
                         MedTerm       23.00      (1.4%)       23.33      (1.8%)    1.4% (  -1% -    4%)
                          IntNRQ       17.97      (7.8%)       18.26      (4.7%)    1.6% ( -10% -   15%)
                      HighPhrase       15.71      (7.0%)       15.98      (5.2%)    1.7% (  -9% -   15%)
                          Fuzzy1       33.30      (1.8%)       33.90      (1.3%)    1.8% (  -1% -    5%)
                          Fuzzy2       41.46      (2.2%)       42.26      (2.0%)    1.9% (  -2% -    6%)
                         LowTerm       40.47      (1.1%)       41.45      (1.7%)    2.4% (   0% -    5%)
                      AndHighMed       49.38      (0.9%)       51.08      (1.3%)    3.4% (   1% -    5%)
                       MedPhrase       55.65      (2.7%)       57.79      (2.5%)    3.8% (  -1% -    9%)
                      AndHighLow       98.02      (1.5%)      104.36      (2.9%)    6.5% (   2% -   10%)
        

          People

          • Assignee:
            shaie Shai Erera
            Reporter:
            shaie Shai Erera
