Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.10.4, 5.0, 6.0
    • Component/s: core/FSTs
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      The purpose of the root arc cache is to speed up lookups for ASCII terms, but it adds high overhead if the FST is already tiny.

      1. LUCENE-6105.patch
        11 kB
        Michael McCandless

        Activity

        Hide
        Michael McCandless added a comment -

        Patch.

        I fixed FST to only cache if the added RAM required is < 20% of the size of the FST, and if the number of root arcs is >= FIXED_ARRAY_NUM_ARCS_SHALLOW (5), like array'd arcs.

        I also simplified how we assert for "caller illegally changed cached root arc", and added a test for that.

        Show
        Michael McCandless added a comment - Patch. I fixed FST to only cache if the added RAM required is < 20% of the size of the FST, and if the number of root arcs is >= FIXED_ARRAY_NUM_ARCS_SHALLOW (5), like array'd arcs. I also simplified how we assert for "caller illegally changed cached root arc", and added a test for that.
        Hide
        Robert Muir added a comment -

        This change looks great!

        I am just worried about 4.10.3, it seems like bad timing to make a last minute FST change. Can we be conservative here? I would prefer we make this change normally and if its appropriate, finds its way to 4.10.4. Bug fix releases are supposed to be easy.

        Show
        Robert Muir added a comment - This change looks great! I am just worried about 4.10.3, it seems like bad timing to make a last minute FST change. Can we be conservative here? I would prefer we make this change normally and if its appropriate, finds its way to 4.10.4. Bug fix releases are supposed to be easy.
        Hide
        Michael McCandless added a comment -

        Thanks Rob.

        I'll remove the 4.10.x fix version for now ... I agree there is some risk. We can let Jenkins chew on it in trunk/5.x, and see if there are any performance issues in the nightly performance benchmarks (http://people.apache.org/~mikemccand/lucenebench/).

        I do think it's a bug though: an FST that would otherwise take ~200 bytes can take ~4 KB instead, because of this cache.

        Show
        Michael McCandless added a comment - Thanks Rob. I'll remove the 4.10.x fix version for now ... I agree there is some risk. We can let Jenkins chew on it in trunk/5.x, and see if there are any performance issues in the nightly performance benchmarks ( http://people.apache.org/~mikemccand/lucenebench/ ). I do think it's a bug though: an FST that would otherwise take ~200 bytes can take ~4 KB instead, because of this cache.
        Hide
        Robert Muir added a comment -

        I don't disagree with the bug aspect, and i am sure the patch is great.

        We use these FSTs in lots of code so the possibility of a performance bug (separate from correctness issue) exists. We should at least wait for a few iterations of jenkins, ideally some nightly benchmark reports, given how widespread FST usage is, before getting it out to users. This stuff happens automatically with time, but just needs some time.

        Show
        Robert Muir added a comment - I don't disagree with the bug aspect, and i am sure the patch is great. We use these FSTs in lots of code so the possibility of a performance bug (separate from correctness issue) exists. We should at least wait for a few iterations of jenkins, ideally some nightly benchmark reports, given how widespread FST usage is, before getting it out to users. This stuff happens automatically with time, but just needs some time.
        Hide
        ASF subversion and git services added a comment -

        Commit 1644473 from Michael McCandless in branch 'dev/trunk'
        [ https://svn.apache.org/r1644473 ]

        LUCENE-6105: don't cache root arcs for small FSTs

        Show
        ASF subversion and git services added a comment - Commit 1644473 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1644473 ] LUCENE-6105 : don't cache root arcs for small FSTs
        Hide
        ASF subversion and git services added a comment -

        Commit 1644478 from Michael McCandless in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1644478 ]

        LUCENE-6105: don't cache root arcs for small FSTs

        Show
        ASF subversion and git services added a comment - Commit 1644478 from Michael McCandless in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1644478 ] LUCENE-6105 : don't cache root arcs for small FSTs
        Hide
        Anshum Gupta added a comment -

        Bulk close after 5.0 release.

        Show
        Anshum Gupta added a comment - Bulk close after 5.0 release.
        Hide
        Michael McCandless added a comment -

        Reopen for 4.10.4

        Show
        Michael McCandless added a comment - Reopen for 4.10.4
        Hide
        ASF subversion and git services added a comment -

        Commit 1662196 from Michael McCandless in branch 'dev/branches/lucene_solr_4_10'
        [ https://svn.apache.org/r1662196 ]

        LUCENE-6105: don't cache root arcs for small FSTs

        Show
        ASF subversion and git services added a comment - Commit 1662196 from Michael McCandless in branch 'dev/branches/lucene_solr_4_10' [ https://svn.apache.org/r1662196 ] LUCENE-6105 : don't cache root arcs for small FSTs
        Hide
        Michael McCandless added a comment -

        Bulk close for 4.10.4 release

        Show
        Michael McCandless added a comment - Bulk close for 4.10.4 release

          People

          • Assignee:
            Michael McCandless
            Reporter:
            Michael McCandless
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development