Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.3, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Spinoff from LUCENE-6199, pulling out just the FST RAM reduction changes.

      The FST data structure tries to be a RAM efficient representation of a sorted map, but there are a few things I think we can do to trim it even more:

      • Don't store arc and node count: this is available from the Builder if you really want to do something with it.
      • Don't use the "paged" byte store unless the FST is huge; just use a single byte[]
      • Some members like lastFrozenNode, reusedBytesPerArc, allowArrayArcs are only used during building, so we should move them to the Builder
      • We don't need to cache NO_OUTPUT: we can ask the Outputs impl for it
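A minimal Java sketch of two of the ideas listed above: keep build-only state (including node and arc counts) on the builder, and choose a single byte[] unless the data exceeds a size threshold. All names here (ByteSource, SingleArraySource, PagedSource, SketchBuilder, pagedThresholdBytes) are hypothetical and for illustration only; this is not the code from the patch and not Lucene's actual API.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical read-only view over the frozen bytes of a structure. */
interface ByteSource {
    byte get(long pos);
    long size();
}

/** Small case: one contiguous array, no per-page bookkeeping. */
final class SingleArraySource implements ByteSource {
    private final byte[] bytes;

    SingleArraySource(byte[] bytes) { this.bytes = bytes; }

    public byte get(long pos) { return bytes[(int) pos]; }
    public long size() { return bytes.length; }
}

/** Large case: fixed-size pages addressed by shift and mask. */
final class PagedSource implements ByteSource {
    private final List<byte[]> pages = new ArrayList<>();
    private final int pageBits;
    private final int pageMask;
    private final long size;

    PagedSource(byte[] data, int pageBits) {
        this.pageBits = pageBits;
        this.pageMask = (1 << pageBits) - 1;
        this.size = data.length;
        int pageSize = 1 << pageBits;
        // Copy the data into pages of at most pageSize bytes each.
        for (int start = 0; start < data.length; start += pageSize) {
            int len = Math.min(pageSize, data.length - start);
            byte[] page = new byte[len];
            System.arraycopy(data, start, page, 0, len);
            pages.add(page);
        }
    }

    public byte get(long pos) {
        return pages.get((int) (pos >> pageBits))[(int) (pos & pageMask)];
    }
    public long size() { return size; }
}

/** Build-time state lives here, so the finished structure does not carry it. */
final class SketchBuilder {
    long nodeCount; // statistics stay on the builder, not on the result
    long arcCount;
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    void addNode(byte[] encodedArcs, int numArcs) {
        out.write(encodedArcs, 0, encodedArcs.length);
        nodeCount++;
        arcCount += numArcs;
    }

    /** Picks the flat representation unless the data exceeds the (assumed) threshold. */
    ByteSource finish(int pagedThresholdBytes, int pageBits) {
        byte[] all = out.toByteArray();
        return all.length <= pagedThresholdBytes
            ? new SingleArraySource(all)
            : new PagedSource(all, pageBits);
    }
}
```

Under these assumptions, the object returned by finish() holds only what lookups need; the counts and the growable buffer stay on the builder and can be garbage collected once the builder is dropped.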
      1. LUCENE-6617.patch
        35 kB
        Michael McCandless

        Activity

        Michael McCandless added a comment -

        Initial patch, pulled out from LUCENE-6199, but javadocs aren't passing ...

        Robert Muir added a comment -

        This seems good to me.

        Dawid Weiss added a comment -

        Err... how much is this going to save? Seems like pennies to me compared to what the actual data does?

        Michael McCandless added a comment -

        > Err... how much is this going to save? Seems like pennies to me compared to what the actual data does?

        You're right, it's just a constant byte reduction on the starting size of an FST, but for tiny FSTs, if you have many of them, this can add up.

        FST aims to be a very memory efficient data structure so I don't think we should waste bytes if we don't need to ...
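The "constant per FST, but it adds up" argument can be made concrete with a small calculation. The sketch below is only an illustration: the class and method names are hypothetical, and any numbers passed in are made up by the caller, not measured from the patch.

```java
/** Hypothetical helper for reasoning about fixed per-instance overhead. */
final class FstOverheadEstimate {
    private FstOverheadEstimate() {}

    /** Total heap saved when each of fstCount instances shrinks by savedBytesPerFst. */
    static long totalSavedBytes(long fstCount, long savedBytesPerFst) {
        // multiplyExact throws ArithmeticException on overflow instead of wrapping.
        return Math.multiplyExact(fstCount, savedBytesPerFst);
    }

    /** Share of one instance's footprint that is fixed overhead rather than data. */
    static double overheadFraction(long fixedOverheadBytes, long dataBytes) {
        return (double) fixedOverheadBytes / (fixedOverheadBytes + dataBytes);
    }
}
```

The fraction shrinks toward zero as the data grows, so a fixed saving is negligible for one large FST and matters only when many small instances are held in memory at once.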

        ASF subversion and git services added a comment -

        Commit 1688412 from Michael McCandless in branch 'dev/trunk'
        [ https://svn.apache.org/r1688412 ]

        LUCENE-6617: reduce heap usage for small FSTs

        ASF subversion and git services added a comment -

        Commit 1688419 from Michael McCandless in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1688419 ]

        LUCENE-6617: reduce heap usage for small FSTs

        Shalin Shekhar Mangar added a comment -

        Bulk close for 5.3.0 release


          People

          • Assignee:
            Michael McCandless
            Reporter:
            Michael McCandless
          • Votes:
            0
            Watchers:
            5
