  Lucene - Core / LUCENE-5025

Allow more than 2.1B "tail nodes" when building FST

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.4, 6.0
    • Component/s: core/FSTs
    • Labels: None
    • Lucene Fields: New

      Description

      We recently relaxed some of the limits on big FSTs, but there is
      one more limit I think we should fix. For example, Aaron hit it while
      building the world's biggest FST: http://aaron.blog.archive.org/2013/05/29/worlds-biggest-fst/

      The issue is NodeHash, which currently uses a GrowableWriter (a packed
      ints impl that can grow both the number of bits per value and the
      number of values). GrowableWriter is indexed by int, not long, so it
      cannot address more than about 2.1B (2^31 - 1) entries.

      This is a hash table that's used to share suffixes, so we need random
      get/put on a long index of long values, i.e. this is logically a long[].

      I think one simple way to do this is to make a "paged"
      GrowableWriter...
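
      As a rough illustration of the "paged" idea, here is a minimal sketch
      of a long-indexed array of longs split into fixed-size pages. The
      class name PagedLongArray and its methods are hypothetical and are not
      taken from the attached patch. For simplicity each page is a plain
      long[] rather than a packed-ints structure, so this shows only the
      long-to-(page, offset) addressing, not the bit-width growth that
      GrowableWriter provides.

```java
/** Hypothetical sketch: a long-indexed array of longs built from int-indexed pages. */
final class PagedLongArray {
    private final int pageShift;   // log2 of the page size
    private final long pageMask;   // pageSize - 1, used to extract the in-page offset
    private final long size;       // total number of addressable values
    private final long[][] pages;  // assumption: plain long[] pages, not packed ints

    PagedLongArray(long size, int pageShift) {
        if (size < 0 || pageShift < 1 || pageShift > 30) {
            throw new IllegalArgumentException("size=" + size + " pageShift=" + pageShift);
        }
        this.size = size;
        this.pageShift = pageShift;
        this.pageMask = (1L << pageShift) - 1;
        long numPages = (size + pageMask) >>> pageShift; // round up
        if (numPages > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("too many pages: " + numPages);
        }
        pages = new long[(int) numPages][];
        for (int i = 0; i < pages.length; i++) {
            // The last page may be shorter than a full page.
            long remaining = size - ((long) i << pageShift);
            pages[i] = new long[(int) Math.min(remaining, 1L << pageShift)];
        }
    }

    long get(long index) {
        checkIndex(index);
        return pages[(int) (index >>> pageShift)][(int) (index & pageMask)];
    }

    void set(long index, long value) {
        checkIndex(index);
        pages[(int) (index >>> pageShift)][(int) (index & pageMask)] = value;
    }

    long size() {
        return size;
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index=" + index + " size=" + size);
        }
    }
}
```

      With a page size of 2^pageShift values and at most Integer.MAX_VALUE
      pages, the total capacity goes well beyond what a single int-indexed
      array can hold, while each individual page stays int-indexed.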

      Along with this we'd need to fix the hash codes to be long not
      int.
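
      To illustrate why the hash codes need to be long as well, here is a
      small sketch, not the actual NodeHash code. The class LongHashSketch,
      its methods, and the choice of multiplier are all assumptions made
      for the example. It treats a node as parallel arrays of arc labels and
      target addresses, folds them into a long hash, and masks that hash
      with a long so the resulting slot can exceed the int range.

```java
/** Hypothetical sketch: long hash codes for a long-indexed hash table. */
final class LongHashSketch {
    private static final long PRIME = 31; // assumption: any odd multiplier works for the sketch

    private LongHashSketch() {}

    /** Combines arc labels and target addresses into a 64-bit hash. */
    static long hash(int[] labels, long[] targets) {
        if (labels.length != targets.length) {
            throw new IllegalArgumentException("labels and targets must have the same length");
        }
        long h = 0;
        for (int i = 0; i < labels.length; i++) {
            h = PRIME * h + labels[i];
            // Fold the high bits of the target into the low bits before mixing.
            h = PRIME * h + (targets[i] ^ (targets[i] >>> 32));
        }
        return h;
    }

    /** Maps a hash to a slot in a table whose size is a power of two. */
    static long slot(long hash, long tableSize) {
        if (tableSize <= 0 || Long.bitCount(tableSize) != 1) {
            throw new IllegalArgumentException("tableSize must be a positive power of two");
        }
        return hash & (tableSize - 1); // long mask, so slots are not capped at the int range
    }
}
```

      With an int hash code the mask could never select a slot beyond the
      32-bit range, so a table with more slots than that would leave its
      upper part unused; a long hash removes that cap.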

        Attachments

        1. LUCENE-5025.patch
          19 kB
          Michael McCandless

            People

            • Assignee: Michael McCandless (mikemccand)
            • Reporter: Michael McCandless (mikemccand)