Lucene - Core / LUCENE-5025

Allow more than 2.1B "tail nodes" when building FST

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.4, 6.0
    • Component/s: core/FSTs
    • Labels: None
    • Lucene Fields: New

    Description

      We recently relaxed some of the limits for big FSTs, but there is
      one more limit I think we should fix. For example, Aaron hit it while
      building the world's biggest FST: http://aaron.blog.archive.org/2013/05/29/worlds-biggest-fst/

      The issue is NodeHash, which currently uses a GrowableWriter (a packed
      ints implementation that can grow both the number of bits per value
      and the number of values). It is indexed by int, not long, so it
      cannot address more than about 2.1B entries.

      This is a hash table that's used to share suffixes, so we need random
      get/put on a long index of long values, i.e. this is logically a long[].

      I think one simple way to do this is to make a "paged"
      GrowableWriter...
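
      A minimal sketch of the paging idea, not the code from the attached
      patch: a long index is split into a page number (high bits) and an
      offset within the page (low bits). For simplicity this sketch backs
      each page with a plain long[] rather than a packed-ints
      GrowableWriter, and the class name PagedLongArray and the pageShift
      parameter are assumptions made for illustration.

```java
/** Hypothetical sketch: a long-indexed array of long values stored in fixed-size pages. */
public class PagedLongArray {
    private final long size;
    private final int pageShift;   // log2 of the page size
    private final int pageMask;    // pageSize - 1, selects the offset within a page
    private final long[][] pages;  // plain long[] pages stand in for packed-ints pages

    public PagedLongArray(long size, int pageShift) {
        if (size < 0) {
            throw new IllegalArgumentException("size must be >= 0");
        }
        if (pageShift < 1 || pageShift > 30) {
            throw new IllegalArgumentException("pageShift must be in [1, 30]");
        }
        this.size = size;
        this.pageShift = pageShift;
        int pageSize = 1 << pageShift;
        this.pageMask = pageSize - 1;

        // Round up so a partially filled last page is still allocated.
        long numPages = (size + pageSize - 1) >>> pageShift;
        if (numPages > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("too many pages for this pageShift");
        }
        pages = new long[(int) numPages][];
        for (int i = 0; i < pages.length; i++) {
            long remaining = size - ((long) i << pageShift);
            pages[i] = new long[(int) Math.min(pageSize, remaining)];
        }
    }

    public long get(long index) {
        checkIndex(index);
        return pages[(int) (index >>> pageShift)][(int) (index & pageMask)];
    }

    public void set(long index, long value) {
        checkIndex(index);
        pages[(int) (index >>> pageShift)][(int) (index & pageMask)] = value;
    }

    public long size() {
        return size;
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + " out of range [0, " + size + ")");
        }
    }
}
```

      Because each page is itself indexed by int, the total capacity is
      bounded by the number of pages times the page size rather than by
      Integer.MAX_VALUE.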

      Along with this we'd need to change the hash codes to be long, not
      int.
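
      To illustrate why the hash codes need to widen, here is a hedged
      sketch (not NodeHash's actual code): once the table can hold more
      than 2^31 slots, the slot must be computed from a 64-bit hash with a
      long mask, otherwise the upper slots are never reachable. The class
      name, the label/target array representation of a node's arcs, and
      the multiplier 31 are all assumptions made for this example.

```java
/** Hypothetical sketch: 64-bit hashing and slot selection for a long-indexed table. */
public final class LongHashSketch {
    private static final long PRIME = 31;

    private LongHashSketch() {}

    /**
     * Combines the arcs of a node into a 64-bit hash. The arcs are given as
     * parallel arrays of labels and target addresses (an assumed representation).
     */
    public static long hash(int[] labels, long[] targets) {
        if (labels.length != targets.length) {
            throw new IllegalArgumentException("labels and targets must have the same length");
        }
        long h = 0;
        for (int i = 0; i < labels.length; i++) {
            h = PRIME * h + labels[i];
            h = PRIME * h + targets[i];
        }
        return h;
    }

    /** Maps a hash to a slot; tableSize must be a power of two and may exceed 2^31. */
    public static long slot(long hash, long tableSize) {
        if (tableSize <= 0 || (tableSize & (tableSize - 1)) != 0) {
            throw new IllegalArgumentException("tableSize must be a positive power of two");
        }
        return hash & (tableSize - 1);
    }
}
```

      With a table of 1L << 33 slots, slot(-1L, 1L << 33) returns
      (1L << 33) - 1, an index that an int-typed hash and mask could not
      produce.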

      Attachments

        1. LUCENE-5025.patch
          19 kB
          Michael McCandless

          People

            mikemccand Michael McCandless
            mikemccand Michael McCandless