Lucene - Core / LUCENE-3932

Improve load time of .tii files

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.5
    • Fix Version/s: 4.0-ALPHA
    • Component/s: None
    • Labels: None
    • Environment: Linux
    • Lucene Fields: New

    Description

      We have a large 50 GB index which is optimized as one segment, with a 66 MB .tii file. This index has no norms and no field cache.

      It takes about 5 seconds to load this index. Profiling reveals that 60% of the time is spent in GrowableWriter.set(index, value), and most of the time in set(...) is spent resizing the PackedInts.Mutable current.

      In the constructor for TermInfosReaderIndex, the writer is initialized with the line:

      GrowableWriter indexToTerms = new GrowableWriter(4, indexSize, false);

      For our index, using four as the bit estimate results in 27 resizes.

      The last value in indexToTerms is going to be approximately tiiFileLength, so if you instead use

      int bitEstimate = (int) Math.ceil(Math.log10(tiiFileLength) / Math.log10(2));
      GrowableWriter indexToTerms = new GrowableWriter(bitEstimate, indexSize, false);

      Load time improves to about 2 seconds.
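
      For illustration, here is a minimal standalone sketch of the bit-width estimate, with no Lucene dependency. The class and method names are hypothetical and not taken from the patch. It compares the ceil(log2(n)) formula above with an integer-only variant based on Long.numberOfLeadingZeros. The integer variant avoids floating point and also covers exact powers of two, where ceil(log2(n)) comes out one bit short (8 needs 4 bits, but log2(8) = 3). Lucene's own PackedInts.bitsRequired(long) should serve the same purpose inside TermInfosReaderIndex.

```java
final class BitEstimateSketch {

    /** Bits needed to store maxValue (must be >= 0) as an unsigned value; never less than 1. */
    public static int bitsRequired(long maxValue) {
        if (maxValue < 0) {
            throw new IllegalArgumentException("maxValue must be non-negative: " + maxValue);
        }
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    /** The estimate from the description: ceil(log2(n)), computed in floating point. */
    public static int ceilLog2(long n) {
        return (int) Math.ceil(Math.log10(n) / Math.log10(2));
    }

    public static void main(String[] args) {
        // Assumption: "66 MB" taken as 66 * 1024 * 1024 bytes.
        long tiiFileLength = 66L * 1024 * 1024;
        System.out.println("ceilLog2     = " + ceilLog2(tiiFileLength));
        System.out.println("bitsRequired = " + bitsRequired(tiiFileLength));
    }
}
```

      For a file of this size both methods give 27 bits, so a writer created with that width should not need to widen while the index is loaded.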

      Attachments

        1. LUCENE-3932.trunk.patch
          5 kB
          Michael McCandless
        2. perf.csv
          4 kB
          Sean Bridges

          People

            Assignee: Michael McCandless (mikemccand)
            Reporter: Sean Bridges (sgbridges)
            Votes: 0
            Watchers: 2
