Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.5
    • Fix Version/s: 4.0-ALPHA
    • Component/s: None
    • Labels:
      None
    • Environment:

      Linux

    • Lucene Fields:
      New

      Description

      We have a large 50 GB index, optimized down to a single segment, with a 66 MB .tii file. The index has no norms and no field cache.

      Loading this index takes about 5 seconds. Profiling reveals that 60% of that time is spent in GrowableWriter.set(index, value), and most of the time in set(...) is spent resizing the PackedInts.Mutable current.

      In the constructor for TermInfosReaderIndex, you initialize the writer with the line:

      GrowableWriter indexToTerms = new GrowableWriter(4, indexSize, false);

      For our index, using four as the bit estimate results in 27 resizes.

      The last value in indexToTerms is going to be approximately tiiFileLength, and if you instead use

      int bitEstimate = (int) Math.ceil(Math.log10(tiiFileLength) / Math.log10(2));
      GrowableWriter indexToTerms = new GrowableWriter(bitEstimate, indexSize, false);

      load time improves to about 2 seconds.
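
The bit estimate above can be checked in isolation. The following is a minimal sketch, not the committed patch: the class name BitEstimate, the helper names, and the 66 MB file length are assumptions made for illustration. It compares the log-based formula from the description with an integer-only alternative built on Long.numberOfLeadingZeros, which avoids floating-point rounding.

```java
public class BitEstimate {

    // Number of bits needed to store maxValue (assumed non-negative),
    // computed with integer arithmetic only. Returns at least 1.
    static int bitsRequired(long maxValue) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    // The estimate as written in the issue description: ceil(log2(tiiFileLength)).
    static int log10Estimate(long tiiFileLength) {
        return (int) Math.ceil(Math.log10(tiiFileLength) / Math.log10(2));
    }

    public static void main(String[] args) {
        // Hypothetical .tii length of 66 MB, matching the size quoted above.
        long tiiFileLength = 66L * 1024 * 1024;

        System.out.println("integer estimate: " + bitsRequired(tiiFileLength));
        System.out.println("log10 estimate:   " + log10Estimate(tiiFileLength));
    }
}
```

For a 66 MB file both helpers give 27 bits, so a writer sized this way should rarely need to grow while the index is loaded. The two formulas can differ when the length is an exact power of two, where the ceiling of the logarithm may come out one bit short; the integer version does not have that edge case.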

      1. LUCENE-3932.trunk.patch
        5 kB
        Michael McCandless
      2. perf.csv
        4 kB
        Sean Bridges

          People

          • Assignee:
            Michael McCandless
            Reporter:
            Sean Bridges
          • Votes:
            0
            Watchers:
            3