Lucene - Core / LUCENE-388

[PATCH] IndexWriter.maybeMergeSegments() takes lots of CPU resources

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.1
    • Component/s: core/index
    • Labels:
      None
    • Environment:

      Operating System: Mac OS X 10.3
      Platform: Macintosh

      Description

      Note: I believe this to be the same situation with 1.4.3 as with SVN HEAD.

      Analysis using the hprof utility during index creation with many documents
      shows that the CPU spends a large portion of its time in
      IndexWriter.maybeMergeSegments(), which seems to be a 'waste' compared with
      other valuable CPU-intensive operations such as tokenization.

      Using the following test snippet to retrieve some rows from the db and create an
      index:

      Analyzer a = new StandardAnalyzer();
      writer = new IndexWriter(indexDir, a, true);
      writer.setMergeFactor(1000);
      writer.setMaxBufferedDocs(10000);
      writer.setUseCompoundFile(false);
      connection = DriverManager.getConnection(
          "jdbc:inetdae7:tower.aconex.com?database=<somedb>", "secret",
          "squirrel");
      String sql = "select userid, userfirstname, userlastname, email from userx";
      LOG.info("sql=" + sql);
      Statement statement = connection.createStatement();
      statement.setFetchSize(5000);
      LOG.info("Executing sql");
      ResultSet rs = statement.executeQuery(sql);
      LOG.info("ResultSet retrieved");
      int row = 0;

      LOG.info("Indexing users");
      long begin = System.currentTimeMillis();
      while (rs.next()) {
          int userid = rs.getInt(1);
          String firstname = rs.getString(2);
          String lastname = rs.getString(3);
          String email = rs.getString(4);
          String fullName = firstname + " " + lastname;
          Document doc = new Document();
          doc.add(Field.Keyword("userid", userid + ""));
          doc.add(Field.Keyword("firstname", firstname.toLowerCase()));
          doc.add(Field.Keyword("lastname", lastname.toLowerCase()));
          doc.add(Field.Text("name", fullName.toLowerCase()));
          doc.add(Field.Keyword("email", email.toLowerCase()));
          writer.addDocument(doc);
          row++;
          if ((row % 100) == 0) {
              LOG.info(row + " indexed");
          }
      }
      double end = System.currentTimeMillis();
      double diff = (end - begin) / 1000;
      double rate = row / diff;
      LOG.info("rate:" + rate);

      On my 1.5GHz PowerBook with 1.5GB RAM and a 5400 RPM drive, my CPU is maxed out,
      and I end up with an indexing rate of between 490-515 documents/second when run
      over 10 times in succession.

      By applying a simple patch to IndexWriter (see attached shortly), which defers
      the calling of maybeMergeSegments() so that it is only called every 2000
      times (an arbitrary figure), I appear to get a new rate of between 945-970
      documents/second. Using Luke to look inside each index created between these
      two runs, there does not appear to be any difference: same number of Documents,
      same number of Terms.

      I'm not suggesting one should apply this patch; I'm just highlighting the
      difference in performance that this sort of change gives you.
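
      For illustration only, the deferral boils down to something like the following
      sketch (the field and constant names here are made up for illustration; the
      attached patch is the actual change):

      // Sketch of the deferral idea: only scan the segments for a possible merge
      // every MERGE_CHECK_INTERVAL adds, instead of on every addDocument() call.
      private int addsSinceMergeCheck = 0;
      private static final int MERGE_CHECK_INTERVAL = 2000; // arbitrary figure

      public void addDocument(Document doc) throws IOException {
        // ... buffer the document into a new single-doc RAM segment as before ...
        if (++addsSinceMergeCheck >= MERGE_CHECK_INTERVAL) {
          addsSinceMergeCheck = 0;
          maybeMergeSegments(); // the existing (expensive) merge check
        }
      }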

      We are about to use Lucene to index 4 million construction document records, and
      so speeding up the indexing process is in our best interest! If one
      considers the amount of CPU time spent in maybeMergeSegments over the initial
      index creation of 4 million documents, I think one could see how it would be
      ideal to try to speed this area up (at least move the bottleneck to IO).

      I would appreciate anyone taking a moment to comment on this.

      1. ASF.LICENSE.NOT.GRANTED--IndexWriter.patch
        0.8 kB
        Paul Smith
      2. ASF.LICENSE.NOT.GRANTED--log.optimized.deep.txt
        454 kB
        Paul Smith
      3. ASF.LICENSE.NOT.GRANTED--log.optimized.txt
        183 kB
        Paul Smith
      4. ASF.LICENSE.NOT.GRANTED--log-compound.txt
        136 kB
        Paul Smith
      5. ASF.LICENSE.NOT.GRANTED--lucene.34930.patch
        2 kB
        Paul Smith
      6. ASF.LICENSE.NOT.GRANTED--Lucene Performance Test - with & without hack.xls
        36 kB
        Paul Smith
      7. doron_2_IndexWriter.patch
        1.0 kB
        Doron Cohen
      8. doron_2b_IndexWriter.patch
        1 kB
        Doron Cohen
      9. doron_IndexWriter.patch
        2 kB
        Doron Cohen
      10. yonik_indexwriter.diff
        3 kB
        Yonik Seeley
      11. yonik_indexwriter.diff
        3 kB
        Yonik Seeley

        Activity

        Yonik Seeley added a comment -

        No problem... I probably should have put "LUCENE-388" in the commit log (it allows JIRA to link up the related commits automatically)

        Doron Cohen added a comment -

        Oh - sorry for the 'noise' - got used to systems where a commit must be attached to an issue/defect - should have checked the commits list or the code itself - I'm learning all the time...

        Yonik Seeley added a comment -

        No need to re-open Doron, I committed doron_2_IndexWriter.patch at the same time as my first reply (shortly after you posted it).

        Doron Cohen added a comment -

        Paul, would you like to re-open this issue for (re)solving it with one of the two recent patches (2 or 2b)? I think that once an issue is resolved it can be re-opened (or closed) by the developer who opened it. (I assume committers can also re-open, but I'm not sure - well, at least I can't.)
        Thanks, Doron

        Doron Cohen added a comment -

        Right... actually it should be like this:

        int minSegment = segmentInfos.size() - singleDocSegmentsCount - 1;

        But since flushRamSegments() is only called by close() and optimize(), no real performance gain is expected here.

        So I'm not sure what my preference is between:
        (a) do not change here, because "why change working code to be perhaps a bit more complex for no performance gain", or
        (b) change here too, to be consistent with how this counter is used in maybeMergeSegments().

        Anyway, I tested this change and it works - so I am also attaching this version - doron_2b_IndexWriter.patch - in case (b) is favored.

        Doron
        Yonik Seeley added a comment -

        We could also make the following change to flushRamSegments, right?

        private final void flushRamSegments() throws IOException {
          int minSegment = segmentInfos.size() - singleDocSegmentsCount;
          int docCount = singleDocSegmentsCount;

        Yonik Seeley added a comment -

        Thanks Doron, I caught that too and I was just going to set the count to 0 in mergeSegments (mergeSegments is always called with end == size() currently I think). Your fix is better though - gives more flexibility.

        Doron Cohen added a comment -

        The attached doron_2_IndexWriter.patch moves the update of singleDocSegmentsCount into mergeSegments(minSegment, end) so that it also applies when optimize() is called. The update of the counter now considers the range of the merge (so the counter is not necessarily reset to 0).

        The bug in the previous implementation occasionally caused TestIndexModifier.testIndexWithThreads() to fail with an ArrayIndexOutOfBoundsException on the segments array.

        I ran this test several times with this fix and now it consistently passes. "ant test" passes as well.

        I hope we're done with this bug...

        Doron

        Doron Cohen added a comment -

        Well.... there is a problem in the current patch after all... the counter is not decremented when a merge is triggered by a call to optimize(). It is only decremented when a merge is called from maybeMergeSegments(). Therefore the counter decrement should move to mergeSegments(int,int). I am testing this fix now and will add a new patch for this soon.
        Doron

        Yonik Seeley added a comment -

        I've committed Doron's version after review and further testing with my testInvariants() (also committed, but commented out). Thanks Doron & Paul!

        Yonik Seeley added a comment -

        I was literally a minute away from committing my version when Doron submitted his.
        Actually, I think I like Doron's "singleDocSegmentsCount" better.... it's easier to understand at a glance.

        I was testing the performance of mine... not as much of a speedup as I would have liked...
        5 to 6% better with maxBufferedDocs=1000, and a trivial single-field document.
        You need to go to maxBufferedDocs=10000 to see a good speedup, and that's probably not advisable for most real indices (and maxBufferedDocs=1000 used much less memory and was slightly faster anyway).

        Here is the code I added to IndexWriter to test my version (add testInvariants() after the add() call and after flushRamSegments() in close(), then do "ant test"):

        private synchronized void testInvariants() {
          // index segments should decrease in size
          int maxSegLevel = 0;
          for (int i = segmentInfos.size() - 1; i >= 0; i--) {
            SegmentInfo si = segmentInfos.info(i);
            int segLevel = (si.docCount) / minMergeDocs;
            if (segLevel < maxSegLevel) {
              throw new RuntimeException("Segment #" + i + " is too small. " + segInfo());
            }
            maxSegLevel = Math.max(maxSegLevel, segLevel);
          }

          // check if merges needed
          long targetMergeDocs = minMergeDocs;
          int minSegment = segmentInfos.size();

          while (targetMergeDocs <= maxMergeDocs && minSegment >= 0) {
            int mergeDocs = 0;
            while (--minSegment >= 0) {
              SegmentInfo si = segmentInfos.info(minSegment);
              if (si.docCount >= targetMergeDocs) break;
              mergeDocs += si.docCount;
            }
            if (mergeDocs >= targetMergeDocs) {
              throw new RuntimeException("Merge needed at level " + targetMergeDocs + " :" + segInfo());
            }
            targetMergeDocs *= mergeFactor; // increase target size
          }
        }

        private String segInfo() {
          StringBuffer sb = new StringBuffer("minMergeDocs=" + minMergeDocs + " docsLeftBeforeMerge=" + docsLeftBeforeMerge + " segsizes:");
          for (int i = 0; i < segmentInfos.size(); i++) {
            sb.append(segmentInfos.info(i).docCount);
            sb.append(",");
          }
          return sb.toString();
        }

        Doron Cohen added a comment -

        It seems that the excessive CPU usage is mainly due to (re)scanning those single-doc segments at the top of the "stack".

        The following simple patch (a small modification to the previous one) only keeps track of the number of single-doc segments, and when iterating over merge candidate segments, takes all those single-doc segments in one step.

        All tests pass with this patch, running a bit faster (3 seconds - could be just noise). A "tight mem setting" (2048 cacm files with -Xmx3m and setMaxBufferedDocs(50)) that fails with the previous patch passes with this one. I checked the trace for which segments were merged - same merge decisions as before this change for these 2048 files.
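
        In rough outline, the idea is the following (a simplified sketch, not the exact patch; only singleDocSegmentsCount follows the patch naming, the rest mirrors the existing maybeMergeSegments() loop):

        // Track how many of the newest segments are single-document RAM segments so the
        // merge check can account for them in one step instead of re-scanning each of
        // them on every addDocument() call.
        private int singleDocSegmentsCount = 0; // incremented in addDocument(), reset when merged

        private final void maybeMergeSegments() throws IOException {
          long targetMergeDocs = minMergeDocs;
          while (targetMergeDocs <= maxMergeDocs) {
            // skip straight over the known single-doc segments at the top of the stack
            int minSegment = segmentInfos.size() - singleDocSegmentsCount;
            int mergeDocs = singleDocSegmentsCount; // they contribute one doc each
            while (--minSegment >= 0) {
              SegmentInfo si = segmentInfos.info(minSegment);
              if (si.docCount >= targetMergeDocs) break;
              mergeDocs += si.docCount;
            }
            if (mergeDocs < targetMergeDocs) break;
            mergeSegments(minSegment + 1); // existing merge; also decrements the counter
            targetMergeDocs *= mergeFactor;
          }
        }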

        Yonik Seeley added a comment -

        Got it!

        Attaching a new version with a minor change... I forgot to take into account deleted documents. After a merge, instead of setting the count to minMergeDocs, we set it to 0, forcing a recount on the next add. The reason is that deleted docs could have been squeezed out during the merge, leaving a smaller segment that might be used in another merge shortly.

        Example of original behavior (minMergeDocs=10):
        // segment sizes = 55,8,1
        add doc
        // segment sizes = 55,8,1,1
        maybe merge
        // segment sizes = 55,9 (not 10 because of a previously deleted doc)
        add doc
        // segment sizes = 55,9,1
        maybe merge
        // segment sizes = 55,10

        Now, the original behavior isn't necessarily desirable, but we are taking baby steps... first "do no harm". A separate patch can address merging behavior in the presence of deleted documents.

        Yonik Seeley added a comment -

        Something is wrong with my patch... the behavior still differs from the original.
        I wrote a testInvariants() method, and merges are not always done when they need to be.

        DM Smith added a comment -

        In Eclipse you can set the run to use its own JVM. Otherwise it runs in Eclipse's JVM and it has negative memory side effects.

        Daniel Naber added a comment -

        I just see that this OOM is not exactly reproducible. I would have expected that once it has happened, it would always happen when indexing the same data with the same settings (though not necessarily at the same point). But that doesn't seem to be the case. I use Java 1.4.2, -Xmx2M, writer.setMaxBufferedDocs(50) and I am indexing 2200 files with the IndexFiles class from the Lucene demo package. Also, I let the code run in Eclipse; maybe this has negative side effects.

        So your code is probably okay, although I suggest you set up a similar test case just to be sure.

        Yonik Seeley added a comment -

        Well, the patch changes what the hotspot is, so the hotspot compiler could be trying to compile different code...

        Daniel, what is the exact test you are running? what JVM and other settings?
        Perhaps trying both versions with -Xbatch or -Xint would remove the differences?

        Perhaps we need a method to determine if index "invariants" have been met...

        Paul Smith added a comment -

        This is where some trace logging code would be useful. Maybe a YourKit memory snapshot to see what's going on? I can't see how Yonik's patch should influence the memory profile. It's just delaying the check for merging until an appropriate time, and should not be removing opportunities to merge segments. I can't see why checking less often uses more memory.

        Obviously something strange is happening.

        Yonik Seeley added a comment -

        > the problem is the same as before: I get an OutOfMemoryError using settings that worked without the patch

        That's worrisome though... the increase in code and data is minimal, and my patch should not trigger any different merging behavior. Unless the previous version was passing the low-mem test by a few bytes, something is still wrong.

        Daniel Naber added a comment -

        Hi Yonik, I just tested the patch: sorry, but the problem is the same as before: I get an OutOfMemoryError using settings that worked without the patch. That doesn't mean that the patch is wrong of course, but as we're after performance improvements it wouldn't make sense to compare it to the old version which uses less memory.

        Yonik Seeley added a comment -

        How does this patch look?

        It's designed to exactly match the previous merging behavior, and it's conservative (an exact count is not maintained at all times).

        Otis Gospodnetic added a comment -

        Ah, too bad. Reverted. I do recall seeing maybeMergeSegments being called a lot, although I don't recall how much actual time was spent in there.

        Paul Smith added a comment -

        Geez, yep, definitely don't put this in; my patch was only a 'suggestion' to highlight how it fixes the root cause of the problem. It is interesting that originally all the test cases still pass, yet the problems Yonik highlights are real. Might warrant some extra test cases to cover exactly those situations, even if this problem is not addressed.

        It would be great if this could be fixed completely though, but I haven't got any headspace left to continue research on this one.. sorry

        Yonik Seeley added a comment -

        Another problem I see even for single-sessions is that bufferedDocCount is not maintained correctly... all merges are subtracted, not just those of the buffered docs. This will lead to fewer and fewer merges than expected over time.

        I definitely vote to revert this. There isn't an easy fix to this patch - a different approach is required.

        Yonik Seeley added a comment -

        One problem I see:
        If you continuously add batches of documents of size less than maxBufferedDocs, maybeMergeSegments will never trigger a merge. On a close(), it looks like there is some merging logic in flushRamSegments, but it doesn't do full multi-level merging like maybeMergeSegments does.

        I think I know how to fix it, but perhaps this should be reverted in the meantime?

        Daniel Naber added a comment -

        Something is wrong with this patch (as it has been applied) as it increases memory usage. Indexing files with the IndexFiles demo worked before using writer.setMaxBufferedDocs(50) and a tight JVM memory setting (-Xmx2M), now it fails with an OutOfMemoryError.

        Otis Gospodnetic added a comment -

        This last patch looks good to me, and I remember looking at this issue when Paul brought it up. If I recall correctly, I saw the same wasted time in maybeMergeSegments().

        All tests pass, so I'm committing this, only 15 months after it was reported. Thanks Paul!

        hoschek added a comment -

        Yep, throwing and catching exceptions in the critical path is always a performance gotcha, common case
        or not. See any VM implementation performance papers, such as in the IBM Systems Journal some years
        ago, and others.

        No idea why the javacc folks didn't come up with an API that does not involve exceptions for normal
        control flow. Well, javacc has probably been unmaintained dead code for some time now. [Even Xerces
        has such gotchas deep inside its low-level native API - I chatted about this some time ago with a Sun
        engineer.]

        Anyway, you can preallocate the IOException in FastCharStream in a private static final var, and then
        throw the same exception object again and again on EOS. That gives you roughly a factor of 2x the cheap way,
        because the stack trace does not have to be generated and filled repeatedly (same for the QueryParser
        copy of FastCharStream).
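
        For example, a rough sketch of the preallocation (hypothetical code, not the actual FastCharStream source; 'input' and 'buffer' stand in for its real fields):

        // Create the end-of-stream exception once, so its stack trace is filled in only
        // at class-initialization time; rethrowing the same instance on every EOS avoids
        // the repeated fillInStackTrace() cost.
        private static final IOException EOS_EXCEPTION = new IOException("read past eof");

        private void refill() throws IOException {
          int charsRead = input.read(buffer, 0, buffer.length); // read from the underlying Reader
          if (charsRead <= 0)
            throw EOS_EXCEPTION; // reuse the shared instance instead of 'new IOException(...)'
          // ... update buffer bookkeeping as before ...
        }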

        The other additional 5x comes from getting rid of the exception completely - catching exceptions is
        expensive. This is done via dirty-patching the javacc-generated code to not require EOS exceptions at
        all. Instead you can return 0xFFFF as an EOS marker, or some other unused Unicode value. Look at the
        javacc-generated code and see where it catches the EOS exception. That's where you'd need to fiddle
        around, making sure true abnormal exceptions are still handled properly. It's really an awkward
        maintenance nightmare because it interferes with generated code, so I don't really recommend this.

        Still, StandardAnalyzer eats CPU (and buffer memory) like there's no tomorrow. Instead, I'd recommend
        giving PatternAnalyzer (from the "memory" SVN contrib area) a try. The functionality is almost the same
        as StandardAnalyzer, but it can be many times faster, especially when using it with a String rather than
        a Reader, and you don't have to wait indefinitely for Lucene to get fixed.

        Paul Smith added a comment -

        (In reply to comment #12)
        > Most likely the IOExceptions stem from FastCharStream of StandardAnalyzer, at
        > least that's my experience.
        > See the header of your hprof file to check if that's the case.
        >
        > Javacc has a gotcha in that it "likes" IOExceptions as part of the normal
        > control flow on stream termination.
        > I have performance patches for this that make FastCharStream 2-10 times faster
        > for small inputs (it doesn't matter on large inputs), but I never bothered
        > submitting them since they were not crucial for me, and seeing that most
        > submissions to lucene-dev go into a black hole anyway without any response,
        > neither negative nor positive, simply ignored... Hint

        Hey, could you send me the patch for this? I'd like to test it out. I've
        verified that the FastCharStream class does throw an IOException on EOF, which of
        course occurs very frequently when indexing small Fields. I can't see a quick
        and easy solution to indicate EOF without the exception, so I'm keen to see your
        work and test it out with my scenarios.

        cheers,

        Paul Smith

        Paul Smith added a comment -

        See Bug 35037 for related info

        Paul Smith added a comment -

        Created an attachment (id=15141)
        Same HPROF trace, patch applied, but with a deeper stack trace

        Paul Smith added a comment -

        I expanded the depth of the HPROF test (will attach shortly), but in summary,
        this is where it comes from:

        TRACE 300351:
        java.lang.Throwable.fillInStackTrace(Throwable.java:Unknown line)
        java.lang.Throwable.<init>(Throwable.java:196)
        java.lang.Exception.<init>(Exception.java:41)
        java.lang.RuntimeException.<init>(RuntimeException.java:43)
        java.lang.ClassCastException.<init>(ClassCastException.java:39)
        org.apache.lucene.document.Field.readerValue(Field.java:262)

        org.apache.lucene.index.DocumentWriter.invertDocument(DocumentWriter.java:152)
        org.apache.lucene.index.DocumentWriter.addDocument(DocumentWriter.java:93)

        Very strange, I must say; I will have to look into that in more detail. Anyone
        care to comment?

        hoschek added a comment -

        Most likely the IOExceptions stem from FastCharStream of StandardAnalyzer, at least that's my experience.
        See the header of your hprof file to check if that's the case.

        Javacc has a gotcha in that it "likes" IOExceptions as part of the normal control flow on stream termination.
        I have performance patches for this that make FastCharStream 2-10 times faster for small inputs (it
        doesn't matter on large inputs), but I never bothered submitting them since they were not crucial for me,
        and seeing that most submissions to lucene-dev go into a black hole anyway without any response,
        neither negative nor positive, simply ignored... Hint

        Mario Ivankovits added a comment -

        Where do the java.lang.Throwable.fillInStackTrace come from?

        Paul Smith added a comment -

        Created an attachment (id=15139)
        HPROF output of the same test with the patch applied

        This HPROF shows the CPU savings of the patch.

        In summary, the original was this:

        1   24.14% 24.14%  859  300342  org.apache.lucene.index.IndexWriter.maybeMergeSegments
        2    4.83% 28.97%  172  300306  java.lang.Throwable.fillInStackTrace
        3    4.07% 33.04%  145  300336  org.apache.lucene.index.SegmentInfos.info
        4    2.50% 35.54%   89  300505  org.apache.lucene.store.RAMInputStream.readInternal
        5    2.33% 37.88%   83  300299  java.util.Vector.addElement
        6    2.16% 40.04%   77  300334  org.apache.lucene.document.Field.readerValue
        7    2.14% 42.17%   76  300272  java.net.SocketInputStream.socketRead0
        8    1.40% 43.58%   50  300598  org.apache.lucene.util.PriorityQueue.downHeap
        9    1.15% 44.73%   41  300617  org.apache.lucene.util.PriorityQueue.downHeap
        10   1.01% 45.74%   36  300581  java.io.RandomAccessFile.writeBytes
        .....

        And after the patch is applied it becomes:

        1    5.45%  5.45%  130  300287  java.lang.Throwable.fillInStackTrace
        2    5.20% 10.65%  124  300277  java.net.SocketInputStream.socketRead0
        3    3.86% 14.51%   92  300515  org.apache.lucene.store.RAMInputStream.readInternal
        4    3.27% 17.79%   78  300332  java.util.Vector.addElement
        5    2.35% 20.13%   56  300548  java.io.RandomAccessFile.writeBytes
        6    2.22% 22.36%   53  300305  org.apache.lucene.document.Field.readerValue
        7    1.85% 24.20%   44  300595  org.apache.lucene.util.PriorityQueue.downHeap
        8    1.72% 25.92%   41  300580  org.apache.lucene.util.PriorityQueue.downHeap
        9    1.51% 27.43%   36  300645  java.net.SocketInputStream.socketRead0
        10   1.43% 28.86%   34  300284  org.apache.lucene.store.BufferedIndexOutput.<init>
        11   1.43% 30.29%   34  300562  java.lang.Object.clone
        12   1.17% 31.46%   28  300346  java.io.StringReader.read
        13   1.01% 32.47%   24  300363  org.apache.lucene.index.DocumentWriter.writeNorms

        (Note: The Socket reading element is the JDBC driver retrieving the results).

        cheers,

        Paul

        Paul Smith added a comment -

        Created an attachment (id=15137)
        Patch that hopefully is what Doug means

        Doug,

        I could not resolve in my mind exactly what you meant in your example code
        snippet, but here's a patch that I believe accomplishes the same thing. It's
        merely a renaming of the variable, and using the 'mergeSegments decrements'
        concept that you mention, which I think is a good idea.

        This patch still doesn't speed up the actual detection of which segments on the
        stack to merge when it does go into that loop. I need to re-baseline the
        performance measurements with this patch to see where the CPU is now spending
        most of its time, as it may now be somewhere else entirely.

        NOTE: all current unit tests still pass with this patch.

        Paul Smith added a comment -

        Created an attachment (id=15053)
        Excel file containing results over small samples between hacked and non-hacked
        code using different maxBufferedDocs settings

        Attached Excel file with some results

        Paul Smith added a comment -

        Doug, after more testing, you are correct that tweaking the maxBuffered from
        10,000 to 5000 gives slightly better performance.

        However I believe this is because with maxBufferedDocs==5000, this loop counter
        is reset more frequently, and so suffers less of the CPU drain.

        I ran more tests using the hacked version using both 5000 and 10000, and the
        hacked version still runs better. I'll attach an Excel sheet with the results,
        but I only had time to run it over 5 samples each run, which is not exactly
        statistically significant, but hopefully still correct.

        As you can see, deferring this loop enables larger maxBufferedDocs settings to
        gain ahead of smaller ones.

        As always with these tests there are transient factors that affect things
        (getting DB results for one thing).

        Paul Smith added a comment -

        >> Your benchmark might run faster if you set maxBufferedDocs smaller. Also, it
        >> doesn't look like you're including the cost of closing the IndexWriter in your
        >> benchmark statistics. You should, as, with such a large buffer, you've delayed
        >> much of the work to that point.
        >>

        Yes, by not factoring the optimize()/close() call into the rate calculation,
        there is still 'work to be done' at the end, but that would only be the tail end
        of the remaining docs stored in memory, right? When indexing millions of
        records, this is probably not going to be a large percentage of the overall
        time, as it would only be, at most, the last maxBufferedDocs to be tidied up. Or
        have I confused myself? I'm still quite new to Lucene and its inner workings.

        >> The bottleneck you're hitting is that maybeMergeDocs sums the size of the
        >> buffered indexes each time to decide whether to merge. When you have thousands
        >> buffered, this dominates.

        Yes, when maxBufferedDocs is relatively high (which is useful, I thought, if you
        have the memory to throw at the application and are trying to stay off IO as
        much as possible - that's what I've understood anyway), the loop ends up costing
        something like this, where n is maxBufferedDocs:

        n + (n-1) + (n-2) + ... + (n-n)

        (I'm no math whiz, sorry.)
        You can notice this when indexing and outputting logging information: you see the
        'rate' slow down slightly as documents are added to the in-memory
        buffer; then once the automatic merge is performed, the rate speeds up, then
        progressively slows down again.
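
        Put another way, over a single buffer fill those per-add scans sum to roughly
        n + (n-1) + ... + 1 = n(n+1)/2 segment-size checks, i.e. O(n^2) work for
        n = maxBufferedDocs documents added, which is why a large buffer makes the
        per-document cost creep up between merges.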

        >>
        >> To optimize this case (small docs, large maxBufferedDocs) we could keep count of
        >> the number of documents buffered by adding a bufferedDocCount field.
        >> addDocument could increment this, mergeSegments could decrement it, and
        >> maybeMergeSegments could check it with something like:
        >>
        >> if (targetMergeDocs == minMergeDocs) {
        >>   mergeDocs = bufferedDocCount;
        >> } else {
        >>   while (--minSegment >= 0) {
        >>     ...
        >>   }
        >> }
        >>
        >> Does that make sense?

        Err, to be honest I'm not quite sure what you mean by "small docs" in the above
        first statement. I'm also a little confused on the:

        >> if (targetMergeDocs == minMergeDocs) {

        and how it relates to the bufferedDocCount you mention.

        In my hack/patch I'm effectively keeping track of the number of documents added
        as you suggest, so I believe we're pretty close to the same thing, but I blame
        only having had one coffee for my trouble understanding it. I think I like where
        you're going though; it smells right.

        Something I thought about last night is that the current code works fine for all
        cases, however the 'clean index being rebuilt' seems to get the raw end of the
        stick. When an index is being incrementally or batch updated, IndexWriter
        probably does need to scan the Segments to see if they need merging to obey the
        configuration settings. However a fresh index is a special case, and seems like
        it could be optimized (this may be what you meant by 'small docs' ?).

        In really large-scale indexes the number of documents being
        incrementally/batch updated is going to be minor compared to the cost of
        building the full index for the first time. For a true High Availability
        solution, one will always need to factor in the cost to rebuild the index from
        scratch should it come to that. Making the initial index build as fast as
        possible gives much smaller downtime.

        For crawling applications that use Lucene, this optimization will probably not
        even get noticed, because of the latency in retrieving the source material to
        make up the document. Only when the source material can be accessed fast will
        this optimization be important.

        Removing this CPU bottleneck has even more benefits when you consider those
        solutions using Lucene to index documents in parallel. This optimization means
        that you get a multiplication of benefit the more CPUs are utilized,
        particularly if the architecture is a producer/consumer operation with a buffer
        in between (obviously IO starts to get in the way more). With this optimization
        the CPU can be better utilized performing tokenization etc.

        cheers,

        Paul

        cutting@apache.org added a comment -

        Your benchmark might run faster if you set maxBufferedDocs smaller. Also, it
        doesn't look like you're including the cost of closing the IndexWriter in your
        benchmark statistics. You should, as, with such a large buffer, you've delayed
        much of the work to that point.

        The bottleneck you're hitting is that maybeMergeDocs sums the size of the
        buffered indexes each time to decide whether to merge. When you have thousands
        buffered, this dominates.

        To optimize this case (small docs, large maxBufferedDocs) we could keep count of
        the number of documents buffered by adding a bufferedDocCount field.
        addDocument could increment this, mergeSegments could decrement it, and
        maybeMergeSegments could check it with something like:

        if (targetMergeDocs == minMergeDocs) {
          mergeDocs = bufferedDocCount;
        } else {
          while (--minSegment >= 0) {
            ...
          }
        }

        Does that make sense?

        Yonik Seeley added a comment -

        (In reply to comment #3)
        > Isn't the effect the same as setting mergeFactor to 2000, i.e. indexing gets
        > faster but more RAM is needed?

        On every add, it looks like the entire segment list is walked looking to see if
        enough docs (minMergeDocs) can be collected together to do a merge. With a
        large minMergeDocs, this can get expensive. Perhaps a count should be kept of
        the number of docs in memory, and when it exceeds minMergeDocs, then call the
        merge logic.

        Of course keeping track of the count would require modifications to many
        IndexWriter methods. It looks like the performance gains may well be worth it
        though.

        BTW, I don't think a mergeFactor of 1000 is typical (way too many
        file descriptors in use, and a big hit to searchers). A high minMergeDocs (now
        maxBufferedDocs) is typical and useful though.

        Daniel Naber added a comment -

        Isn't the effect the same as setting mergeFactor to 2000, i.e. indexing gets
        faster but more RAM is needed?

        Paul Smith added a comment -

        Created an attachment (id=15043)
        Demonstration patch that defers the call to IndexWriter.maybeMergeSegments() to
        every 2000 adds.

        Paul Smith added a comment -

        Created an attachment (id=15042)
        HPROF log of test run pre-patch

        Added HPROF log showing high CPU usage inside IndexWriter.maybeMergeSegments()
        method


          People

          • Assignee:
            Yonik Seeley
          • Reporter:
            Paul Smith
          • Votes:
            2
          • Watchers:
            3
