Lucene - Core: LUCENE-2573

Tiered flushing of DWPTs by RAM with low/high water marks

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Realtime Branch
    • Component/s: core/index
    • Labels: None
    • Lucene Fields: New

      Description

      Now that we have DocumentsWriterPerThreads we need to track total consumed RAM across all DWPTs.

      A flushing strategy idea that was discussed in LUCENE-2324 was to use a tiered approach:

      • Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
      • Flush all DWPTs at a high water mark (e.g. at 110%)
      • Use linear steps in between the low and high water marks: e.g. when 5 DWPTs are used, flush at 90%, 95%, 100%, 105% and 110%.

      Should we allow the user to configure the low and high water mark values explicitly as absolute values (e.g. low water mark at 120MB, high water mark at 140MB)? Or shall we, for simplicity, keep the single setRAMBufferSizeMB() config method and use something like 90% and 110% for the water marks?
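
      For illustration, here is a minimal Java sketch (hypothetical names, not taken from any patch) of how such linear tiers could be derived from the single setRAMBufferSizeMB() value:

      class TierSketch {
        // Compute per-DWPT flush thresholds between a low (90%) and a high (110%)
        // water mark; with 5 DWPTs this yields 90%, 95%, 100%, 105% and 110%.
        static double[] tiersMB(double ramBufferMB, int numDWPTs) {
          double low = 0.90 * ramBufferMB;   // first DWPT flushes here
          double high = 1.10 * ramBufferMB;  // all DWPTs flush here
          double[] tiers = new double[numDWPTs];
          double step = (high - low) / Math.max(1, numDWPTs - 1);
          for (int i = 0; i < numDWPTs; i++) {
            tiers[i] = low + i * step;
          }
          return tiers;
        }
      }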

      1. LUCENE-2573.patch
        104 kB
        Simon Willnauer
      2. LUCENE-2573.patch
        85 kB
        Simon Willnauer
      3. LUCENE-2573.patch
        69 kB
        Simon Willnauer
      4. LUCENE-2573.patch
        71 kB
        Simon Willnauer
      5. LUCENE-2573.patch
        61 kB
        Simon Willnauer
      6. LUCENE-2573.patch
        58 kB
        Simon Willnauer
      7. LUCENE-2573.patch
        19 kB
        Jason Rutherglen
      8. LUCENE-2573.patch
        19 kB
        Jason Rutherglen
      9. LUCENE-2573.patch
        17 kB
        Jason Rutherglen


          Activity

          Michael McCandless added a comment -

          I think just keep it simple? User sets RAM buffer size, and we compute fixed high/low watermarks from there?

          Michael Busch added a comment -

          Yeah I like that better too. Will implement that approach.

          Jason Rutherglen added a comment -

          Users probably won't customize to that level of detail, and RAM usage isn't entirely accurate in Java anyway. Let's keep it as is (ie, setting the RAM buffer size).

          Michael Busch added a comment -

          Jason, are you still up for working on a patch for this one?

          We should probably get the realtime branch in a healthy state first and run some performance tests before we start working on all the fun stuff.
          Almost there!

          Jason Rutherglen added a comment -

          Michael, I could; I was working on the terms dictionary, reading
          postings etc., so that deletes will operate correctly. It'd be
          great to nail down the concurrency of the *BlockPools first (LUCENE-2575), as
          so much uses them, then test performance etc., otherwise we'll
          have a lot of code relying on something that could be changing?

          Jason Rutherglen added a comment -

          So yeah I'll work on a patch for this issue.

          Jason Rutherglen added a comment -

          Michael, DWPT.numBytesUsed isn't currently being updated?

          Michael Busch added a comment -

          Michael, DWPT.numBytesUsed isn't currently being updated?

          You can delete that one. I factored all the memory allocation/tracking into DocumentsWriterRAMAllocator. You might have to get some memory related stuff from trunk, e.g. the balanceRAM() code and adapt it.

          Jason Rutherglen added a comment -

          Is DocumentsWriterRAMAllocator.numBytesUsed accurate then? It seems like some things are being recorded into it, however I'm not immediately sure everything is...

          Michael Busch added a comment -

          I'm not 100% sure, I need to review the code to refresh my memory...

          Michael Busch added a comment -

          Hi Jason, are you still working on the patch here?

          Jason Rutherglen added a comment - edited

          Michael, I've been on holiday in Europe for the last 2 weeks. When I'm back tomorrow I'll resume work on it.

          Jason Rutherglen added a comment -

          I'm not sure if, after a DWPT starts flushing, we need to decrement what would effectively be a "projected RAM usage post current DWPT flush completion". Otherwise we could, in many cases, start the flush of most/all of the DWPTs.

          Jason Rutherglen added a comment -

          It looks like StoredFieldsWriter is reused after flushing a DWPT, however we're not resetting isClosed.

          Jason Rutherglen added a comment -

          We probably need a test that delays the flush process, otherwise flushing to RAM occurs too fast to proceed to the next tier.

          Jason Rutherglen added a comment -

          Here's a first cut...

          The TestDWPTFlushByRAM doesn't do much at this point. It adds documents in 2 threads, and prints the RAM usage to stdout. It more or less shows the tiered flushing working.

          I don't think we're tracking all of the RAM usage yet, maybe just the terms? I need to review.

          Jason Rutherglen added a comment -

          The DWPT that happens to exceed the first tier is flushed out. This was easier to implement than finding the highest RAM consuming DWPT and flushing it from a different thread.

          Jason Rutherglen added a comment -

          I did a search through the code and ByteBlockAllocator.perDocAllocator has no references, it can probably be removed, unless there was some other intention for it.

          Jason Rutherglen added a comment -

          I think we can/should track the RAM consumption directly in IntBlockPool and ByteBlockPool. I'm not sure if we're tracking those allocations right now, however if we are or are not, it'd be clearer to add a getBytesUsed to these pool classes.

          Jason Rutherglen added a comment -

          The last comment is incorrect, DocumentsWriterRAMAllocator does keep track of all allocations of int[] and byte[], and the number of bytes used. I'm not sure why the test is reporting zero bytes used after documents have been successfully added.

          Jason Rutherglen added a comment -

          In DocumentsWriterRAMAllocator, we're only recording the addition of more bytes when a new block is created, however because previous blocks may be recycled, it is the recycled blocks that are not being recorded as bytes used. Should we record all allocated blocks as "in use" ie, count them as bytes used, or wait until they are "in use" again to be counted as consuming RAM?

          Michael McCandless added a comment -

          We probably need a test that delays the flush process, otherwise flushing to RAM occurs too fast to proceed to the next tier.

          We can modify MockRAMDir to optionally "take its sweet time" when writing certain files?

          I'm not sure if after a DWPT is flushing we need to decrement what would effectively be a "projected RAM usage post current DWPT flush completion". Otherwise we could in many cases, start the flush of most/all of the DWPTs.

          But shouldn't tiered flushing take care of this? Ie you only decr RAM consumed when the flush of the DWPT finishes, not before?

          The DWPT that happens to exceed the first tier, is flushed out. This was easier to implement than finding the highest RAM consuming DWPT and flushing it, from a different thread.

          Hmm but this won't be most efficient, in general? Ie we could end up creating tiny segments depending on luck-of-the-thread-scheduling?

          I did a search through the code and ByteBlockAllocator.perDocAllocator has no references, it can probably be removed, unless there was some other intention for it.

          I think this makes sense – each DWPT now immediately flushes to its private doc store files, so there's no longer a need to track per-doc pending RAM?

          In DocumentsWriterRAMAllocator, we're only recording the addition of more bytes when a new block is created, however because previous blocks may be recycled, it is the recycled blocks that are not being recorded as bytes used. Should we record all allocated blocks as "in use" ie, count them as bytes used, or wait until they are "in use" again to be counted as consuming RAM?

          I think we have to track both. If a buffer is not in the pool (ie not free), then it's in use and we count that as RAM used, and that counter is used to trigger tiered flushing. Separately we have to track net allocated, in order to trim the buffers (drop them, so GC can reclaim) when we are over the .setRAMBufferSizeMB.
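
          As a rough illustration of the two counters described above (hypothetical class and field names, not the actual DocumentsWriterRAMAllocator API): bytesUsed grows whenever a block is handed out, recycled or not, and drives the tiered flushing; netAllocated only grows on real allocations and is what a trim step would compare against setRAMBufferSizeMB:

          class ByteBlockAllocatorSketch {
            static final int BLOCK_SIZE = 32 * 1024;
            private final java.util.ArrayDeque<byte[]> freeBlocks = new java.util.ArrayDeque<byte[]>();
            private long bytesUsed;      // blocks currently handed out ("in use")
            private long netAllocated;   // everything allocated and not yet trimmed

            synchronized byte[] getByteBlock() {
              byte[] block = freeBlocks.poll();
              if (block == null) {
                block = new byte[BLOCK_SIZE];
                netAllocated += BLOCK_SIZE;  // only real allocations grow the net counter
              }
              bytesUsed += BLOCK_SIZE;       // recycled or not, the block is now in use
              return block;
            }

            synchronized void recycleByteBlock(byte[] block) {
              bytesUsed -= BLOCK_SIZE;
              freeBlocks.push(block);
            }

            // Drop free blocks (so GC can reclaim them) while net allocation exceeds the budget.
            synchronized void trim(long maxBytes) {
              while (netAllocated > maxBytes && !freeBlocks.isEmpty()) {
                freeBlocks.pop();
                netAllocated -= BLOCK_SIZE;
              }
            }
          }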

          Jason Rutherglen added a comment -

          shouldn't tiered flushing take care of this

          Faulty thinking for a few minutes.

          but this won't be most efficient, in general? Ie we could end up creating tiny segments depending on luck-of-the-thread-scheduling?

          True. Instead, we may want to simply not-flush the current DWPT if it is in fact not the highest RAM user. When addDoc is called on the thread with the highest RAM usage, we can then flush it.

          there's no longer a need to track per-doc pending RAM

          I'll remove it from the code.

          If a buffer is not in the pool (ie not free), then it's in use and we count that as RAM used

          Ok, I'll make the change.

          we have to track net allocated, in order to trim the buffers (drop them, so GC can reclaim) when we are over the .setRAMBufferSizeMB

          I haven't seen this in the realtime branch. Reclamation of extra allocated free blocks may need to be reimplemented.

          I'll increment num bytes used when a block is returned for use.

          On this topic, do you have any thoughts yet about how to make the block pools concurrent? I'm still leaning towards a random access file (seek style) interface because this is easy to make concurrent, and hides the underlying block management mechanism, rather than directly exposes it like today, which can lend itself to problematic usage in the future.

          Jason Rutherglen added a comment -

          We can modify MockRAMDir to optionally "take its sweet time" when writing certain files?

          Yes, I think we need to implement something of this nature. We could even randomly assign a different delay value per flush. Of course, how the test would instigate this from outside of DW is somewhat of a different issue.

          Jason Rutherglen added a comment -
          • perDocAllocator is removed from DocumentsWriterRAMAllocator
          • getByteBlock and getIntBlock always increment numBytesUsed

          The test that simply prints out debugging messages looks better. I need to figure out unit tests.

          Jason Rutherglen added a comment -

          The last patch also only flushes a DWPT if it's the highest RAM consumer.

          Jason Rutherglen added a comment -

          There was a small bug in the choice of the max DWPT, in that all DWPTs, including ones that were scheduled to flush, were being compared against the current DWPT (ie the one being examined for possible flushing).

          Jason Rutherglen added a comment -

          I was hoping something clever would come to me about how to unit test this, but nothing has. We can slow down the writes to the file(s) via a Thread.sleep, however this will only emulate a real file system in RAM; what then? I thought about testing the percentage, however is it going to be exact? We could test a percentage range of each of the segments flushed? I guess I just need to run all of the unit tests, however some of those will fail because deletes aren't working properly yet.

          Simon Willnauer added a comment -

          Here is my first cut / current status on this issue. First of all, I have a couple of failures related to deletes, but they seem not to be (directly) related to this patch since I can reproduce them even without the patch.
          All of the failures are related to deletes in some way, so I suspect that there is another issue for that, no?

          This patch implements a tiered flush strategy combined with a concurrent flush approach.

          • All decisions are based on a FlushPolicy which operates on a DocumentsWriterSession (does the ram tracking and housekeeping), once the flush policy encounters a transition to the next tier it marks the "largest" ram consuming thread
            as flushPending if we transition from a lower level and all threads if we transition from the upper watermark (level). DocumentsWriterSession shifts the memory of a pending thread to a new memory "level" (pendingBytes) and marks the thread as pending.
          • Once FlushPolicy#findFlushes(..) returns the caller tries to check if itself needs to flush and if so it "checks-out" its DWPT and replaces it with a complete new instance. Releases the lock on the ThreadState and continues to flush the "checked-out" DWPT. After this is done or if the current DWPT doesn't need flushing the indexing thread checks if there are any other pending flushes and tries to (non-blocking) obtain their lock. It only tries to get the lock and only tries once since if the lock is taken another thread is already holding it and will see the flushPending once finished adding the document.

          This approach tries to utilize as much concurrency as possible by flushing the checked-out DWPT while its ThreadState continues with an entirely new DWPT. Yet, this might also have problems, especially if IO is slow and we are filling up indexing RAM too fast. To prevent us from bloating up the memory too much I introduced a notion of "healthiness" which operates on the net bytes used in the DocumentsWriterSession (flushBytes + pendingBytes + activeBytes) – (flushBytes: mem consumption of currently flushing DWPTs, pendingBytes: mem consumption of ThreadStates / DWPTs marked as pending, activeBytes: mem consumption of the indexing DWPTs). If net bytes reach a certain threshold (2*maxRam currently) I stop incoming threads until the session becomes healthy again.
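
          A minimal sketch of this "healthiness" / stall bookkeeping (hypothetical names; the patch's actual DocumentsWriterSession differs):

          class SessionSketch {
            private long flushBytes;    // RAM of DWPTs currently flushing
            private long pendingBytes;  // RAM of ThreadStates marked flushPending
            private long activeBytes;   // RAM of DWPTs still indexing
            private final long maxRamBytes;

            SessionSketch(long maxRamBytes) { this.maxRamBytes = maxRamBytes; }

            synchronized long netBytes() { return flushBytes + pendingBytes + activeBytes; }

            synchronized void bytesIndexed(long bytes)  { activeBytes += bytes; }
            synchronized void markPending(long bytes)   { activeBytes -= bytes; pendingBytes += bytes; }
            synchronized void flushStarted(long bytes)  { pendingBytes -= bytes; flushBytes += bytes; }
            synchronized void flushFinished(long bytes) { flushBytes -= bytes; notifyAll(); }

            // Incoming indexing threads wait here while the session is unhealthy,
            // i.e. net RAM has grown past twice the configured buffer.
            synchronized void waitIfUnhealthy() throws InterruptedException {
              while (netBytes() > 2 * maxRamBytes) {
                wait();
              }
            }
          }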

          I ran luceneutil with trunk vs. LUCENE-2573, indexing 300k Wikipedia docs with a 1GB MaxRamBuffer and 4 threads. Searches on both indexes yield identical results (Phew!)
          Indexing times in ms look promising:

          trunk      patch      diff
          134129 ms  102932 ms  23.25%

          This patch is still kind of rough and needs iterations so reviews and questions are very much welcome.

          Michael McCandless added a comment -

          OK I tested trunk vs RT branch + patch, indexing 10M Wikipedia docs
          with 1 GB RAM buffer and 8 threads.

          Trunk took 298 sec for initial indexing, then 198 sec to wait for
          merges to catch up, and 24 sec to commit.

          RT+patch took 289 sec for initial indexing, then 225 sec to wait for
          merges, then 26 sec to commit.

          Not sure yet why I'm not seeing real concurrency gains here... is
          there anything printed to infoStream if the safety net (hijack app
          threads) kicks in?

          Michael McCandless added a comment -

          Woops – I screwed up the above test!

          Corrected numbers: trunk takes 439 sec to index 10M docs, 202 sec to waitForMerges, 17 sec to commit.

          RT + patch: 289 sec to index 10M docs, 225 sec to waitForMerges, 26 sec to commit.

          This is w/ 8 threads (machine has 24 cores), writing to Intel X25M G2 ssd.

          Awesome speedup!! Nice work everyone. Looking forward to making a new blog post soon!

          Simon Willnauer added a comment -

          Here is an updated patch that writes more stuff to the infoStream, including whether we are unhealthy and block threads.
          I also fixed some issues with TestIndexWriterExceptions.

          Another idea we should follow IMO is to see if we can piggyback the indexing threads on commit / flushAll instead of waiting to be able to lock each DWPT and flush it sequentially. This should be fairly easy since we can simply mark them as flushPending and let incoming indexing threads do the flush in parallel. Depending on how we index and how big the DWPTs are this could give us another sizable gain. For instance, if you index and frequently commit, let's say every 10k docs (so many folks do stuff like that), but keep on indexing, we should see concurrency helping us a lot since commit is not blocking all incoming indexing threads. I think we should spin off another issue once this is ready.

          Michael McCandless added a comment -

          Not done reviewing the patch but here's some initial feedback:

          Very cool (and super advanced) that this adds a FlushPolicy! But for
          "normal" usage we go and make either DocCountFP or TieredFP, depending
          on whether IWC is flushing by docCount, RAM, or both, right? Ie one
          normally need not make their own FlushPolicy.

          Maybe rename TieredFP -> ByRAMFP? Also, I'm not sure we need the N
          tiers? I suspect that may flush too heavily? Can we instead simplify
          it and have only the low and high water marks? So we flush when
          active RAM is over low water mark? (And we stall if active + flushing
          RAM exceeds high water mark).

          Can we rename isHealthy to isStalled (ie, invert it)?

          I'm still unsure we should even include any healthy check APIs. This
          is an exceptional situation and I don't think we need API exposure for
          it? If apps really want to, they can turn on infoStream (we should
          make sure "stalling" is logged, just like it is for merging) and
          debug from there?

          Maybe rename pendingBytes to flushingBytes? Or maybe
          flushPendingBytes? (Just to make it clear what we are pending on...).

          Maybe rename FP.printInfo(String msg) --> FP.message? (Consistent w/
          our other classes).

          I wonder if FP.findFlushes should be renamed to something like
          FP.visit, and return void? Ie, it's called for its side effects of
          marking DWPTs for flushing, right? Separately, whether or not this
          thread will go and flush a DWPT is for IW to decide? (Like it could
          be this thread didn't mark any new flush required, but it should go
          off and pull a DWPT previously marked by another thread). So then IW
          would have a private volatile boolean recording whether any active
          DWPTs have flushPending.

          Simon Willnauer added a comment -

          Maybe rename TieredFP -> ByRAMFP? Also, I'm not sure we need the N
          tiers? I suspect that may flush too heavily? Can we instead simplify
          it and have only the low and high water marks? So we flush when
          active RAM is over low water mark? (And we stall if active + flushing
          RAM exceeds high water mark).

          This really sounds like a different FlushPolicy to me. But it is worth a try - should be easy to add with this patch. So you mean we always flush ALL DWPTs once we reach the low watermark? I don't think this is a good idea. And I wonder if it is a bit too aggressive to put DW into stalled mode if we exceed the high watermark. Anyway, we can try and see what works better, right?

          Can we rename isHealthy to isStalled (ie, invert it)?

          sure isStalled sounds fine

          I'm still unsure we should even include any healthy check APIs.

          for now this is internal only so even if we decide to I would shift that to a different issue.

          Maybe rename pendingBytes to flushingBytes? Or maybe
          flushPendingBytes? (Just to make it clear what we are pending on...).

          yeah that is true - flushPendingBytes to make it consistent - my fault...

          I wonder if FP.findFlushes should be renamed to something like
          FP.visit, and return void? Ie, it's called for its side effects of
          marking DWPTs for flushing, right? Separately, whether or not this
          thread will go and flush a DWPT is for IW to decide? (Like it could
          be this thread didn't mark any new flush required, but it should go
          off and pull a DWPT previously marked by another thread). So then IW
          would have a private volatile boolean recording whether any active
          DWPTs have flushPending.

          I was unsure about the name too so I just made it consistent with MergePolicy. Visit is OK I think.
          The return value is maybe a relic from an earlier version where I didn't have DocWriterSession#hasPendingFlushes(); yeah, I think we can make that void and simply check if there are any. I think I do that today already.

          Michael Busch added a comment -

          Awesome speedup!!

          YAY! Glad the branch is actually faster

          Thanks for helping out with this patch, Simon. I'll try to look at the patch soon. My last week was super busy.

          Michael McCandless added a comment -

          so you mean we always flush ALL DWPT once we reached the low watermark?

          No, I mean: as soon as we pass low wm, pick biggest DWPT and flush it.
          As soon as you mark that DWPT as flushPending, its RAM used is removed
          from active pool and added to flushPending pool.

          Then, if the active pool again crosses low wm, pick the biggest and
          mark as flush pending, etc.

          But if the flushing cannot keep up, and the sum of active +
          flushPending pools crosses high wm, you hijack (stall) incoming
          threads.
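
          A compressed sketch of this single low/high water mark policy (illustrative only; DWPT here is a stand-in, not the real DocumentsWriterPerThread API):

          class ByRAMPolicySketch {
            static class DWPT {          // stand-in for DocumentsWriterPerThread
              long bytesUsed;
              boolean flushPending;
            }

            private long activeBytes;    // RAM of DWPTs still accepting documents
            private long flushingBytes;  // RAM of DWPTs taken out of rotation
            private final long lowWaterBytes, highWaterBytes;

            ByRAMPolicySketch(long lowWaterBytes, long highWaterBytes) {
              this.lowWaterBytes = lowWaterBytes;
              this.highWaterBytes = highWaterBytes;
            }

            synchronized void bytesIndexed(long delta) { activeBytes += delta; }

            // After a doc is added: if the active pool crossed the low water mark,
            // mark the biggest non-pending DWPT as flushPending and move its RAM
            // from the active pool to the flushing pool.
            synchronized void maybeMarkFlush(java.util.List<DWPT> dwpts) {
              if (activeBytes <= lowWaterBytes) return;
              DWPT biggest = null;
              for (DWPT d : dwpts) {
                if (!d.flushPending && (biggest == null || d.bytesUsed > biggest.bytesUsed)) {
                  biggest = d;
                }
              }
              if (biggest != null) {
                biggest.flushPending = true;
                activeBytes -= biggest.bytesUsed;
                flushingBytes += biggest.bytesUsed;
              }
            }

            // Stall incoming indexing threads only if flushing cannot keep up.
            synchronized boolean shouldStall() { return activeBytes + flushingBytes > highWaterBytes; }

            synchronized void flushDone(long bytes) { flushingBytes -= bytes; }
          }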

          I think this may make a good "flush by RAM" policy, but I agree we should
          test. I think the fully tiered approach may be overly complex...

          for now this is internal only so even if we decide to I would shift that to a different issue.

          OK sounds good.

          Also, if the app really cares about this (I suspect none will) they
          could make a custom FlushPolicy that they could directly query to find
          out when threads get stalled.

          Besides this, is it only getting flushing of deletes working
          correctly that remains, before landing RT?

          Simon Willnauer added a comment -

          I think this may make a good "flush by RAM" policy, but I agree we should
          test. I think the fully tiered approach may be overly complex...

          yeah possibly, I think simplifying this is easy now though...

          Also, if the app really cares about this (I suspect none will) they
          could make a custom FlushPolicy that they could directly query to find
          out when threads get stalled.

          yeah I think we don't need to expose that through IW.

          Besides this, is it only getting flushing of deletes working
          correctly that remains, before landing RT?

          we need to fix LUCENE-2881 first too.

          Michael Busch added a comment -

          we need to fix LUCENE-2881 first too.

          Yeah, I haven't merged with trunk since we rolled back 2881, so we should fix it first, catch up with trunk, and then make deletes work. I might have a bit of time tonight to work on 2881.

          Simon Willnauer added a comment -

          Next iteration on this patch. I changed some naming issues and separated out a ByRAMFlushPolicy as an abstract base class. This patch contains the original MultiTierFlushPolicy and a SingleTierFlushPolicy that only has a low and a high watermark. This policy tries to flush the biggest DWPT once the LW is crossed and flushes all DWPTs once the HW is crossed.

          This patch also adds a "flush if stalled" control that hijacks indexing threads if the DW is stalled and there are still pending flushes. If so, the incoming thread tries to check out a pending DWPT and flush it before it adds the actual document.
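
          A rough sketch of this non-blocking "help flush" hand-off (hypothetical names; the real ThreadState / DWPT types differ):

          class HijackSketch {
            static class ThreadState {
              final java.util.concurrent.locks.ReentrantLock lock = new java.util.concurrent.locks.ReentrantLock();
              volatile boolean flushPending;
              Object dwpt;                     // stand-in for the DWPT bound to this state
            }

            // Try each pending ThreadState at most once, without blocking: if the lock
            // is taken, the owning indexing thread will see flushPending itself once it
            // finishes its document. On success the DWPT is checked out and replaced so
            // the ThreadState can keep indexing while the caller flushes the old one.
            static Object tryCheckoutPendingDWPT(java.util.List<ThreadState> states) {
              for (ThreadState state : states) {
                if (state.flushPending && state.lock.tryLock()) {
                  try {
                    if (state.flushPending) {  // re-check under the lock
                      state.flushPending = false;
                      Object checkedOut = state.dwpt;
                      state.dwpt = new Object();  // fresh DWPT takes its place
                      return checkedOut;          // caller flushes it outside the lock
                    }
                  } finally {
                    state.lock.unlock();
                  }
                }
              }
              return null;                     // nothing pending right now
            }
          }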

          I didn't benchmark the more complex MultiTierFP vs. SingleTierFP yet. I hope I get to this soon.

          Michael McCandless added a comment -

          I still see a healtiness (mis-spelled) in DW.

          I'd rather not have the stalling/healthiness be baked into the API, at
          all. Can we put the hijack logic entirely private in the flush-by-ram
          policies? (Ie remove isStalled()/hijackThreadsForFlush()).

          Instead of

          +    synchronized (docWriter.docWriterSession) {
          +      netBytes = docWriter.docWriterSession.netBytes();
          +    }
          

          , shouldn't we just make that method sync'd?

          Be careful defaulting TermsHash.trackAllocations to true – eg term
          vectors wants this to be false.

          Can we move FlushSpecification out of FlushPolicy? Ie, it's a private
          impl detail of DW right? (Not part of FlushPolicy's API). Actually
          why do we need it? Can't we just return the DWPT?

          Why do we have a separate DocWriterSession? Can't this be absorbed
          into DocWriter?

          Instead of FlushPolicy.message, can't the policy call DW.message?

          On the by-RAM flush policies... when you hit the high water mark, we
          should 1) flush all DWPTs and 2) stall any other threads.

          Why do we dereference the DWPTs with their ord? EG, can't we just
          store their "state" (active or flushPending) on the DWPT instead of in
          a separate states[]?

          Do we really need FlushState.Aborted? And if not... do we really need
          FlushState (since it just becomes 2 states, ie, Active or Flushing,
          which I think is then redundant w/ flushPending boolean?).

          I think the default low water should be 1X of your RAM buffer? And
          high water maybe 2X? (For both flush-by-RAM policies).

          Simon Willnauer added a comment -

          I still see a healtiness (mis-spelled) in DW.

          ugh. I will fix

          I'd rather not have the stalling/healthiness be baked into the API, at
          all. Can we put the hijack logic entirely private in the flush-by-ram
          policies? (Ie remove isStalled()/hijackThreadsForFlush()).

          I agree for the hijack part but the isStalled is something I might want to control. I mean we can still open it up eventually, so rather make it private for now but keep a note on it.

          Can we move FlushSpecification out of FlushPolicy? Ie, it's a private
          impl detail of DW right? (Not part of FlushPolicy's API). Actually
          why do we need it? Can't we just return the DWPT?

          It currently holds the RAM usage for that DWPT when it was checked out so that I can reduce the flushBytes accordingly. We can maybe get rid of it entirely, but I don't want to rely on the DWPT bytesUsed() though.
          We can certainly move it out - this inner class is a relic though.

          Why do we have a separate DocWriterSession? Can't this be absorbed into DocWriter?

          I generally don't like cluttering DocWriter and letting it grow like IW. DocWriterSession might not be the ideal name for this class but it's really a RAM tracker for this DW. Yet, we can move out some parts that do not directly relate to mem tracking. Maybe DocWriterBytes?

          Be careful defaulting TermsHash.trackAllocations to true – eg term vectors wants this to be false.

          I need to go through the IndexingChain and check carefully where to track memory anyway. I haven't got to that yet, but good that you mention it; one could easily get lost there.

          Instead of FlushPolicy.message, can't the policy call DW.message?

          I don't want to couple that API to DW. What would be the benefit besides saving a single method?

          On the by-RAM flush policies... when you hit the high water mark, we
should 1) flush all DWPTs and 2) stall any other threads.

          Well I am not sure if we should do that. I don't really see why we should forcefully stop the world here. Incoming threads will pick up a flush immediately, and if we have enough resources to index further why should we wait until all DWPTs are flushed? If we stall I fear that we could queue up threads that could help flushing, while stalling would simply stop them doing anything, right? You can still control this with the healthiness though. We currently do flush all DWPTs, btw, once we hit the HW.

          Why do we dereference the DWPTs with their ord? EG, can't we just
store their "state" (active or flushPending) on the DWPT instead of in
a separate states[]?

          That is definitely an option. I will give that a go.

          Do we really need FlushState.Aborted? And if not... do we really need
FlushState (since it just becomes 2 states, ie, Active or Flushing,
which I think is then redundant w/ flushPending boolean?).

          This needs some more refactoring; I will attach another iteration.

          I think the default low water should be 1X of your RAM buffer? And
high water maybe 2X? (For both flush-by-RAM policies).

          Hmm, I think we need to revise the maxRAMBufferMB Javadoc anyway, so we have all the freedom to do whatever we want. Yet, I think we should try to keep the RAM consumption similar to what it would have been in a previous release. So if we say the HW is 2x then suddenly some apps might run out of memory. I am not sure if we should do that or rather stick to the 90% / 110% for now. We need to find good defaults for this anyway.

          Michael McCandless added a comment -

          it currently holds the ram usage for that DWPT when it was checked out so that I can reduce the flushBytes accordingly. We can maybe get rid of it entirely but I don't want to rely on the DWPT bytesUsed() though.

          Hmm, but, once a DWPT is pulled from production, its bytesUsed()
          should not be changing anymore? Like why can't we use it to hold its
          bytesUsed?

          I generally don't like cluttering DocWriter and let it grow like IW. DocWriterSession might not be the ideal name for this class but its really a ram tracker for this DW. Yet, we can move out some parts that do not directly relate to mem tracking. Maybe DocWriterBytes?

          Well DocWriter is quite small now (On RT branch). And adding
          another class means we have to be careful about proper sync'ing (lock
          order, to avoid deadlock)... and I think it should get smaller if we
          can remove state[] array, FlushState enum, etc. but, OK I guess we can
          leave it as separate for now. How about DocumentsWriterRAMUsage?
          RAMTracker?

          Instead of FlushPolicy.message, can't the policy call DW.message?

          I don't want to couple that API to DW. What would be the benefit beside from saving a single method?

          Hmm, good point. Though, it already has a SetOnce<DocumentsWriter> –
          how come? Can the policy call IW.message? I just think FlushPolicy
          ought to be very lean, ie show you exactly what you need to
          implement...

          On the by-RAM flush policies... when you hit the high water mark, we should 1) flush all DWPTs and 2) stall any other threads.

          Well I am not sure if we should do that. I don't really see why we should forcefully stop the world here. Incoming threads will pick up a flush immediately and if we have enough resources to index further why should we wait until all DWPT are flushed. if we stall I fear that we could queue up threads that could help flushing while stalling would simply stop them doing anything, right? You can still control this with the healthiness though. We currently do flush all DWPT btw. once we hit the HW.

          As long as we default the high mark to something "generous" (2X low
          mark), I think this approach should work well.

          Ie, we "begin" flushing as soon as low mark is crossed on active RAM.
          We pick the biggest DWPT and take it out of rotation, and immediately
          deduct its RAM usage from the active pool. If, while we are still
          flushing, active RAM again grows above the low mark, then we pull
          another DWPT, etc. But then if ever the total flushing + active
          exceeds the high mark, we stall.
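
          As a rough sketch of that flow (not the branch code; the class, field and queue names are assumptions based on this discussion):

              // Sketch: pull the biggest DWPTs while active RAM sits above the low
              // mark, deduct their RAM immediately, and stall only when the combined
              // active + flushing RAM crosses the high mark.
              class FlushControlSketch {
                static final class DWPT { long bytesUsed; }

                private long activeBytes, flushingBytes;
                private final long lowWaterMark, highWaterMark;
                private volatile boolean stalled;

                FlushControlSketch(long low, long high) {
                  this.lowWaterMark = low;
                  this.highWaterMark = high;
                }

                synchronized void maybeFlushOrStall(java.util.Deque<DWPT> biggestFirst,
                                                    java.util.Deque<DWPT> flushQueue) {
                  while (activeBytes > lowWaterMark && !biggestFirst.isEmpty()) {
                    DWPT largest = biggestFirst.pollFirst();  // biggest active DWPT first
                    activeBytes -= largest.bytesUsed;         // deduct immediately from the active pool
                    flushingBytes += largest.bytesUsed;
                    flushQueue.add(largest);                  // out of rotation, flushed by some thread
                  }
                  stalled = activeBytes + flushingBytes > highWaterMark;  // stop the world only here
                }
              }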

          BTW why do we track flushPending RAM vs flushing RAM? Is that
          distinction necessary? (Can't we just track "flushing" RAM?).

          Simon Willnauer added a comment -

          next iteration containing a large number of refactorings.

          • I moved all responsibilities related to flushing including synchronization into the DocsWriterSession and renamed it to DocumentsWriterFlushControl.
          • DWFC now only tracks active and flush bytes, since the relic from my initial patch where pending memory was tracked is not needed anymore.
          • DWFC took over all synchronization, so there is no synchronized (flushControl) {...} in DocumentsWriter anymore. Seems way cleaner too.
          • Healthiness now blocks once we reach 2x maxMemory, and SingleTierFlushPolicy uses 0.9x maxRam as its low watermark and 2x the low watermark as its HW to flush all threads. The multi tier one is still unchanged and flushes in linear steps from 0.9 to 1.10 x maxRam (see the sketch after this list). We should actually test whether this does better or worse than the single tier FP.
          • FlushPolicy now has only a visit method and uses the IW.message to write to info stream.
          • ThreadState now holds a boolean flag that indicates if a flush is pending which is synced and written by DWFC. States[] is gone in DWFC.
          • FlushSpecification is gone and DWFC returns the DWPT upon checkoutForFlush. Yet, I still track the memory for the flushing DWPT separately, since DWPT#bytesUsed() changes during flush and I don't want to rely on it not changing. As a nice side-effect I can check if a checked-out DWPT is passed to doAfterFlush and assert on that.

          next steps here are benchmarking and getting good defaults for the flush policies. I think we are close though.
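
          For concreteness, the linear steps could be computed along these lines (illustrative only; the actual multi tier policy on the branch may do this differently):

              // Sketch: with n thread states, thresholds step linearly from 0.9x to
              // 1.1x of maxRam, e.g. n = 5 gives 90%, 95%, 100%, 105% and 110%.
              static double[] tierThresholdsMB(int n, double maxRamMB) {
                double[] thresholds = new double[n];
                double step = (1.10 - 0.90) / Math.max(1, n - 1);
                for (int i = 0; i < n; i++) {
                  thresholds[i] = (0.90 + i * step) * maxRamMB;
                }
                return thresholds;
              }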

          Michael McCandless added a comment -
          • I think once we sync up to trunk again, the FP should hold the
            IW's config instance, and pull settings "live" from it? Ie this
            way we keep our live changes to flush-by-RAM. Also, Healthiness
            (it won't get updates to RAM buffer now).
          • Should we rename *ByRAMFP --> *ByRAMOrDocCountFP? Since it "ors"
            docCount and RAM usage trigger right? Oh, I see, not quite – it
            requires RAM buffer be set. I think we should relax that? Ie a
            single flush policy (the default) flushes by either/or?
          • Shouldn't these flush policies also trigger by
            maxBufferedDelCount?
          • Maybe FP.init should throw IllegalStateExc not IllegalArgExc?
            (Because, no arg is allowed once the "state" of FP has already
            been init'ed).
          • Probably FP.writer should be a SetOnce?
          • Hmm we still have a FlushPolicy.message? Can't we just make IW
            protected and then FlushPolicy impl can call IW.message? (And
            also remove FP.setInfoStream).
          • Is IW.FlushControl not really used anymore? We should remove it?
          • I still think LW should be 1.0 of your RAM buffer. Ie, IW will
            start flushing once that much RAM is in use.
          • I still see "synchronized (docWriter.flushControl) {" in
            IndexWriter
          • We should jdoc that IWC.setFlushPolicy takes effect only on init
            of IW?
          • Add "for testing only" comment to IW.getDocsWriter?
          • I wonder whether we should convey "what changed" to the FP? EG,
            we can 1) buffer a new del term, 2) add a new doc, 3) both
            (updateDocument). It could be we have onUpdate, onAdd, onDelete?
            Or maybe we keep single method but rename to onChange? Ie, it's
            called because something about the incoming DWPT has changed.
          • The flush policy shouldn't have to compute "delta" RAM like it
            does now? Actually why can't it just call
            flushControl.activeBytes(), and we ensure the delta was already
            folded into that? Ie we'd call commitPerThreadBytes before
            FP.visit. (Then commitPerThreadBytes wouldn't ever add to
            flushBytes, which is sort of spooky – like flushBytes should get
            incr'd only when we pull a DWPT out for flushing; see the sketch
            after this list).
          • I don't think we should ever markAllWritersPending, ie, that's
            not the right "reaction" when flushing is too slow (eg you're on a
            slow hard drive) since over time this will result in flushing lots
            of tiny segments unnecessarily. A better reaction is to stall the
            incoming threads; this way the flusher threads catch up, and once
            you resume, then the small DPWTs have a chance to get big before
            they are flushed.
          • Misspelled: markLargesWriterPending -> markLargestWriterPending
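
          A minimal sketch of the ordering proposed in the "delta RAM" bullet above (commitPerThreadBytes and visit follow the wording of this discussion but are assumptions, not the branch's actual signatures):

              // Sketch: fold the per-document delta into activeBytes first, then let
              // the policy read flushControl.activeBytes() directly.
              void postUpdate(ThreadState state) {
                flushControl.commitPerThreadBytes(state); // delta goes to activeBytes only
                flushPolicy.visit(flushControl, state);   // policy sees up-to-date activeBytes
                // flushBytes is incremented only when a DWPT is checked out for flushing
              }
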
          Simon Willnauer added a comment -

          here is my current state on this issue. I didn't add all the JDocs needed (by far) and will wait until we have settled on the API for FlushPolicy.

          • I removed the complex TieredFlushPolicy entirely and added one DefaultFlushPolicy that flushes at IWC.getRAMBufferSizeMB() / sets biggest DWPT pending.
          • DW will stall threads if we reach 2 x maxNetRam which is retrieved from FlushPolicy so folks can lower that depending on their env.
          • DWFlushControl checks if a single DWPT grows too large and sets it forcefully pending once its ram consumption is > 1.9 GB. That should be enough buffer to not reach the 2048MB limit. We should consider making this configurable.
          • FlushPolicy now has three methods: onInsert, onUpdate and onDelete. DefaultFlushPolicy only implements onInsert and onDelete; the abstract base class just calls those on an update (see the sketch below).
          • I removed FlushControl from IW
          • added documentation on IWC for FlushPolicy and removed the jdocs for the RAM limit. I think we should add some lines about how RAM is now used and that users should balance the RAM with the number of threads they are using. Will do that later on though.
          • For testing I added a ThrottledIndexOutput that makes flushing slow so I can test if we are stalled and / or blocked. This is passed to MockDirectoryWrapper. It's currently under util but should rather go under store, no?
          • byte consumption is now committed before FlushPolicy is called since we don't have the multitier flush which required that to reliably proceed across tier boundaries (not required but it was easier to test really). So FP doesn't need to take care of the delta
          • FlushPolicy now also flushes on maxBufferedDeleteTerms while the buffered delete terms is not yet connected to the DW#getNumBufferedDeleteTerms() which causes some failures though. I added //nocommit & @Ignore to those tests.
          • this patch also contains a @Ignore on TestPersistentSnapshotDeletionPolicy which I couldn't figure out why it is failing but it could be due to an old version of LUCENE-2881 on this branch. I will see if it still fails once we merged.
          • Healthiness now doesn't stall if we are not flushing on RAM consumption to ensure we don't lock in threads.

          Overall this seems much closer now. I will start writing jdocs. Flush on buffered delete terms might need some tests, and I should also write a more reliable test for Healthiness... currently it relies on the ThrottledIndexOutput slowing down indexing enough to block, which might not always be true. It didn't fail yet.
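
          A rough sketch of that callback shape (the real branch signatures may differ; the placeholder types stand in for DocumentsWriterFlushControl and the per-thread state):

              // Sketch of the FlushPolicy surface described above: an update is a
              // delete plus an insert, so the abstract base simply forwards by default.
              abstract class FlushPolicySketch {
                interface FlushControl {}  // placeholder for DocumentsWriterFlushControl
                interface ThreadState {}   // placeholder for the per-thread state

                abstract void onInsert(FlushControl control, ThreadState state);
                abstract void onDelete(FlushControl control, ThreadState state);

                void onUpdate(FlushControl control, ThreadState state) {
                  onDelete(control, state);
                  onInsert(control, state);
                }
              }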

          Michael McCandless added a comment -

          Patch is looking better! I love how simple DefaultFP is now

          • 1900 * 1024 * 1024 is actually 1.86 GB; maybe just change comment
            to 1900 MB? Or we could really make the limit 1.9 GB (= 1945.6
            MB)
          • I think we should make the 1.9 GB changeable
            (setRAMPerThreadHardLimitMB?)
          • How come we lost 'assert !bufferedDeletesStream.any();' in
            IndexWriter.java?
          • Why default trackAllocations to true when ctor always sets it (in
            TermsHash.java)?
          • Can we not simply not invoke the FP if flushPending is already set
            for the given DWPT? (So that every FP need not check that).
          • In DefaultFP.onDelete – we shouldn't just return if numDocsInRAM
            is 0? Ie an app could open IW, delete 2M terms, close, and we
            need to flush several times due to RAM usage or del term count...
          • Maybe rename DefaultFP --> FlushByRAMOrCounts?
          • Won't the new do/while loop added to ThreadAffinityDWThreadPool
            run hot, if minThreadState is constantly null...? (Separately
            that source needs a header)
          • I love the ThrottledIndexOutput!
          • For the InterruptedException in ThrottledIndexOutput.sleep, we
            should rethrow w/ oal.util.ThreadInterruptedException (best
            practice... probably doesn't really matter here)
          • We should fix DefaultFlushPolicy to first pull the relevant config
            from IWC (eg maxBufferedDocs), then check if that config is -1 or
            not, etc., because IWC's config can be changed at any time (live)
            so we may read eg 10000 at first and then -1 the second time.

          Maybe, for stalling, instead of triggering by max RAM, we can take
          this simple approach: if the number of flushing DWPTs ever exceeds one
          plus number of active DWPTs, then we stall (and resume once it's below
          again).

          This approach would then work for flush-by-docCount policies too, and
          would still roughly equate to up to 2X RAM usage for flush-by.
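
          A tiny sketch of that stall rule (illustrative; the counts would come from the flush control, not from parameters like these):

              // Sketch: stall incoming indexing threads only while flushing clearly
              // lags behind, and resume as soon as the count drops again.
              static boolean shouldStall(int numFlushingDWPTs, int numActiveDWPTs) {
                return numFlushingDWPTs > numActiveDWPTs + 1;
              }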

          It's really odd that TestPersistentSDP fails now... this should be
          unrelated to the (admittedly, major) changes we're making here...

          Hmm.... deletes are actually tricky, because somehow the FlushPolicy
          needs access to the "global" deletes count (and also the per-DWPT
          deletes count). If a given DWPT has 0 buffered docs, then indeed the
          buffered deletes in its pool doesn't matter. But, we do need to respect
          the buffered deletes in the global pool...
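
          To make that concrete, a hedged sketch of the delete trigger (assumed names; -1 here plays the role of IndexWriterConfig.DISABLE_AUTO_FLUSH):

              // Sketch: the per-DWPT delete count alone is not enough - the global
              // pool must be able to trigger a flush of deletes even when no DWPT
              // currently holds buffered docs.
              static boolean mustFlushGlobalDeletes(int globalBufferedDeleteTerms,
                                                    int maxBufferedDeleteTerms /* -1 = disabled */) {
                return maxBufferedDeleteTerms != -1
                    && globalBufferedDeleteTerms >= maxBufferedDeleteTerms;
              }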

          Simon Willnauer added a comment -

          How come we lost 'assert !bufferedDeletesStream.any();' in IndexWriter.java?

          so this is tricky here. Since we are flushing concurrently this could false-fail. The same assertion is in bufferedDeletesStream.prune(segmentInfos), which is synced. But another thread could sneak in between the prune and the any() check, updating / deleting a document, so this could false-fail. Or do I miss something here?

          Maybe, for stalling, instead of triggering by max RAM, we can take
          this simple approach: if the number of flushing DWPTs ever exceeds one
          plus number of active DWPTs, then we stall (and resume once it's below
          again).

          Awesome idea mike! I will do that!

          We should fix DefaultFlushPolicy to first pull the relevant config
          from IWC (eg maxBufferedDocs), then check if that config is -1 or
          not, etc., because IWC's config can be changed at any time (live)
          so we may read eg 10000 at first and then -1 the second time.

          You mean we should check whether we flush by RAM, doc count, etc., and only if so read the live values for RAM, doc count, etc.?

          Hmm.... deletes are actually tricky, because somehow the FlushPolicy
          needs access to the "global" deletes count (and also the to per-DWPT
          deletes count). If a given DWPT has 0 buffered docs, then indeed the
          buffered deletes in its pool doesn't matter. But, we do need to respect
          the buffered deletes in the global pool...

          I think it does not make sense to check both the global count and the DWPT count against the same value. If we have a DWPT that exceeds it, we also exceed globally; or could it happen that a DWPT has more deletes than the global pool? Further, if we observe the global pool and exceed the limit, do we flush all, as written in the IWC documentation?

          Once we sort this out I'll upload a new patch with javadoc etc. for the flush policy. We seem to be close here, man!

          Simon Willnauer added a comment -

          Next iteration:

          • added JavaDoc to all classes and interfaces.
          • fixed the possible hot loop in DWPTThreadPool
          • changed the stalling logic to block if more flushing DWPTs are around than active ones.
          • check IWC setting on init and listen to live changes for those who have not been disabled.
          • made the hard per-thread RAM limit configurable and added DEFAULT_RAM_PER_THREAD_HARD_LIMIT, set to 1945 MB (see the sketch below)
          • renamed DefaultFP --> FlushByRAMOrCounts
          • added setFlushDeletes to DWFlushControl that is checked each time we add a delete and if set we flush the global deletes.

          This seems somewhat close though. It's time to benchmark it again.
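
          For illustration, the hard limit check could look roughly like this (perThreadBytes is an assumed accessor, not necessarily the branch API):

              // Sketch: force a DWPT to be flushed well before its in-memory buffers
              // could approach the 2048 MB addressing limit, regardless of the policy.
              static final int DEFAULT_RAM_PER_THREAD_HARD_LIMIT_MB = 1945;

              static boolean exceedsHardLimit(long perThreadBytes, int hardLimitMB) {
                return perThreadBytes > hardLimitMB * 1024L * 1024L;
              }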

          Simon Willnauer added a comment -

          Today I merged with trunk and updated my patch. I fixed the testcases that had an @Ignore on them and ran the tests. 1 out of 5 runs fails on TestStressIndexing and TestNRTThreads, which is due to the update issues that should be addressed in LUCENE-2956. All other tests pass, including all RAM / NumDoc / BufferedDeleteTerms related tests. As a consequence I committed the current state of this issue to the RT branch too. I will keep this issue open for now.

          Michael Busch added a comment -

          Thanks Simon!
          I'll work on LUCENE-2956 next.

          Simon Willnauer added a comment -

          I ran a couple of benchmarks with interesting results. The graph below shows documents per second for the RT branch with DWPT, yielding very good IO/CPU utilization, and overall throughput is much better than trunk's.

          Yet, when we look at trunk, the peak performance is much better on trunk than on DWPT. The reason for that, I think, is that we flush concurrently, which takes at most one thread out of the loop; those are the little drops in docs/sec. This does not yet explain the constantly lower max indexing rate; I suspect it is at least influenced by the fact that flushing is very, very CPU intensive. At the same time, CMS might kick in way more often since we are writing more segments, which are also smaller compared to trunk. Eventually, I need to run a profiler and see what is going on.

          Interestingly, besides the nice CPU utilization we also have nearly perfect IO utilization. The graph below shows that we are consistently using IO to flush segments; the width of the bars shows the time it took to flush a single DWPT, and there is almost no overlap.

          Overall those are super results! Good job everybody!

          simon

          Jason Rutherglen added a comment -

          influenced due to the fact that flushing is very very CPU intensive

          Do you think this is due mostly to the vint decoding? We're not interleaving postings on flush with this patch so the CPU consumption should be somewhat lower.

          At the same time CMS might kick in way more often since we are writing more segments which are also smaller compared to trunk

          This is probably the more likely case. In general, we may be able to default to a higher overall RAM buffer size, and perhaps there won't be degradation in indexing performance like there is with trunk? In the future with RT we could get fancy and selectively merge segments as we're flushing, if writing larger segments is important.

          I'd personally prefer to write out 1-2 GB segments, and limit the number of DWPTs to 2-3, mainly for servers that are concurrently indexing and searching (eg, the RT use case). I think the current default number of thread states is a bit high.

          Michael Busch added a comment -

          Thanks, Simon, for running the benchmarks! Good results overall, even though it's puzzling why flushing would be CPU intensive.

          We should probably do some profiling to figure out where the time is spent. I can probably do that Sunday, but feel free to beat me

          Simon Willnauer added a comment -

          Thanks, Simon, for running the benchmarks! Good results overall, even though it's puzzling why flushing would be CPU intensive.

          Well, during flush we are encoding lots of VInts; that's what makes it CPU intensive.

          I actually ran the benchmark through a profiler and found out what the problem was with my benchmarks.
          When I indexed with DWPT, my HDD was so busy flushing segments concurrently that the read performance suffered and my indexing threads blocked on the line doc file where I read the records from. This explains the large number of spikes towards 0 docs/sec. The profiler also showed that we are constantly waiting on ThreadState#lock() with at least 3 threads. I changed the current behavior of the thread pool to not clear the thread bindings when I replace a DWPT for flushing and voila! We have a comparable peak ingest rate.
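
          Roughly what that thread pool change amounts to (a sketch only; the real ThreadAffinityDocumentsWriterThreadPool code may differ):

              // Sketch: keep the thread -> ThreadState binding and just swap in a
              // fresh DWPT when the old one is checked out for flushing, instead of
              // clearing the binding and making threads contend for a new ThreadState.
              final class ThreadStateSketch {
                DocumentsWriterPerThread dwpt;
              }

              DocumentsWriterPerThread swapForFlush(ThreadStateSketch state,
                                                    DocumentsWriterPerThread fresh) {
                DocumentsWriterPerThread flushing = state.dwpt;
                state.dwpt = fresh;   // the binding to this indexing thread stays intact
                return flushing;      // caller hands this DWPT off to be flushed
              }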

          Note the difference: DWPT indexes the documents in 6 min 15 seconds!

          Here we have 13 min 40 seconds! NICE!

          Michael Busch added a comment -

          Awesome speedup! Finally all this work shows great results!!

          What's surprising is that the merge time is lower with DWPT. How can that be, considering we're doing more merges?

          Simon Willnauer added a comment -

          This is committed to the branch; reviews should go through LUCENE-3023.


            People

            • Assignee: Simon Willnauer
            • Reporter: Michael Busch
            • Votes: 1
            • Watchers: 3
