Lucene - Core
  1. Lucene - Core
  2. LUCENE-3092

NRTCachingDirectory, to buffer small segments in a RAMDir

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.2, 4.0-ALPHA
    • Component/s: core/store
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      I created this simply Directory impl, whose goal is reduce IO
      contention in a frequent reopen NRT use case.

      The idea is, when reopening quickly, but not indexing that much
      content, you wind up with many small files created with time, that can
      possibly stress the IO system eg if merges, searching are also
      fighting for IO.

      So, NRTCachingDirectory puts these newly created files into a RAMDir,
      and only when they are merged into a too-large segment, does it then
      write-through to the real (delegate) directory.

      This lets you spend some RAM to reduce I0.

      1. LUCENE-3092.patch
        8 kB
        Michael McCandless
      2. LUCENE-3092-listener.patch
        12 kB
        Chris Male
      3. LUCENE-3092.patch
        16 kB
        Michael McCandless
      4. LUCENE-3092.patch
        18 kB
        Michael McCandless
      5. LUCENE-3092.patch
        16 kB
        Michael McCandless

        Issue Links

          Activity

          Hide
          Michael McCandless added a comment -

          Patch.

          The patch add NRTCachingDir in core, but if/when we commit this I
          think we should put it in contrib/module instead. I think it's
          working correctly – commit works fine (it flushes all cached files to
          the real dir on sync), but it needs a test case and I'm not going to
          have time in the near future to do that so I wanted to open this issue
          to get it out there....

          Show
          Michael McCandless added a comment - Patch. The patch add NRTCachingDir in core, but if/when we commit this I think we should put it in contrib/module instead. I think it's working correctly – commit works fine (it flushes all cached files to the real dir on sync), but it needs a test case and I'm not going to have time in the near future to do that so I wanted to open this issue to get it out there....
          Hide
          Uwe Schindler added a comment -

          I don't completely understand how this works. Is the getMergeScheduler() method intended to be used to set the MergeScheduler on the IndexWriter this directory is opened for? Why only support CMS?

          Show
          Uwe Schindler added a comment - I don't completely understand how this works. Is the getMergeScheduler() method intended to be used to set the MergeScheduler on the IndexWriter this directory is opened for? Why only support CMS?
          Hide
          Uwe Schindler added a comment -

          Sorry false alarm, haven't seen the JavaDocs (it's a patch, java code is not highlighted...). Still not understanding how it works, maybe it's too late.

          Show
          Uwe Schindler added a comment - Sorry false alarm, haven't seen the JavaDocs (it's a patch, java code is not highlighted...). Still not understanding how it works, maybe it's too late.
          Hide
          Michael McCandless added a comment -

          Yeah you should ask this dir for the merge scheduler... we could also fix it to delegate to any other merge scheduler (eg that you pass in), I think.

          Basically the dir just needs some way to "know" when a merge kicks off, so it can associate the thread doing that merge w/ the size of the merge.

          Maybe there's a cleaner way

          Show
          Michael McCandless added a comment - Yeah you should ask this dir for the merge scheduler... we could also fix it to delegate to any other merge scheduler (eg that you pass in), I think. Basically the dir just needs some way to "know" when a merge kicks off, so it can associate the thread doing that merge w/ the size of the merge. Maybe there's a cleaner way
          Hide
          Chris Male added a comment -

          Having not looked into the patch, what about some sort of event listener idea? The Directory can register that its interested in listening to merge events and the event can be fired when a merge is started. I can see alot of extensions maybe wanting to track when merges happen.

          Show
          Chris Male added a comment - Having not looked into the patch, what about some sort of event listener idea? The Directory can register that its interested in listening to merge events and the event can be fired when a merge is started. I can see alot of extensions maybe wanting to track when merges happen.
          Hide
          Chris Male added a comment - - edited

          Patch which roughly does what I suggested. Just a proof-of-concept since everything is currently tied to CMS (when it should work with any MS).

          Introduces a MergeEvent and MergeListener. MergeEvents are fired by CMS before and after merge is done. NRTCachingDirectory implements MergeListener and does it stuff on firing of the Events.

          There are dangers with calling listeners in a finally block but as I say, just a POC.

          Show
          Chris Male added a comment - - edited Patch which roughly does what I suggested. Just a proof-of-concept since everything is currently tied to CMS (when it should work with any MS). Introduces a MergeEvent and MergeListener. MergeEvents are fired by CMS before and after merge is done. NRTCachingDirectory implements MergeListener and does it stuff on firing of the Events. There are dangers with calling listeners in a finally block but as I say, just a POC.
          Hide
          Uwe Schindler added a comment -

          Could we not simply use IOContext? When IOContext is used on opening IndexOutput, we know some "metadata" and can act accordingly.

          Show
          Uwe Schindler added a comment - Could we not simply use IOContext? When IOContext is used on opening IndexOutput, we know some "metadata" and can act accordingly.
          Hide
          Earwin Burrfoot added a comment -

          highfive Uwe was going to suggest the very same thing.
          IOContext can include expected size. NRTCD will do its magic, other dirs may prealloc.

          Show
          Earwin Burrfoot added a comment - highfive Uwe was going to suggest the very same thing. IOContext can include expected size. NRTCD will do its magic, other dirs may prealloc.
          Hide
          Chris Male added a comment -

          IOContext can include expected size.

          While I appreciate the idea, isn't that a little inflexible? Okay so this Directory needs to know size, but another implementation might need to know other details (at this stage I don't know what, just thinking long term). So then we expand IOContext again?

          For a Directory, using IOContext maybe makes sense, but wouldn't it be useful to be able to announce that a merge is going to happen, and let whatevers interested, in this case a Directory, do what it needs to do?

          Show
          Chris Male added a comment - IOContext can include expected size. While I appreciate the idea, isn't that a little inflexible? Okay so this Directory needs to know size, but another implementation might need to know other details (at this stage I don't know what, just thinking long term). So then we expand IOContext again? For a Directory, using IOContext maybe makes sense, but wouldn't it be useful to be able to announce that a merge is going to happen, and let whatevers interested, in this case a Directory, do what it needs to do?
          Hide
          Simon Willnauer added a comment -

          First, nice idea mike! The way you are collection merge info is very very scary man so I tend to agree that we could make good use of IOContext here but at the same time I think the merge event system chris is proposing is very much needed IMO. Stuff like FlushPolicy could take information about concurrent merges and hold of flushes for a little while if memory allows it etc. If we are not using it here I think we should open another issue for that.

          Show
          Simon Willnauer added a comment - First, nice idea mike! The way you are collection merge info is very very scary man so I tend to agree that we could make good use of IOContext here but at the same time I think the merge event system chris is proposing is very much needed IMO. Stuff like FlushPolicy could take information about concurrent merges and hold of flushes for a little while if memory allows it etc. If we are not using it here I think we should open another issue for that.
          Hide
          Earwin Burrfoot added a comment -

          Chris, I don't like the idea of expanding IOContext again and again, but this case seems in line with intended purporse - give Directory implementation hints as to what we're going to do with it.

          I don't like events either. They look fragile and binding them to threads is a WTF. With all our pausing/unpausing magic there's no guarantee merge will end on the same thread it started on.

          Stuff like FlushPolicy could take information about concurrent merges and hold of flushes for a little while if memory allows it etc.

          Coordinating access to shared resource (IO subsystem) with events is very awkward. Ok, your FlushPolicy receives events from MergePolicy and holds flushes during merge. Now, when a flush is in progress, should FlushPolicy notify MergePolicy so it can hold its merges?
          It goes downhill from there. What if FP and MP fire events simultaneously? What should other listeners do?

          Try looking at a bigger picture. Merges are not your problem. Neither are flushes. Your problem is that several threads try to take their dump on disk simultaneously (for whatever reason, you don't really care). So what we need is an arbitration mechanism for Directory writes. A mechanism located presumably @ Directory level (eg, we don't need to throttle anything when writing to RAMDir).

          One possible implementation is that we add a constructor parameter to FSDirectory specifying desired level of IO parallelism, and then it keeps track of its IndexOutputs and stalls writes selectively. We can also add 'expectedWriteSize' to IOContext, so the Directory may favor shorter writes over bigger ones. Instead of 'expectedWriteSize' we can use 'priority'.

          Show
          Earwin Burrfoot added a comment - Chris, I don't like the idea of expanding IOContext again and again, but this case seems in line with intended purporse - give Directory implementation hints as to what we're going to do with it. I don't like events either. They look fragile and binding them to threads is a WTF. With all our pausing/unpausing magic there's no guarantee merge will end on the same thread it started on. Stuff like FlushPolicy could take information about concurrent merges and hold of flushes for a little while if memory allows it etc. Coordinating access to shared resource (IO subsystem) with events is very awkward. Ok, your FlushPolicy receives events from MergePolicy and holds flushes during merge. Now, when a flush is in progress, should FlushPolicy notify MergePolicy so it can hold its merges? It goes downhill from there. What if FP and MP fire events simultaneously? What should other listeners do? Try looking at a bigger picture. Merges are not your problem. Neither are flushes. Your problem is that several threads try to take their dump on disk simultaneously (for whatever reason, you don't really care). So what we need is an arbitration mechanism for Directory writes. A mechanism located presumably @ Directory level (eg, we don't need to throttle anything when writing to RAMDir). One possible implementation is that we add a constructor parameter to FSDirectory specifying desired level of IO parallelism, and then it keeps track of its IndexOutputs and stalls writes selectively. We can also add 'expectedWriteSize' to IOContext, so the Directory may favor shorter writes over bigger ones. Instead of 'expectedWriteSize' we can use 'priority'.
          Hide
          Chris Male added a comment -

          I don't like events either. They look fragile and binding them to threads is a WTF. With all our pausing/unpausing magic there's no guarantee merge will end on the same thread it started on.

          You're absolutely right that tying them to threads is not a good idea, but I couldn't disagree more that this is an issue with an Event model. Yes we should be wary of what information we make available in Events (I only added Threads to the Events in my patch because I was showing a simple alternative to Mike's initial impl). But to say Events are fragile and therefore we shouldn't pursue a listener pattern seems a big leap. Many large applications use event listeners and I don't think they're wrong.

          It goes downhill from there. What if FP and MP fire events simultaneously? What should other listeners do?

          Good point. Again, those are issues we need to resolve (one way applications tend to address this is event queues).

          Try looking at a bigger picture. Merges are not your problem. Neither are flushes.

          You're right, neither merges nor flushes are the problem. We have a series of components in Lucene; Directories, IndexWriter, MergeScheduler etc, and we have some crosscutting concerns such as merges themselves. We should strive to decouple these components where possible and think big picture.

          I agree with your suggestions for how we can solve the problem here now for NRTCachingDirectory, but I think we can do better for the longer term.

          Show
          Chris Male added a comment - I don't like events either. They look fragile and binding them to threads is a WTF. With all our pausing/unpausing magic there's no guarantee merge will end on the same thread it started on. You're absolutely right that tying them to threads is not a good idea, but I couldn't disagree more that this is an issue with an Event model. Yes we should be wary of what information we make available in Events (I only added Threads to the Events in my patch because I was showing a simple alternative to Mike's initial impl). But to say Events are fragile and therefore we shouldn't pursue a listener pattern seems a big leap. Many large applications use event listeners and I don't think they're wrong. It goes downhill from there. What if FP and MP fire events simultaneously? What should other listeners do? Good point. Again, those are issues we need to resolve (one way applications tend to address this is event queues). Try looking at a bigger picture. Merges are not your problem. Neither are flushes. You're right, neither merges nor flushes are the problem. We have a series of components in Lucene; Directories, IndexWriter, MergeScheduler etc, and we have some crosscutting concerns such as merges themselves. We should strive to decouple these components where possible and think big picture. I agree with your suggestions for how we can solve the problem here now for NRTCachingDirectory, but I think we can do better for the longer term.
          Hide
          Earwin Burrfoot added a comment -

          but I couldn't disagree more that this is an issue with an Event model

          There are no issues with event model itself. It's just that this model is badly suitable for this issue's usecase.
          Event listeners are good. Using them to emulate what is essentially a mutex - is ugly and fragile as hell.

          We have a series of components in Lucene; Directories, IndexWriter, MergeScheduler etc, and we have some crosscutting concerns such as merges themselves.

          My point is that for many concerns they shouldn't necessarily be crosscutting.
          Eg - Directory can support IO priorities/throttling, so it doesn't have to know about merges or flushes.
          Many OSes have have special APIs that allow IO prioritization, do they know about merges, or Lucene at all? No.

          Show
          Earwin Burrfoot added a comment - but I couldn't disagree more that this is an issue with an Event model There are no issues with event model itself. It's just that this model is badly suitable for this issue's usecase. Event listeners are good. Using them to emulate what is essentially a mutex - is ugly and fragile as hell. We have a series of components in Lucene; Directories, IndexWriter, MergeScheduler etc, and we have some crosscutting concerns such as merges themselves. My point is that for many concerns they shouldn't necessarily be crosscutting. Eg - Directory can support IO priorities/throttling, so it doesn't have to know about merges or flushes. Many OSes have have special APIs that allow IO prioritization, do they know about merges, or Lucene at all? No.
          Hide
          Michael McCandless added a comment -

          I agree it's scary/sneaky how this Dir impl associates threads w/
          merge sizes... using IOCtx or events or something else would be
          better.

          IOCtx would be a nice fit for this... because then we take all thread
          association out of the picture. The IOCtx should reference the
          OneMerge (if in fact this file is being opened because of a merge)?
          Then this impl can look at that to make its decision, and there's no
          more coupling of this dir iml to the merge scheduler.

          Should we open a separate issue for the events listener? Actually
          there may already be an issue open...?

          With all our pausing/unpausing magic there's no guarantee merge will end on the same thread it started on.

          The hard thread scheduling (forced pausing) that CMS does won't change
          the thread that runs the merge. That thread will forever run that
          merge (perhaps sometimes being paused) until that merge is done. At
          least, this is how it's impl'd today...

          Show
          Michael McCandless added a comment - I agree it's scary/sneaky how this Dir impl associates threads w/ merge sizes... using IOCtx or events or something else would be better. IOCtx would be a nice fit for this... because then we take all thread association out of the picture. The IOCtx should reference the OneMerge (if in fact this file is being opened because of a merge)? Then this impl can look at that to make its decision, and there's no more coupling of this dir iml to the merge scheduler. Should we open a separate issue for the events listener? Actually there may already be an issue open...? With all our pausing/unpausing magic there's no guarantee merge will end on the same thread it started on. The hard thread scheduling (forced pausing) that CMS does won't change the thread that runs the merge. That thread will forever run that merge (perhaps sometimes being paused) until that merge is done. At least, this is how it's impl'd today...
          Hide
          Michael McCandless added a comment -

          a WTF

          I've never seen this (WTF) used as a noun but I like it!!

          Show
          Michael McCandless added a comment - a WTF I've never seen this (WTF) used as a noun but I like it!!
          Hide
          Earwin Burrfoot added a comment -

          The IOCtx should reference the OneMerge (if in fact this file is being opened because of a merge)?

          IOCtx should have a value 'expectedSize', or 'priority', or something similar.
          This does not introduce a transitive dependency of Directory from MergePolicy (to please you once more - a true WTF), and this allows to apply the same logic to flushes. Eg - all small flushes/merges go to cache, all big flushes/merges go straight to disk.

          Show
          Earwin Burrfoot added a comment - The IOCtx should reference the OneMerge (if in fact this file is being opened because of a merge)? IOCtx should have a value 'expectedSize', or 'priority', or something similar. This does not introduce a transitive dependency of Directory from MergePolicy (to please you once more - a true WTF), and this allows to apply the same logic to flushes. Eg - all small flushes/merges go to cache, all big flushes/merges go straight to disk.
          Hide
          Michael McCandless added a comment -

          IOCtx should have a value 'expectedSize', or 'priority', or something similar.
          This does not introduce a transitive dependency of Directory from MergePolicy (to please you once more - a true WTF),

          Ahh, good point. So, for this dir impl I want to say "if net seg size is < X MB, cache it in RAM", so I guess we could have something like "expectedSizeOfSegmentMB" (covers all files that will be flushed for this segment, hmm minus the doc stores) in the IOCtx.

          Show
          Michael McCandless added a comment - IOCtx should have a value 'expectedSize', or 'priority', or something similar. This does not introduce a transitive dependency of Directory from MergePolicy (to please you once more - a true WTF), Ahh, good point. So, for this dir impl I want to say "if net seg size is < X MB, cache it in RAM", so I guess we could have something like "expectedSizeOfSegmentMB" (covers all files that will be flushed for this segment, hmm minus the doc stores) in the IOCtx.
          Hide
          Robert Muir added a comment -

          just an idea: with issues like this that work (but hackishly) and are self contained, I'm
          not sure we should block them on some huge refactoring like IOContext if they are actually usable

          Of course I had this same opinion on LUCENE-2829, and gave up there, but I like the idea
          of adding things that have a little hackishness (but work) to our 3.x branch, especially
          if they can be experimental or contribs, and then refactoring for elegance in trunk.

          I don't think users should have to wait until 4.0 for features because we won't add them until we
          totally refactor the world... its ok to add them hackishly in 3.x if we are "fixing" the larger issues in trunk.

          As far as scariness factor, is is really worse than forcefully unmapping direct byte buffers with sun.* methods in MMapDirectory?!

          Show
          Robert Muir added a comment - just an idea: with issues like this that work (but hackishly) and are self contained, I'm not sure we should block them on some huge refactoring like IOContext if they are actually usable Of course I had this same opinion on LUCENE-2829 , and gave up there, but I like the idea of adding things that have a little hackishness (but work) to our 3.x branch, especially if they can be experimental or contribs, and then refactoring for elegance in trunk. I don't think users should have to wait until 4.0 for features because we won't add them until we totally refactor the world... its ok to add them hackishly in 3.x if we are "fixing" the larger issues in trunk. As far as scariness factor, is is really worse than forcefully unmapping direct byte buffers with sun.* methods in MMapDirectory?!
          Hide
          Michael McCandless added a comment -

          just an idea: with issues like this that work (but hackishly) and are self contained, I'm
          not sure we should block them on some huge refactoring like IOContext if they are actually usable

          +1, progress not perfection.

          I'll clean up the patch – add an example code fragment of how you use it, a test case (aside: it'd be nice if, somehow, we could randomly swap this into our tests... we'd need a newMergeScheduler() method that would tap into this Dir impl if it had been picked, but also, we'd have to get this contrib module on core's classpath...), and a comment saying "this class does spooking stuff tracking merges and threads".

          Show
          Michael McCandless added a comment - just an idea: with issues like this that work (but hackishly) and are self contained, I'm not sure we should block them on some huge refactoring like IOContext if they are actually usable +1, progress not perfection. I'll clean up the patch – add an example code fragment of how you use it, a test case (aside: it'd be nice if, somehow, we could randomly swap this into our tests... we'd need a newMergeScheduler() method that would tap into this Dir impl if it had been picked, but also, we'd have to get this contrib module on core's classpath...), and a comment saying "this class does spooking stuff tracking merges and threads".
          Hide
          Michael McCandless added a comment -

          New patch.

          I reduced the over-synchronized methods (hopefully not too much!), improved jdocs (added an example usage), added CHANGES entry, and added a test case.

          But: the test case currently fails, due to LUCENE-3100.

          Show
          Michael McCandless added a comment - New patch. I reduced the over-synchronized methods (hopefully not too much!), improved jdocs (added an example usage), added CHANGES entry, and added a test case. But: the test case currently fails, due to LUCENE-3100 .
          Hide
          Michael McCandless added a comment -

          Sorry last patch was wrong – this one should be right.

          Show
          Michael McCandless added a comment - Sorry last patch was wrong – this one should be right.
          Hide
          Simon Willnauer added a comment -

          mike I attached a patch to LUCENE-3100 and tested with the latest patch on this issue. The test randomly fails (after I close the IW in the test!) here is a trace:

          junit-sequential:
              [junit] Testsuite: org.apache.lucene.store.TestNRTCachingDirectory
              [junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 5.16 sec
              [junit] 
              [junit] ------------- Standard Error -----------------
              [junit] NOTE: reproduce with: ant test -Dtestcase=TestNRTCachingDirectory -Dtestmethod=testNRTAndCommit -Dtests.seed=-753565914717395747:-1817581638532977526
              [junit] NOTE: test params are: codec=RandomCodecProvider: {docid=SimpleText, body=MockFixedIntBlock(blockSize=1993), title=Pulsing(freqCutoff=3), titleTokenized=MockSep, date=SimpleText}, locale=ar_AE, timezone=America/Santa_Isabel
              [junit] NOTE: all tests run in this JVM:
              [junit] [TestNRTCachingDirectory]
              [junit] NOTE: Mac OS X 10.6.7 x86_64/Apple Inc. 1.6.0_24 (64-bit)/cpus=2,threads=1,free=46213552,total=85000192
              [junit] ------------- ---------------- ---------------
              [junit] Testcase: testNRTAndCommit(org.apache.lucene.store.TestNRTCachingDirectory):	FAILED
              [junit] limit=12 actual=16
              [junit] junit.framework.AssertionFailedError: limit=12 actual=16
              [junit] 	at org.apache.lucene.index.RandomIndexWriter.doRandomOptimize(RandomIndexWriter.java:165)
              [junit] 	at org.apache.lucene.index.RandomIndexWriter.close(RandomIndexWriter.java:199)
              [junit] 	at org.apache.lucene.store.TestNRTCachingDirectory.testNRTAndCommit(TestNRTCachingDirectory.java:179)
              [junit] 	at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1282)
              [junit] 	at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1211)
              [junit] 
              [junit] 
              [junit] Test org.apache.lucene.store.TestNRTCachingDirectory FAILED
          
          Show
          Simon Willnauer added a comment - mike I attached a patch to LUCENE-3100 and tested with the latest patch on this issue. The test randomly fails (after I close the IW in the test!) here is a trace: junit-sequential: [junit] Testsuite: org.apache.lucene.store.TestNRTCachingDirectory [junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 5.16 sec [junit] [junit] ------------- Standard Error ----------------- [junit] NOTE: reproduce with: ant test -Dtestcase=TestNRTCachingDirectory -Dtestmethod=testNRTAndCommit -Dtests.seed=-753565914717395747:-1817581638532977526 [junit] NOTE: test params are: codec=RandomCodecProvider: {docid=SimpleText, body=MockFixedIntBlock(blockSize=1993), title=Pulsing(freqCutoff=3), titleTokenized=MockSep, date=SimpleText}, locale=ar_AE, timezone=America/Santa_Isabel [junit] NOTE: all tests run in this JVM: [junit] [TestNRTCachingDirectory] [junit] NOTE: Mac OS X 10.6.7 x86_64/Apple Inc. 1.6.0_24 (64-bit)/cpus=2,threads=1,free=46213552,total=85000192 [junit] ------------- ---------------- --------------- [junit] Testcase: testNRTAndCommit(org.apache.lucene.store.TestNRTCachingDirectory): FAILED [junit] limit=12 actual=16 [junit] junit.framework.AssertionFailedError: limit=12 actual=16 [junit] at org.apache.lucene.index.RandomIndexWriter.doRandomOptimize(RandomIndexWriter.java:165) [junit] at org.apache.lucene.index.RandomIndexWriter.close(RandomIndexWriter.java:199) [junit] at org.apache.lucene.store.TestNRTCachingDirectory.testNRTAndCommit(TestNRTCachingDirectory.java:179) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1282) [junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1211) [junit] [junit] [junit] Test org.apache.lucene.store.TestNRTCachingDirectory FAILED
          Hide
          Michael McCandless added a comment -

          New patch, fixes the issue Simon hit (was just a bug in the test – it was using a silly MergePolicy that ignored partial optimize).

          Test now passes w/ the patch from LUCENE-3100.

          I think this is ready to commit, after LUCENE-3100 is in.

          Show
          Michael McCandless added a comment - New patch, fixes the issue Simon hit (was just a bug in the test – it was using a silly MergePolicy that ignored partial optimize). Test now passes w/ the patch from LUCENE-3100 . I think this is ready to commit, after LUCENE-3100 is in.
          Hide
          Simon Willnauer added a comment -

          Mike I committed LUCENE-3100 you can go ahead

          Show
          Simon Willnauer added a comment - Mike I committed LUCENE-3100 you can go ahead
          Hide
          Michael McCandless added a comment -

          Thanks Simon; I'll commit soon...

          Show
          Michael McCandless added a comment - Thanks Simon; I'll commit soon...
          Hide
          David Smiley added a comment -

          This looks cool. Any performance measurements? Perhaps a forthcoming post on Mike's blog?

          Show
          David Smiley added a comment - This looks cool. Any performance measurements? Perhaps a forthcoming post on Mike's blog?
          Hide
          Michael McCandless added a comment -

          Alas I haven't had time to really dig into perf gains here... but I suspect on systems where IO is in contention (due to ongoing cold searching, or merging), and reopen rate is highish, that this should be a decent win since we don't burden the IO system with many tiny files.

          Show
          Michael McCandless added a comment - Alas I haven't had time to really dig into perf gains here... but I suspect on systems where IO is in contention (due to ongoing cold searching, or merging), and reopen rate is highish, that this should be a decent win since we don't burden the IO system with many tiny files.
          Hide
          Shai Erera added a comment -

          Mike, this is a great idea ! If there are any chances it will be released in 3.2, I think one of our NRT apps can make good use of it.

          Question - I see that NRTCD ctor takes a Directory. Is there any reason to pass RAMDir to NRTCD? I assume you use a Directory for any other Dir impls out there that may not sub-class e.g., FSDir, which is ok - so can we at least document that this Dir is not useful if you intend to pass RAMDir to it?

          Unless I am wrong and it is useful w/ RAMDir as well.

          Show
          Shai Erera added a comment - Mike, this is a great idea ! If there are any chances it will be released in 3.2, I think one of our NRT apps can make good use of it. Question - I see that NRTCD ctor takes a Directory. Is there any reason to pass RAMDir to NRTCD? I assume you use a Directory for any other Dir impls out there that may not sub-class e.g., FSDir, which is ok - so can we at least document that this Dir is not useful if you intend to pass RAMDir to it? Unless I am wrong and it is useful w/ RAMDir as well.
          Hide
          Michael McCandless added a comment -

          I committed it to 3.x as well so this will be in 3.2

          I can't think of any reason why you'd want to wrap another RAMDir with NRTCD....? We can fix the docs to state this. Can you work out the wording/patch? Or just go ahead and commit a fix

          Show
          Michael McCandless added a comment - I committed it to 3.x as well so this will be in 3.2 I can't think of any reason why you'd want to wrap another RAMDir with NRTCD....? We can fix the docs to state this. Can you work out the wording/patch? Or just go ahead and commit a fix
          Hide
          Yonik Seeley added a comment -

          I can't think of any reason why you'd want to wrap another RAMDir with NRTCD....?

          Tests? It's nice to have a test use a RAMDirectory for speed, but still follow the same code path as FSDirectory for debugging + orthogonality.
          AFAIK, most Solr tests use RAMDirectory by default. There's no benefit to restricting it, right?

          Show
          Yonik Seeley added a comment - I can't think of any reason why you'd want to wrap another RAMDir with NRTCD....? Tests? It's nice to have a test use a RAMDirectory for speed, but still follow the same code path as FSDirectory for debugging + orthogonality. AFAIK, most Solr tests use RAMDirectory by default. There's no benefit to restricting it, right?
          Hide
          Michael McCandless added a comment -

          That's a great point Yonik – in fact the TestNRTCachingDirectory already
          relies on this generic-ness (pulls a newDirectory() from LuceneTestCase).

          Show
          Michael McCandless added a comment - That's a great point Yonik – in fact the TestNRTCachingDirectory already relies on this generic-ness (pulls a newDirectory() from LuceneTestCase).
          Hide
          Robert Muir added a comment -

          Tests? It's nice to have a test use a RAMDirectory for speed, but still follow the same code path as FSDirectory for debugging + orthogonality.

          FWIW currently the lucene tests use a RAMDirectory 90% of the time (and something else the other 10%).
          We could adjust this... at the time I set it, it seemed to not slow the tests down that much but
          still give us a little more coverage.

          Show
          Robert Muir added a comment - Tests? It's nice to have a test use a RAMDirectory for speed, but still follow the same code path as FSDirectory for debugging + orthogonality. FWIW currently the lucene tests use a RAMDirectory 90% of the time (and something else the other 10%). We could adjust this... at the time I set it, it seemed to not slow the tests down that much but still give us a little more coverage.
          Hide
          Robert Muir added a comment -

          Bulk closing for 3.2

          Show
          Robert Muir added a comment - Bulk closing for 3.2

            People

            • Assignee:
              Unassigned
              Reporter:
              Michael McCandless
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development