Lucene - Core
  1. Lucene - Core
  2. LUCENE-2691

Consolidate Near Real Time and Reopen API semantics

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      We should consolidate the IndexWriter.getReader and the IndexReader.reopen semantics, since most people are already using the IR.reopen() method, we should simply add::

      IR.reopen(IndexWriter)
      

      Initially, it could just call the IW.getReader(), but it probably should switch to just using package private methods for sharing the internals

      1. LUCENE-2691.patch
        16 kB
        Grant Ingersoll
      2. LUCENE-2691.patch
        17 kB
        Grant Ingersoll

        Activity

        Hide
        Grant Ingersoll added a comment -

        Bulk close for 3.1

        Show
        Grant Ingersoll added a comment - Bulk close for 3.1
        Hide
        Michael McCandless added a comment -

        I'll backport...

        Show
        Michael McCandless added a comment - I'll backport...
        Hide
        Michael McCandless added a comment -

        We should backport this to 3.1?

        Show
        Michael McCandless added a comment - We should backport this to 3.1?
        Hide
        Earwin Burrfoot added a comment -

        You're still okay with an API that allows you to reopen IRs on different directories?

        Well, that's no good - we can catch this and throw an exc?

        I don't understand why should we bother with checking and throwing exceptions, when we can prevent such things from compiling at all.
        By using an API, that doesn't support reopening on anything different from original source.

        Really, there are two separate "things" open/reopen needs:

        That's not true. Take a look at my WriterBackedReader above (or DirectoryReader in trunk). It requires writer at least to call deleteUnusedFiles(), nrtIsCurrent().
        So you can't easily reopen between Directory-backed and Writer-backed readers without much switching and checking.

        r_ram.reload(); //Here we want to reload from the FSDirecotory?

        Use MMapDirectory? It's only a bit slower for searches, while not raping your GC on big indexes.
        Also check this out - https://gist.github.com/715617 , it is a RAMDirectory offspring that wraps any other given directory and basically does what you want (if I guessed right).
        It doesn't use blocking for files, so file size limit is 2Gb, but this can be easily fixed. On the up side - it reads file into memory only after the size is known (unlike RAMDir), which allows you to use huge precisely-sized blocks, lessening GC pressure.
        I used it for a long time, but then my indexes grew, heaps followed, VM exploded and I switched to MMapDirectory (with minor patches).

        What is missing is a "signal" from IR.reload() to RAMdirectory to slurp fresh information from FSDirecory?

        There is zero need for any such signal. If a reader requests non-existing file from RAMDirectory, it should check backing dir before throwing exception. If backing dir does have the file - it is loaded and opened.
        Why do you people love complicating things that much?

        Show
        Earwin Burrfoot added a comment - You're still okay with an API that allows you to reopen IRs on different directories? Well, that's no good - we can catch this and throw an exc? I don't understand why should we bother with checking and throwing exceptions, when we can prevent such things from compiling at all. By using an API, that doesn't support reopening on anything different from original source. Really, there are two separate "things" open/reopen needs: That's not true. Take a look at my WriterBackedReader above (or DirectoryReader in trunk). It requires writer at least to call deleteUnusedFiles(), nrtIsCurrent(). So you can't easily reopen between Directory-backed and Writer-backed readers without much switching and checking. r_ram.reload(); //Here we want to reload from the FSDirecotory? Use MMapDirectory? It's only a bit slower for searches, while not raping your GC on big indexes. Also check this out - https://gist.github.com/715617 , it is a RAMDirectory offspring that wraps any other given directory and basically does what you want (if I guessed right). It doesn't use blocking for files, so file size limit is 2Gb, but this can be easily fixed. On the up side - it reads file into memory only after the size is known (unlike RAMDir), which allows you to use huge precisely-sized blocks, lessening GC pressure. I used it for a long time, but then my indexes grew, heaps followed, VM exploded and I switched to MMapDirectory (with minor patches). What is missing is a "signal" from IR.reload() to RAMdirectory to slurp fresh information from FSDirecory? There is zero need for any such signal. If a reader requests non-existing file from RAMDirectory, it should check backing dir before throwing exception. If backing dir does have the file - it is loaded and opened. Why do you people love complicating things that much?
        Hide
        Eks Dev added a comment -

        strictly speaking not really this issue, but somehow fits into discussion about IR.reopen() semantics.

        What would be the best way to support the following case? (Imo, very often needed. )

        Directory d = FSDirectory.open("/path/to/somewhere"); // Nice
        d_ram = RAMDirectory(d);
        r_ram = IndexReader.open(d_ram);
        r_ram.reload(); //Here we want to reload from the FSDirecotory?

        Point being, reload need not only IR/IW semantics consideration, but Directory as well.

        Does this make sense?
        A solution that looks doable would be to have "DiskBoundRAMDirectory extends RAMDirectory", that would remain read-only and keep a reference to FSDirectory (or whatewer Directory).
        What is missing is a "signal" from IR.reload() to RAMdirectory to slurp fresh information from FSDirecory?

        Show
        Eks Dev added a comment - strictly speaking not really this issue, but somehow fits into discussion about IR.reopen() semantics. What would be the best way to support the following case? (Imo, very often needed. ) Directory d = FSDirectory.open("/path/to/somewhere"); // Nice d_ram = RAMDirectory(d); r_ram = IndexReader.open(d_ram); r_ram.reload(); //Here we want to reload from the FSDirecotory? Point being, reload need not only IR/IW semantics consideration, but Directory as well. Does this make sense? A solution that looks doable would be to have "DiskBoundRAMDirectory extends RAMDirectory", that would remain read-only and keep a reference to FSDirectory (or whatewer Directory). What is missing is a "signal" from IR.reload() to RAMdirectory to slurp fresh information from FSDirecory?
        Hide
        Michael McCandless added a comment -

        You're still okay with an API that allows you to reopen IRs on different directories?

        Well, that's no good - we can catch this and throw an exc?

        I think reopen should have no parameters, and IR.reopen(IC) moved elsewhere.

        But where would you move IR.reopen(IC)?

        I'd personally like to see a special subclass of DIR that is returned from IW.getReader(), that knows about specifics of being opened over an active writer.

        That sounds great!

        If we allow reopen to switch readers between modes, we're introducing yet more checking and switching around.
        Also we'll get irreversible circular dependency between readers and writers, which is also a sign of things going wrong.

        Really, there are two separate "things" open/reopen needs:

        1. The SegmentInfos – either by finding the latest & greatest in the
          Directory, latest & greatest via IW, or explicitly via an
          IndexCommit.
        1. A prior pool of readers so that anything needing a reader (IW
          doing a merge, IW applying deletes, app opening new readers, etc.)
          can re-use any already open sub readers.

        Maybe we can split these out.

        EG we've wanted for a while now to factor out the reader pool hidden
        inside IW. If we did this, and then allowed optionally passing a pool
        to IR.open, I think we could do away with IR.reopen entirely? We'd
        have variants of IR.open taking IW, Dir, IndexCommit, to say where the
        SegmentInfos come from, and then optionally also taking the explicit
        reader pool.

        Show
        Michael McCandless added a comment - You're still okay with an API that allows you to reopen IRs on different directories? Well, that's no good - we can catch this and throw an exc? I think reopen should have no parameters, and IR.reopen(IC) moved elsewhere. But where would you move IR.reopen(IC)? I'd personally like to see a special subclass of DIR that is returned from IW.getReader(), that knows about specifics of being opened over an active writer. That sounds great! If we allow reopen to switch readers between modes, we're introducing yet more checking and switching around. Also we'll get irreversible circular dependency between readers and writers, which is also a sign of things going wrong. Really, there are two separate "things" open/reopen needs: The SegmentInfos – either by finding the latest & greatest in the Directory, latest & greatest via IW, or explicitly via an IndexCommit. A prior pool of readers so that anything needing a reader (IW doing a merge, IW applying deletes, app opening new readers, etc.) can re-use any already open sub readers. Maybe we can split these out. EG we've wanted for a while now to factor out the reader pool hidden inside IW. If we did this, and then allowed optionally passing a pool to IR.open, I think we could do away with IR.reopen entirely? We'd have variants of IR.open taking IW, Dir, IndexCommit, to say where the SegmentInfos come from, and then optionally also taking the explicit reader pool.
        Hide
        Earwin Burrfoot added a comment -

        You're still okay with an API that allows you to reopen IRs on different directories?
        Let us introduce then, instead of bothering with APIs, a single class with method 'void doStuff(Object... args)' that, inside, has an elaborate logic validating and interpreting said arguments and then doing a fat switch to execute corresponding computations.

        I think reopen should have no parameters, and IR.reopen(IC) moved elsewhere.

        I'd personally like to see a special subclass of DIR that is returned from IW.getReader(), that knows about specifics of being opened over an active writer.
        Something along the lines of - https://gist.github.com/d5a197a001c24fccb1b5
        This also removes said specifics in form of of stupid if-checks and unnecessary fields from actual DIR. Modes should be done with subclasses, not switches, dammit!

        If we allow reopen to switch readers between modes, we're introducing yet more checking and switching around.
        Also we'll get irreversible circular dependency between readers and writers, which is also a sign of things going wrong.

        Show
        Earwin Burrfoot added a comment - You're still okay with an API that allows you to reopen IRs on different directories? Let us introduce then, instead of bothering with APIs, a single class with method 'void doStuff(Object... args)' that, inside, has an elaborate logic validating and interpreting said arguments and then doing a fat switch to execute corresponding computations. I think reopen should have no parameters, and IR.reopen(IC) moved elsewhere. I'd personally like to see a special subclass of DIR that is returned from IW.getReader(), that knows about specifics of being opened over an active writer. Something along the lines of - https://gist.github.com/d5a197a001c24fccb1b5 This also removes said specifics in form of of stupid if-checks and unnecessary fields from actual DIR. Modes should be done with subclasses, not switches, dammit! If we allow reopen to switch readers between modes, we're introducing yet more checking and switching around. Also we'll get irreversible circular dependency between readers and writers, which is also a sign of things going wrong.
        Hide
        Michael McCandless added a comment -

        OK, I agree – the analogy to reopen on an IndexCommit makes sense: you are simply reopening the reader on a different set of segments. So IW is just being used (abused) as a "proxy" for the "current live segments on the index".

        Show
        Michael McCandless added a comment - OK, I agree – the analogy to reopen on an IndexCommit makes sense: you are simply reopening the reader on a different set of segments. So IW is just being used (abused) as a "proxy" for the "current live segments on the index".
        Hide
        Grant Ingersoll added a comment -

        I don't have a problem with it, but I don't see how it is any more confusing that reopening on a IndexCommit. In my mind, the thing you are reopening is a set of segments. Unfortunately, we don't have a access in a clean way to get the IndexWriter's segment, so we pass in the IW. The whole point of IR.reopen is you are getting a new reader on an index. Presumably, the IW is on the same Directory, just like presumably the CommitPoint is on the same Directory.

        Show
        Grant Ingersoll added a comment - I don't have a problem with it, but I don't see how it is any more confusing that reopening on a IndexCommit. In my mind, the thing you are reopening is a set of segments. Unfortunately, we don't have a access in a clean way to get the IndexWriter's segment, so we pass in the IW. The whole point of IR.reopen is you are getting a new reader on an index. Presumably, the IW is on the same Directory, just like presumably the CommitPoint is on the same Directory.
        Hide
        Michael McCandless added a comment -

        I also think we shouldn't add a reopen(IW) here, since you can just IR.open(IW) and then call its .reopen(). It's confusing to IR.open(Dir) and then later IR.reopen(IW) on it.

        Show
        Michael McCandless added a comment - I also think we shouldn't add a reopen(IW) here, since you can just IR.open(IW) and then call its .reopen(). It's confusing to IR.open(Dir) and then later IR.reopen(IW) on it.
        Hide
        Michael McCandless added a comment -

        Reopen until we settle the API question...

        Show
        Michael McCandless added a comment - Reopen until we settle the API question...
        Hide
        Yonik Seeley added a comment -

        Again, how is that any different? You are getting back a new Reader and assigning it to the same variable name.

        You could defend BitSet.doSomethingFunky(IndexReader reader, IndexWriter writer) with that same logic
        It's not about functionality (this patch doesn't add any functionality), it's about API.

        It doesn't update the one you have.

        Earwin knows that - he even said "Due to snapshot nature of readers, you don't do update inplace and get a new instance instead".

        I also agree with the statement: "So 'open' - always gets you a brand new reader on a given target, while 'reopen' updates the one you have." It's a logical update of the reader, hence the added reopen() method makes no sense.

        Show
        Yonik Seeley added a comment - Again, how is that any different? You are getting back a new Reader and assigning it to the same variable name. You could defend BitSet.doSomethingFunky(IndexReader reader, IndexWriter writer) with that same logic It's not about functionality (this patch doesn't add any functionality), it's about API. It doesn't update the one you have. Earwin knows that - he even said "Due to snapshot nature of readers, you don't do update inplace and get a new instance instead". I also agree with the statement: "So 'open' - always gets you a brand new reader on a given target, while 'reopen' updates the one you have." It's a logical update of the reader, hence the added reopen() method makes no sense.
        Hide
        Grant Ingersoll added a comment -

        So 'open' - always gets you a brand new reader on a given target, while 'reopen' updates the one you have.

        It doesn't update the one you have. You do get a new instance and you have to properly close the old one (see http://lucene.apache.org/java/3_0_2/api/core/org/apache/lucene/index/IndexReader.html#reopen%28%29). It is not just "updating" the internal memory of the old one, it is instead transferring that internal memory to a new instance and combining it with any changed segments. To me, that's what the IW case feels like too.

        Show
        Grant Ingersoll added a comment - So 'open' - always gets you a brand new reader on a given target, while 'reopen' updates the one you have. It doesn't update the one you have. You do get a new instance and you have to properly close the old one (see http://lucene.apache.org/java/3_0_2/api/core/org/apache/lucene/index/IndexReader.html#reopen%28%29 ). It is not just "updating" the internal memory of the old one, it is instead transferring that internal memory to a new instance and combining it with any changed segments. To me, that's what the IW case feels like too.
        Hide
        Earwin Burrfoot added a comment -

        As I said above, though, I'm fine w/ dropping the reopen one and just keeping open(IW).

        Let's do this.

        In my ideal world the semantics of each method is encoded in its name.
        So 'open' - always gets you a brand new reader on a given target, while 'reopen' updates the one you have.
        Due to snapshot nature of readers, you don't do update inplace and get a new instance instead, but that doesn't make these two methods' semantics the same.

        Show
        Earwin Burrfoot added a comment - As I said above, though, I'm fine w/ dropping the reopen one and just keeping open(IW). Let's do this. In my ideal world the semantics of each method is encoded in its name. So 'open' - always gets you a brand new reader on a given target, while 'reopen' updates the one you have. Due to snapshot nature of readers, you don't do update inplace and get a new instance instead, but that doesn't make these two methods' semantics the same.
        Hide
        Grant Ingersoll added a comment -

        The semantics of reopen is you get a new Reader. If an optimize was done between open and reopen of the IR, you are getting an all new Reader. Should that get rid of reopen, too?

        As I said above, though, I'm fine w/ dropping the reopen one and just keeping open(IW). I care more about the fact that it is non-intuitive to ask a Writer for a Reader and this silly notion we have right now that if you want "fast reopen" you should reopen off of the IR, but if you want "really fast reopen" you should open off the Writer. In fact, maybe we should just drop reopen all together and simply have all points go through the open() method, by adding: open(IndexReader)

        Show
        Grant Ingersoll added a comment - The semantics of reopen is you get a new Reader. If an optimize was done between open and reopen of the IR, you are getting an all new Reader. Should that get rid of reopen, too? As I said above, though, I'm fine w/ dropping the reopen one and just keeping open(IW). I care more about the fact that it is non-intuitive to ask a Writer for a Reader and this silly notion we have right now that if you want "fast reopen" you should reopen off of the IR, but if you want "really fast reopen" you should open off the Writer. In fact, maybe we should just drop reopen all together and simply have all points go through the open() method, by adding: open(IndexReader)
        Hide
        Grant Ingersoll added a comment - - edited

        Again, how is that any different? You are getting back a new Reader and assigning it to the same variable name.

        Directory d = Directory.open("/path/to/somewhere"); // Nice
        IndexReader r = IndexReader.open(d); // Nice
        
        Directory d2 = new RAMDirectory(); // Okay
        IndexWriter w = new IndexWriter(d2); // Sure
        
        r = w.getReader() // WTF Indeed!
        
        Show
        Grant Ingersoll added a comment - - edited Again, how is that any different? You are getting back a new Reader and assigning it to the same variable name. Directory d = Directory.open( "/path/to/somewhere" ); // Nice IndexReader r = IndexReader.open(d); // Nice Directory d2 = new RAMDirectory(); // Okay IndexWriter w = new IndexWriter(d2); // Sure r = w.getReader() // WTF Indeed!
        Hide
        Earwin Burrfoot added a comment -

        Okay, let's go at it with an example.

        Directory d = Directory.open("/path/to/somewhere"); // Nice
        IndexReader r = IndexReader.open(d); // Nice
        
        Directory d2 = new RAMDirectory(); // Okay
        IndexWriter w = new IndexWriter(d2); // Sure
        
        r = r.reopen(w) // WTF?!
        

        Parameterless reopen does not allow you to "rebase" IndexReader, an overload you introduced with this patch - allows this.
        I'm somewhat okay with IR.open(iw). Despite being questionable it is at least not frikkin' broken like ir.reopen(iw).

        Show
        Earwin Burrfoot added a comment - Okay, let's go at it with an example. Directory d = Directory.open( "/path/to/somewhere" ); // Nice IndexReader r = IndexReader.open(d); // Nice Directory d2 = new RAMDirectory(); // Okay IndexWriter w = new IndexWriter(d2); // Sure r = r.reopen(w) // WTF?! Parameterless reopen does not allow you to "rebase" IndexReader, an overload you introduced with this patch - allows this. I'm somewhat okay with IR.open(iw). Despite being questionable it is at least not frikkin' broken like ir.reopen(iw).
        Hide
        Grant Ingersoll added a comment -

        I guess, however, to assuage you guys, perhaps we could drop the reopen(IW) all together and just have IR.open(IW). That satisfies my semantic/explanation need. Although, I would suspect that over time, reopen(IW) could still make sense.

        Show
        Grant Ingersoll added a comment - I guess, however, to assuage you guys, perhaps we could drop the reopen(IW) all together and just have IR.open(IW). That satisfies my semantic/explanation need. Although, I would suspect that over time, reopen(IW) could still make sense.
        Hide
        Grant Ingersoll added a comment -

        The method you introduced makes it perfectly fine to open a reader on some directory, and then try reopening it on IW, and then on another IW. Later you're going to add a heap of guards on this method, and runtime exceptions if user does some pointless reopen sequence?

        I don't see that. All we did was move the call from IW to IR where the reopen method was already located. Nothing is preventing a user from doing exactly what you described above with the old way either.

        What was wrong with old parameterless reopen() call?

        Nothing. That's why it is still there. This is about user comprehension and usability of the API, not about some functionality. Having trained hundreds of people on this, it's confusing to people (just like it's confusing that a "Reader" can do deletes). Why would a Writer object give me a Reader? Why would a Reader do write functionality?

        if you got a reader from IW, it reopens by asking said IW for the segments.

        Yep, and if you (re)open a Reader with an IW you get a Reader based on those segments.

        Adding more ways to do the same thing in this case just seems to complicate the API.

        We didn't add any more ways to do this. The exact same number of ways exist now. All we've done is kept Reader semantics on the Reader and Writer semantics on the Writer.

        Show
        Grant Ingersoll added a comment - The method you introduced makes it perfectly fine to open a reader on some directory, and then try reopening it on IW, and then on another IW. Later you're going to add a heap of guards on this method, and runtime exceptions if user does some pointless reopen sequence? I don't see that. All we did was move the call from IW to IR where the reopen method was already located. Nothing is preventing a user from doing exactly what you described above with the old way either. What was wrong with old parameterless reopen() call? Nothing. That's why it is still there. This is about user comprehension and usability of the API, not about some functionality. Having trained hundreds of people on this, it's confusing to people (just like it's confusing that a "Reader" can do deletes). Why would a Writer object give me a Reader? Why would a Reader do write functionality? if you got a reader from IW, it reopens by asking said IW for the segments. Yep, and if you (re)open a Reader with an IW you get a Reader based on those segments. Adding more ways to do the same thing in this case just seems to complicate the API. We didn't add any more ways to do this. The exact same number of ways exist now. All we've done is kept Reader semantics on the Reader and Writer semantics on the Writer.
        Hide
        Yonik Seeley added a comment -

        I think I agree Earwin... it's not clear what functionality was added with this patch, just a vague notion of consolidating semantics. Adding more ways to do the same thing in this case just seems to complicate the API.

        Show
        Yonik Seeley added a comment - I think I agree Earwin... it's not clear what functionality was added with this patch, just a vague notion of consolidating semantics. Adding more ways to do the same thing in this case just seems to complicate the API.
        Hide
        Earwin Burrfoot added a comment -

        What's up with intentionally twisted APIs?

        The method you introduced makes it perfectly fine to open a reader on some directory, and then try reopening it on IW, and then on another IW. Later you're going to add a heap of guards on this method, and runtime exceptions if user does some pointless reopen sequence?

        What was wrong with old parameterless reopen() call?
        If you've got a reader over Directory, it reopens looking for new segments there, if you got a reader from IW, it reopens by asking said IW for the segments.

        Show
        Earwin Burrfoot added a comment - What's up with intentionally twisted APIs? The method you introduced makes it perfectly fine to open a reader on some directory, and then try reopening it on IW, and then on another IW. Later you're going to add a heap of guards on this method, and runtime exceptions if user does some pointless reopen sequence? What was wrong with old parameterless reopen() call? If you've got a reader over Directory, it reopens looking for new segments there, if you got a reader from IW, it reopens by asking said IW for the segments.
        Hide
        Grant Ingersoll added a comment -

        Committed revision 1006280.

        Show
        Grant Ingersoll added a comment - Committed revision 1006280.
        Hide
        Simon Willnauer added a comment -

        Ready to go.

        +1 looks good to me

        Show
        Simon Willnauer added a comment - Ready to go. +1 looks good to me
        Hide
        Grant Ingersoll added a comment -

        Ready to go.

        Show
        Grant Ingersoll added a comment - Ready to go.
        Hide
        Grant Ingersoll added a comment -

        Simon, I think you are right, as the necessary sync blocks are in the underlying getReader() call on IW. I just copied the signatures. I will put up a new patch.

        Show
        Grant Ingersoll added a comment - Simon, I think you are right, as the necessary sync blocks are in the underlying getReader() call on IW. I just copied the signatures. I will put up a new patch.
        Hide
        Simon Willnauer added a comment -

        Should be good to go, but may need a few doc improvements.

        I think docs are fine though at least what I read on IR but I wonder why we need to synchronize on the iIR here since this call only forwards to the given writer?

         public synchronized IndexReader reopen(IndexWriter writer) throws CorruptIndexException, IOException {
           return writer.getReader();
         }
        

        I would guess calling writer.getReader() in there should not block other actions like concurrent searches on this reader since a NRT getRearder call can take its time. If at all we should sync on the writer though (which is already done inside IW) or do I miss something?

        Show
        Simon Willnauer added a comment - Should be good to go, but may need a few doc improvements. I think docs are fine though at least what I read on IR but I wonder why we need to synchronize on the iIR here since this call only forwards to the given writer? public synchronized IndexReader reopen(IndexWriter writer) throws CorruptIndexException, IOException { return writer.getReader(); } I would guess calling writer.getReader() in there should not block other actions like concurrent searches on this reader since a NRT getRearder call can take its time. If at all we should sync on the writer though (which is already done inside IW) or do I miss something?
        Hide
        Michael McCandless added a comment -

        patch looks great Grant!

        I think we don't need the IR.open that takes a termInfosIndexDivisor... and we also don't need the IW.getReader that takes one too. We had added these before adding IWC.set/getReaderTermsIndexDivisor...

        Show
        Michael McCandless added a comment - patch looks great Grant! I think we don't need the IR.open that takes a termInfosIndexDivisor... and we also don't need the IW.getReader that takes one too. We had added these before adding IWC.set/getReaderTermsIndexDivisor...
        Hide
        Grant Ingersoll added a comment -

        Patch that makes the change. Adds open and reopen methods to IndexReader while making it package protected on IndexWriter.

        Should be good to go, but may need a few doc improvements.

        Show
        Grant Ingersoll added a comment - Patch that makes the change. Adds open and reopen methods to IndexReader while making it package protected on IndexWriter. Should be good to go, but may need a few doc improvements.
        Hide
        Michael McCandless added a comment -

        we'll have to add an IndexReader.open(IndexWriter) method (plus, I suppose a couple of variations).

        +1

        I think IR.open(IW) makes sense.

        Show
        Michael McCandless added a comment - we'll have to add an IndexReader.open(IndexWriter) method (plus, I suppose a couple of variations). +1 I think IR.open(IW) makes sense.
        Hide
        Michael McCandless added a comment -

        What's with all of the test dependencies on IndexWriter.getReader() that seemingly don't have anything to do with NRT? For instance, TestQueryParser.testPositionIncrements() or TestCachingSpanFilter?

        This is just because those tests need a reader having just built the index... ie, as long as there's no other reason to .commit, getting the reader from the writer is perfectly fine.

        Plus this only increases test coverage of NRT.

        RandomIndexWriter's .getReader method now randomly picks to either get the reader from the writer, or, to .commit and then open a new reader.

        Show
        Michael McCandless added a comment - What's with all of the test dependencies on IndexWriter.getReader() that seemingly don't have anything to do with NRT? For instance, TestQueryParser.testPositionIncrements() or TestCachingSpanFilter? This is just because those tests need a reader having just built the index... ie, as long as there's no other reason to .commit, getting the reader from the writer is perfectly fine. Plus this only increases test coverage of NRT. RandomIndexWriter's .getReader method now randomly picks to either get the reader from the writer, or, to .commit and then open a new reader.
        Hide
        Grant Ingersoll added a comment -

        What's with all of the test dependencies on IndexWriter.getReader() that seemingly don't have anything to do with NRT? For instance, TestQueryParser.testPositionIncrements() or TestCachingSpanFilter?

        If we insist on those, then it seems we'll have to add an IndexReader.open(IndexWriter) method (plus, I suppose a couple of variations). This concerns me b/c there are already more than enough variations on that method, but what's one more?

        Show
        Grant Ingersoll added a comment - What's with all of the test dependencies on IndexWriter.getReader() that seemingly don't have anything to do with NRT? For instance, TestQueryParser.testPositionIncrements() or TestCachingSpanFilter? If we insist on those, then it seems we'll have to add an IndexReader.open(IndexWriter) method (plus, I suppose a couple of variations). This concerns me b/c there are already more than enough variations on that method, but what's one more?
        Hide
        Michael McCandless added a comment -

        +1!

        Show
        Michael McCandless added a comment - +1!

          People

          • Assignee:
            Michael McCandless
            Reporter:
            Grant Ingersoll
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development