Lucene - Core
  1. Lucene - Core
  2. LUCENE-6524

Create an IndexWriter from an already opened NRT or non-NRT reader

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.3, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      I'd like to add a new ctor to IndexWriter, letting you start from an already
      opened NRT or non-NRT DirectoryReader. I think this is a long missing
      API in Lucene today, and we've talked in the past about different ways
      to fix it e.g. factoring out a shared reader pool between writer and reader.

      One use-case, which I hit in LUCENE-5376: if you have a read-only
      index, so you've opened a non-NRT DirectoryReader to search it, and
      then you want to "upgrade" to a read/write index, we don't handle that
      very gracefully now because you are forced to open 2X the
      SegmentReaders.

      But with this API, IW populates its reader pool with the incoming
      SegmentReaders so they are shared on any subsequent NRT reopens /
      segment merging / deletes applying, etc.

      Another (more expert) use case is allowing rollback to an NRT-point.
      Today, you can only rollback to a commit point (segments_N). But an
      NRT reader also reflects a valid "point in time" view of the index (it
      just doesn't have a segments_N file, and its ref'd files are not
      fsync'd), so with this change you can close your old writer, open a
      new one from this NRT point, and revert all changes that had been done
      after the NRT reader was opened from the old writer.

      1. LUCENE-6524.patch
        41 kB
        Michael McCandless
      2. LUCENE-6524.patch
        27 kB
        Michael McCandless
      3. LUCENE-6524.patch
        25 kB
        Michael McCandless

        Activity

        Hide
        Michael McCandless added a comment -

        Initial work-in-progress patch ... a few nocommits, but the core
        idea seems to work!

        This patch also includes the fixes from LUCENE-6523.

        I had to add a restriction for the "open from NRT reader" case: you
        can't do this if the last commit this NRT reader sees has been
        deleted, e.g. if the old IndexWriter had done a commit after the NRT
        reader was opened. In this case there is no starting commit for
        the new writer to fall back on, which makes things tricky ...

        Show
        Michael McCandless added a comment - Initial work-in-progress patch ... a few nocommits, but the core idea seems to work! This patch also includes the fixes from LUCENE-6523 . I had to add a restriction for the "open from NRT reader" case: you can't do this if the last commit this NRT reader sees has been deleted, e.g. if the old IndexWriter had done a commit after the NRT reader was opened. In this case there is no starting commit for the new writer to fall back on, which makes things tricky ...
        Hide
        Robert Muir added a comment -

        Another (more expert) use case is allowing rollback to an NRT-point.

        This really needs to be a new method if added at all.

        We should not mess with the semantics of commit/rollback.

        Show
        Robert Muir added a comment - Another (more expert) use case is allowing rollback to an NRT-point. This really needs to be a new method if added at all. We should not mess with the semantics of commit/rollback.
        Hide
        Michael McCandless added a comment -

        This really needs to be a new method if added at all.

        Wait: the only thing I'm adding here is a new constructor for IndexWriter. I'm not touching IndexWriter.rollback ...

        Maybe I should not have used the term "rollback" in that use case: it is a loaded term!

        Show
        Michael McCandless added a comment - This really needs to be a new method if added at all. Wait: the only thing I'm adding here is a new constructor for IndexWriter. I'm not touching IndexWriter.rollback ... Maybe I should not have used the term "rollback" in that use case: it is a loaded term!
        Hide
        Michael McCandless added a comment -

        New patch, folding in trunk changes, fixing all nocommits, beefing up the tests...

        Show
        Michael McCandless added a comment - New patch, folding in trunk changes, fixing all nocommits, beefing up the tests...
        Hide
        Michael McCandless added a comment -

        I think the last patch is ready!

        Show
        Michael McCandless added a comment - I think the last patch is ready!
        Hide
        Robert Muir added a comment -

        We don't need lock sleeping in a new ctor. There is nothing to be backwards compatible with. Please remove that!

        Show
        Robert Muir added a comment - We don't need lock sleeping in a new ctor. There is nothing to be backwards compatible with. Please remove that!
        Hide
        Robert Muir added a comment -
        public IndexWriter(DirectoryReader r, IndexWriterConfig conf) throws IOException {
            ...
            if (r instanceof StandardDirectoryReader == false) {
              throw new IllegalArgumentException("the provided DirectoryReader must be a StandardDirectoryReader but got " + r);
            }
        

        no

        Please change the parameter type instead.

        Show
        Robert Muir added a comment - public IndexWriter(DirectoryReader r, IndexWriterConfig conf) throws IOException { ... if (r instanceof StandardDirectoryReader == false ) { throw new IllegalArgumentException( "the provided DirectoryReader must be a StandardDirectoryReader but got " + r); } no Please change the parameter type instead.
        Hide
        Robert Muir added a comment -

        Please fix these typos:

        // Note that you cannot do this if the old writer had done a commiter after this NRT reader was opened.
        ...
        // NOTE: this is correct even for an NRT reader becuase
        ...
        

        Why is there no integration in randomindexwriter?

        Show
        Robert Muir added a comment - Please fix these typos: // Note that you cannot do this if the old writer had done a commiter after this NRT reader was opened. ... // NOTE: this is correct even for an NRT reader becuase ... Why is there no integration in randomindexwriter?
        Hide
        Robert Muir added a comment -

        Another (more expert) use case is allowing rollback to an NRT-point.
        Today, you can only rollback to a commit point (segments_N). But an
        NRT reader also reflects a valid "point in time" view of the index (it
        just doesn't have a segments_N file, and its ref'd files are not
        fsync'd), so with this change you can close your old writer, open a
        new one from this NRT point, and revert all changes that had been done
        after the NRT reader was opened from the old writer.

        I'm still completely hung up on this. I don't think we should offer features that require additional components like transaction logs to be safe, which seems what this is geared at? Its also confusing if you later do a real commit, will we fsync the right stuff? Is that really guaranteed or just a happenchance of how we removed the stale files map?

        Net/Net I'm +/- 0 on this patch. I won't block it but I think its the wrong direction.

        Show
        Robert Muir added a comment - Another (more expert) use case is allowing rollback to an NRT-point. Today, you can only rollback to a commit point (segments_N). But an NRT reader also reflects a valid "point in time" view of the index (it just doesn't have a segments_N file, and its ref'd files are not fsync'd), so with this change you can close your old writer, open a new one from this NRT point, and revert all changes that had been done after the NRT reader was opened from the old writer. I'm still completely hung up on this. I don't think we should offer features that require additional components like transaction logs to be safe, which seems what this is geared at? Its also confusing if you later do a real commit, will we fsync the right stuff? Is that really guaranteed or just a happenchance of how we removed the stale files map? Net/Net I'm +/- 0 on this patch. I won't block it but I think its the wrong direction.
        Hide
        Robert Muir added a comment -

        I am however, against the current implementation of a second IW ctor.

        We already have IWC taking a Commit, that is enough... please deduplicate this mess

        Show
        Robert Muir added a comment - I am however, against the current implementation of a second IW ctor. We already have IWC taking a Commit, that is enough... please deduplicate this mess
        Hide
        Robert Muir added a comment -

        Today, there is only one IW ctor. This has some advantages, at the disadvantage of a totally overengineered IWConfig instead.

        Please choose one, I do not want both.

        Having two ctors is incredibly expensive on this code. This feature is flat out not worth that cost.

        Show
        Robert Muir added a comment - Today, there is only one IW ctor. This has some advantages, at the disadvantage of a totally overengineered IWConfig instead. Please choose one, I do not want both. Having two ctors is incredibly expensive on this code. This feature is flat out not worth that cost.
        Hide
        Robert Muir added a comment -

        Today, NRT readers already return an abstraction of IndexCommit (via StandardDirectoryReader.getIndexCommit). And you can already pass IndexCommit to IndexWriterConfig.setIndexCommit to tell IW to start from there. So I think we just need to fix the NRT case?

        Maybe its as simple as adding some package-private methods to IndexCommit, so IW would "reuse" StandardDirectoryReader's ReaderCommit (somehow it would return itself). We might need to think about how to force an incRef() too up-front, because of reader pooling. It might be cleaner to move "get the current fieldinfos" to this thing as well.

        This would really clean up my concerns about the API and additional IW ctors. Hopefully it would be simpler...

        Show
        Robert Muir added a comment - Today, NRT readers already return an abstraction of IndexCommit (via StandardDirectoryReader.getIndexCommit). And you can already pass IndexCommit to IndexWriterConfig.setIndexCommit to tell IW to start from there. So I think we just need to fix the NRT case? Maybe its as simple as adding some package-private methods to IndexCommit, so IW would "reuse" StandardDirectoryReader's ReaderCommit (somehow it would return itself). We might need to think about how to force an incRef() too up-front, because of reader pooling. It might be cleaner to move "get the current fieldinfos" to this thing as well. This would really clean up my concerns about the API and additional IW ctors. Hopefully it would be simpler...
        Hide
        Michael McCandless added a comment -

        Thanks Rob, I'll explore that simpler API.

        Show
        Michael McCandless added a comment - Thanks Rob, I'll explore that simpler API.
        Hide
        Michael McCandless added a comment -

        New patch, cutting over to new package private IWC.getIndexCommit().getReader() API to pass the reader to IW on init.

        Show
        Michael McCandless added a comment - New patch, cutting over to new package private IWC.getIndexCommit().getReader() API to pass the reader to IW on init.
        Hide
        Robert Muir added a comment -

        I think this is much cleaner, thank you for reworking it.

        Show
        Robert Muir added a comment - I think this is much cleaner, thank you for reworking it.
        Hide
        Shalin Shekhar Mangar added a comment -

        Bulk close for 5.3.0 release

        Show
        Shalin Shekhar Mangar added a comment - Bulk close for 5.3.0 release

          People

          • Assignee:
            Michael McCandless
            Reporter:
            Michael McCandless
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development