Lucene - Core
  1. Lucene - Core
  2. LUCENE-2386

IndexWriter commits unnecessarily on fresh Directory

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      I've noticed IndexWriter's ctor commits a first commit (empty one) if a fresh Directory is passed, w/ OpenMode.CREATE or CREATE_OR_APPEND. This seems unnecessarily, and kind of brings back an autoCommit mode, in a strange way ... why do we need that commit? Do we really expect people to open an IndexReader on an empty Directory which they just passed to an IW w/ create=true? If they want, they can simply call commit() right away on the IW they created.

      I ran into this when writing a test which committed N times, then compared the number of commits (via IndexReader.listCommits) and was surprised to see N+1 commits.

      Tried to change doCommit to false in IW ctor, but it got IndexFileDeleter jumping on me .. so the change might not be that simple. But I think it's manageable, so I'll try to attack it (and IFD specifically !) back .

      1. ASF.LICENSE.NOT.GRANTED--LUCENE-2386.patch
        60 kB
        Shai Erera
      2. ASF.LICENSE.NOT.GRANTED--LUCENE-2386.patch
        59 kB
        Shai Erera
      3. ASF.LICENSE.NOT.GRANTED--LUCENE-2386.patch
        61 kB
        Shai Erera
      4. ASF.LICENSE.NOT.GRANTED--LUCENE-2386.patch
        55 kB
        Shai Erera
      5. LUCENE-2386.patch
        56 kB
        Shai Erera
      6. LUCENE-2386.patch
        7 kB
        Shai Erera

        Activity

        Hide
        Shai Erera added a comment -

        Took a look at IndexFileDeleter, and located to offending code segment which is responsible for the IndexCorruptException:

            if (currentCommitPoint == null) {
              // We did not in fact see the segments_N file
              // corresponding to the segmentInfos that was passed
              // in.  Yet, it must exist, because our caller holds
              // the write lock.  This can happen when the directory
              // listing was stale (eg when index accessed via NFS
              // client with stale directory listing cache).  So we
              // try now to explicitly open this commit point:
              SegmentInfos sis = new SegmentInfos();
              try {
                sis.read(directory, segmentInfos.getCurrentSegmentFileName(), codecs);
              } catch (IOException e) {
                throw new CorruptIndexException("failed to locate current segments_N file");
              }
        

        Looks like this code protects against a real problem, which was raised on the list a couple of times already - stale NFS cache. So I'm reluctant to remove that check ... thought I still think we should differentiate between a newly created index on a fresh Directory, to a stale NFS problem. Maybe we can pass a boolean isNew or something like that to the ctor, and if it's a new index and the last commit point is missing, IFD will not throw the exception, but silently ignore that? So the code would become something like this:

            if (currentCommitPoint == null && !isNew) {
               ....
            }
        

        Does this make sense, or am I missing something?

        Show
        Shai Erera added a comment - Took a look at IndexFileDeleter, and located to offending code segment which is responsible for the IndexCorruptException: if (currentCommitPoint == null ) { // We did not in fact see the segments_N file // corresponding to the segmentInfos that was passed // in. Yet, it must exist, because our caller holds // the write lock. This can happen when the directory // listing was stale (eg when index accessed via NFS // client with stale directory listing cache). So we // try now to explicitly open this commit point: SegmentInfos sis = new SegmentInfos(); try { sis.read(directory, segmentInfos.getCurrentSegmentFileName(), codecs); } catch (IOException e) { throw new CorruptIndexException( "failed to locate current segments_N file" ); } Looks like this code protects against a real problem, which was raised on the list a couple of times already - stale NFS cache. So I'm reluctant to remove that check ... thought I still think we should differentiate between a newly created index on a fresh Directory, to a stale NFS problem. Maybe we can pass a boolean isNew or something like that to the ctor, and if it's a new index and the last commit point is missing, IFD will not throw the exception, but silently ignore that? So the code would become something like this: if (currentCommitPoint == null && !isNew) { .... } Does this make sense, or am I missing something?
        Hide
        Michael McCandless added a comment -

        I agree: IW really should not commit the first segments_1, for CREATE when Dir has no index already. App should immediately .commit() if it really wants to.

        We should fix IFD to know if it's dealing with a "known new" index and bypass that check that works around stale NFS dir listing (boolean arg sounds good).

        Show
        Michael McCandless added a comment - I agree: IW really should not commit the first segments_1, for CREATE when Dir has no index already. App should immediately .commit() if it really wants to. We should fix IFD to know if it's dealing with a "known new" index and bypass that check that works around stale NFS dir listing (boolean arg sounds good).
        Hide
        Shai Erera added a comment -

        Looking at IFD again, I think a boolean ctor arg is not required. What I can do is check if any Lucene file has been seen (in the for-loop iteration on the Directory files), and if not, then deduce it's a new Directory, and skip that 'if' check. I'll give it a shot.

        Show
        Shai Erera added a comment - Looking at IFD again, I think a boolean ctor arg is not required. What I can do is check if any Lucene file has been seen (in the for-loop iteration on the Directory files), and if not, then deduce it's a new Directory, and skip that 'if' check. I'll give it a shot.
        Hide
        Shai Erera added a comment -

        First stab at this. Patch still missing CHANGES entry, and I haven't run all the tests, just TestIndexWriter. With those changes it passes. One thing that I think should be fixed is testImmediateDiskFull - if I don't add writer.commit(), the test fails, because dir.getRecomputeActualSizeInBytes returns 0 (no RAMFiles yet), and then the test succeeds at adding one document. So maybe just change the test to set maxSizeInBytes to '1', always?

        TestNoDeletionPolicy is not covered by this patch (should be fixed as well, because now the number of commits is exactly N and not N+1). Will fix it tomorrow.

        Anyway, it's really late now, so hopefully some fresh eyes will look at it while I'm away, and comment on the proposed changes. I hope I got all the changes to the tests right.

        Show
        Shai Erera added a comment - First stab at this. Patch still missing CHANGES entry, and I haven't run all the tests, just TestIndexWriter. With those changes it passes. One thing that I think should be fixed is testImmediateDiskFull - if I don't add writer.commit(), the test fails, because dir.getRecomputeActualSizeInBytes returns 0 (no RAMFiles yet), and then the test succeeds at adding one document. So maybe just change the test to set maxSizeInBytes to '1', always? TestNoDeletionPolicy is not covered by this patch (should be fixed as well, because now the number of commits is exactly N and not N+1). Will fix it tomorrow. Anyway, it's really late now, so hopefully some fresh eyes will look at it while I'm away, and comment on the proposed changes. I hope I got all the changes to the tests right.
        Hide
        Michael McCandless added a comment -

        I think the patch is good Shai. I'd be curious what other tests rely on an immediate commit on creating an index....

        Maybe change testImmediateDiskFull to set max allowed size to max(1, current-usage)? In case we change IW to write other stuff in the future on create...

        Show
        Michael McCandless added a comment - I think the patch is good Shai. I'd be curious what other tests rely on an immediate commit on creating an index.... Maybe change testImmediateDiskFull to set max allowed size to max(1, current-usage)? In case we change IW to write other stuff in the future on create...
        Hide
        Shai Erera added a comment -

        Maybe change testImmediateDiskFull to set max allowed size to max(1, current-usage)?

        Good idea ! Did it and it works.

        Now ... one thing I haven't mentioned is the bw break. This is a behavioral bw break, which specifically I'm not so sure we should care about, because I wonder how many apps out there rely on being able to open a reader before they ever commited on a fresh new index. So what do you think - do this change anyway, OR ... utilize Version to our aid? I.e., if the Version that was passed to IWC is before LUCENE_31, we keep the initial commit, otherwise we don't do it? Pros is that I won't need to change many of the tests because they still use the LUCENE_30 version (but that is not a strong argument), so it's a weak Pro. Cons is that IW will keep having that doCommit handling in its ctor, only now w/ added comments on why this is being kept around etc.

        What do you think?

        Show
        Shai Erera added a comment - Maybe change testImmediateDiskFull to set max allowed size to max(1, current-usage)? Good idea ! Did it and it works. Now ... one thing I haven't mentioned is the bw break. This is a behavioral bw break, which specifically I'm not so sure we should care about, because I wonder how many apps out there rely on being able to open a reader before they ever commited on a fresh new index. So what do you think - do this change anyway, OR ... utilize Version to our aid? I.e., if the Version that was passed to IWC is before LUCENE_31, we keep the initial commit, otherwise we don't do it? Pros is that I won't need to change many of the tests because they still use the LUCENE_30 version (but that is not a strong argument), so it's a weak Pro. Cons is that IW will keep having that doCommit handling in its ctor, only now w/ added comments on why this is being kept around etc. What do you think?
        Hide
        Shai Erera added a comment -

        Apparently, there are more tests that fail ... lost count but easy fixing. I tried writing the following test:

          public void testNoCommits() throws Exception {
            // Tests that if we don't call commit(), the directory has 0 commits. This has
            // changed since LUCENE-2386, where before IW would always commit on a fresh
            // new index.
            Directory dir = new RAMDirectory();
            IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(TEST_VERSION_CURRENT, new WhitespaceAnalyzer(TEST_VERSION_CURRENT)));
            assertEquals("expected 0 commits!", 0, IndexReader.listCommits(dir).size());
            // No changes still should generate a commit, because it's a new index.
            writer.close();
            assertEquals("expected 1 commits!", 0, IndexReader.listCommits(dir).size());
          }
        

        Simple test - validates that no commits are present following a freshly new index creation, w/o closing or committing. However, IndexReader.listCommits fails w/ the following exception:

        java.io.FileNotFoundException: no segments* file found in org.apache.lucene.store.RAMDirectory@2d262d26: files: []
        	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:652)
        	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:535)
        	at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:323)
        	at org.apache.lucene.index.DirectoryReader.listCommits(DirectoryReader.java:1033)
        	at org.apache.lucene.index.DirectoryReader.listCommits(DirectoryReader.java:1023)
        	at org.apache.lucene.index.IndexReader.listCommits(IndexReader.java:1341)
        	at org.apache.lucene.index.TestIndexWriter.testNoCommits(TestIndexWriter.java:4966)
               ....
        

        The failure occurs when SegmentInfos attempts to find segments.gen and fails. So I wonder if I should fix DirectoryReader to catch that exception and simply return an empty Collection .. or I should fix SegmentInfos at this point – notice the "files: []" at the end - I think that by adding a check to the following code (SegmentInfos, line 652) which validates that there were any files before throwing the exception, it'll still work properly and safely (i.e. to detect a problematic Directory). Will need probably to break away from the while loop and I guess fix some other things in upper layers ... therefore I'm not sure if I should not simply catch that exception in DirectoryReader.listCommits w/ proper documentation and be done w/ it. After all, it's not supposed to be called ... ever? or hardly ever?

                  if (gen == -1) {
                    // Neither approach found a generation
                    throw new FileNotFoundException("no segments* file found in " + directory + ": files: " + Arrays.toString(files));
                  }
        
        Show
        Shai Erera added a comment - Apparently, there are more tests that fail ... lost count but easy fixing. I tried writing the following test: public void testNoCommits() throws Exception { // Tests that if we don't call commit(), the directory has 0 commits. This has // changed since LUCENE-2386, where before IW would always commit on a fresh // new index. Directory dir = new RAMDirectory(); IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(TEST_VERSION_CURRENT, new WhitespaceAnalyzer(TEST_VERSION_CURRENT))); assertEquals( "expected 0 commits!" , 0, IndexReader.listCommits(dir).size()); // No changes still should generate a commit, because it's a new index. writer.close(); assertEquals( "expected 1 commits!" , 0, IndexReader.listCommits(dir).size()); } Simple test - validates that no commits are present following a freshly new index creation, w/o closing or committing. However, IndexReader.listCommits fails w/ the following exception: java.io.FileNotFoundException: no segments* file found in org.apache.lucene.store.RAMDirectory@2d262d26: files: [] at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:652) at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:535) at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:323) at org.apache.lucene.index.DirectoryReader.listCommits(DirectoryReader.java:1033) at org.apache.lucene.index.DirectoryReader.listCommits(DirectoryReader.java:1023) at org.apache.lucene.index.IndexReader.listCommits(IndexReader.java:1341) at org.apache.lucene.index.TestIndexWriter.testNoCommits(TestIndexWriter.java:4966) .... The failure occurs when SegmentInfos attempts to find segments.gen and fails. So I wonder if I should fix DirectoryReader to catch that exception and simply return an empty Collection .. or I should fix SegmentInfos at this point – notice the "files: []" at the end - I think that by adding a check to the following code (SegmentInfos, line 652) which validates that there were any files before throwing the exception, it'll still work properly and safely (i.e. to detect a problematic Directory). Will need probably to break away from the while loop and I guess fix some other things in upper layers ... therefore I'm not sure if I should not simply catch that exception in DirectoryReader.listCommits w/ proper documentation and be done w/ it. After all, it's not supposed to be called ... ever? or hardly ever? if (gen == -1) { // Neither approach found a generation throw new FileNotFoundException( "no segments* file found in " + directory + ": files: " + Arrays.toString(files)); }
        Hide
        Michael McCandless added a comment -

        This is a behavioral bw break, which specifically I'm not so sure we should care about, because I wonder how many apps out there rely on being able to open a reader before they ever commited on a fresh new index.

        I'm inclined to just break it (w/o a version switch) – really this is almost a bug, in that autoCommit is false and so IW should make no commits to the index unless you ask it to.

        We should try to contain the amount of switching we do based on Version.XXX, only using it to match behavior in cases where it can do alot of harm (eg the change alters what's indexed). This change doesn't fit that, ie, all the app has to do (if it really cares) is call IW.commit() immediately on creating the index.

        However, IndexReader.listCommits fails w/ the following exception

        I think we should catch the exception in DirReader.listCommits (where it tries to load latest) and simply return empty collection?

        Show
        Michael McCandless added a comment - This is a behavioral bw break, which specifically I'm not so sure we should care about, because I wonder how many apps out there rely on being able to open a reader before they ever commited on a fresh new index. I'm inclined to just break it (w/o a version switch) – really this is almost a bug, in that autoCommit is false and so IW should make no commits to the index unless you ask it to. We should try to contain the amount of switching we do based on Version.XXX, only using it to match behavior in cases where it can do alot of harm (eg the change alters what's indexed). This change doesn't fit that, ie, all the app has to do (if it really cares) is call IW.commit() immediately on creating the index. However, IndexReader.listCommits fails w/ the following exception I think we should catch the exception in DirReader.listCommits (where it tries to load latest) and simply return empty collection?
        Hide
        Shai Erera added a comment -

        Ok I've added the following to DirReader:

            try {
              latest.read(dir, codecs);
            } catch (FileNotFoundException e) {
              if (e.getMessage().startsWith("no segments* file found in")) {
                // Might be that the Directory is empty, in which case just return an
                // empty collection.
                return Collections.emptyList();
              } else {
                throw e;
              }
            }
        

        And now that test passes.

        I'll continue discovering tests that fail ... probably backwards will have its share too .

        Show
        Shai Erera added a comment - Ok I've added the following to DirReader: try { latest.read(dir, codecs); } catch (FileNotFoundException e) { if (e.getMessage().startsWith( "no segments* file found in" )) { // Might be that the Directory is empty, in which case just return an // empty collection. return Collections.emptyList(); } else { throw e; } } And now that test passes. I'll continue discovering tests that fail ... probably backwards will have its share too .
        Hide
        Earwin Burrfoot added a comment -

        I'm at loss for words. No, seriously, basing your logic on the content of user-targeted error message?

        Btw, opening DirectoryReader on an empty directory fails with exception too. That's why some people might do this before opening - "new IW().close()", like I did.
        While trivial to fix, this has to be fixed, or an app breaks when started with an empty index directory.

        Show
        Earwin Burrfoot added a comment - I'm at loss for words. No, seriously, basing your logic on the content of user-targeted error message? Btw, opening DirectoryReader on an empty directory fails with exception too. That's why some people might do this before opening - "new IW().close()", like I did. While trivial to fix, this has to be fixed, or an app breaks when started with an empty index directory.
        Hide
        Michael McCandless added a comment -

        How about we subclass FNFE? Eg "NoSegmentsFileException" or something, thrown by SegmentInfos.read and caught in IR.listCommits?

        Show
        Michael McCandless added a comment - How about we subclass FNFE? Eg "NoSegmentsFileException" or something, thrown by SegmentInfos.read and caught in IR.listCommits?
        Hide
        Shai Erera added a comment -

        I already did that ... just didn't post back. Created SegmentsFileNotFoundException.

        Show
        Shai Erera added a comment - I already did that ... just didn't post back. Created SegmentsFileNotFoundException.
        Hide
        Shai Erera added a comment -

        Patch fixes all tests as well as changes to IndexWriter, IndexFileDeleter, DirectoryReader and SegmentInfos.

        I'd like to commit this shortly, before all the files get changed by a malicious other commit . (kidding of course)

        Show
        Shai Erera added a comment - Patch fixes all tests as well as changes to IndexWriter, IndexFileDeleter, DirectoryReader and SegmentInfos. I'd like to commit this shortly, before all the files get changed by a malicious other commit . (kidding of course)
        Hide
        Michael McCandless added a comment -

        Patch looks good!

        Hmm... maybe we should rename SegmentsFileNotFoundException to IndexNotFoundException? Ie make it more consumable to outside apps (who should know nothing about what a segments file is). And, move it to its own source file?

        Show
        Michael McCandless added a comment - Patch looks good! Hmm... maybe we should rename SegmentsFileNotFoundException to IndexNotFoundException? Ie make it more consumable to outside apps (who should know nothing about what a segments file is). And, move it to its own source file?
        Hide
        Shai Erera added a comment -

        Ok sounds good. Is there a preferred package for exceptions? Or is o.a.l.index ok?

        Show
        Shai Erera added a comment - Ok sounds good. Is there a preferred package for exceptions? Or is o.a.l.index ok?
        Hide
        Michael McCandless added a comment -

        I think oal.index is good.

        Show
        Michael McCandless added a comment - I think oal.index is good.
        Hide
        Shai Erera added a comment -

        Patch updated to latest rev. + the proposed name change – IndexNotFoundException. All tests pass. I plan to commit this later today.

        Show
        Shai Erera added a comment - Patch updated to latest rev. + the proposed name change – IndexNotFoundException. All tests pass. I plan to commit this later today.
        Hide
        Shai Erera added a comment -

        Committed revision 932868.

        Show
        Shai Erera added a comment - Committed revision 932868.
        Hide
        Shai Erera added a comment -

        As I indicated in an email, Solr tests failed (sorry for not running them before). After some investigation (thanks Robert !), that's the problem: before this change, IW always committed first on an empty directory. It called SegmentInfos.commit(dir), which by a chain of calls ensured the directory exists (in FSDir) by calling file.mkdirs().

        After this change, that chain of calls did not happen ... yet somehow tests we still passing for Lucene. Some investigation shows that the Solr tests that failed used SingleInstanceLockFactory, or NoLockFactory. By default, FSDir uses either SimpleFSLF or NativeFSLF. The IW.ctor calls LF.makeLock and obtain, which for these two LFs meant that calling file.mkdirs ... and thus the problem was hidden. SingleInstanceLF and NoLF don't do that !

        So first, a test which uses FSDir and one of these LFs need to be created, so we catch that problem in Lucene code (this is not related to Solr – just a missing test in Lucene). Second we need to fix IW ctor, or Dir or whatever.

        I've added that code to IW.ctor, as a sanity check to make sure it works - and indeed all Solr tests pass. So that's one option, even though a bit messy.

        try {
          directory.createOutput("temp").close();
        } finally {
          directory.deleteFile("temp");
        }
        

        Another option is to add to Directory a prepareForWrite() or simply prepare() which will be called by IW. A default empty impl on Directory, and file.mkdirs on FSDirectory should be enough.

        A third option is to define clear semantics for dir.listAll(), to throw a NoSuchDirectoryException and then change IndexFileDeleter to ignore that exception if OpenMode of IW is CREATE*. It kind of makes sense - if the directory is empty, why bother looking for any index files. Lucene code today already expects that exception to be thrown in SegmentInfos.getCurrentSegmentGeneration – so we kind of say 'either you use RAMDirectory, or a sub-class of FSDirectory, and then that's what we expect'. So it's not so much of a backwards change ...

        While dir.prepare() or prepareForWrite() is very explicit ... it's not protective enough - one can still call listAll w/o calling prepareForWrite (why would you call it if you just want to list files) so I'm not sure which is the best option ... Maybe the last option is the best as at least the caller should not assume anything about the state of the directory. Just prepare to handle the NoSuchDirectoryException, vs. 'there is a directory but it's empty' case.

        I'll revert my commit until this is resolved.

        Show
        Shai Erera added a comment - As I indicated in an email, Solr tests failed (sorry for not running them before). After some investigation (thanks Robert !), that's the problem: before this change, IW always committed first on an empty directory. It called SegmentInfos.commit(dir), which by a chain of calls ensured the directory exists (in FSDir) by calling file.mkdirs(). After this change, that chain of calls did not happen ... yet somehow tests we still passing for Lucene. Some investigation shows that the Solr tests that failed used SingleInstanceLockFactory, or NoLockFactory. By default, FSDir uses either SimpleFSLF or NativeFSLF. The IW.ctor calls LF.makeLock and obtain, which for these two LFs meant that calling file.mkdirs ... and thus the problem was hidden. SingleInstanceLF and NoLF don't do that ! So first, a test which uses FSDir and one of these LFs need to be created, so we catch that problem in Lucene code (this is not related to Solr – just a missing test in Lucene). Second we need to fix IW ctor, or Dir or whatever. I've added that code to IW.ctor, as a sanity check to make sure it works - and indeed all Solr tests pass. So that's one option, even though a bit messy. try { directory.createOutput( "temp" ).close(); } finally { directory.deleteFile( "temp" ); } Another option is to add to Directory a prepareForWrite() or simply prepare() which will be called by IW. A default empty impl on Directory, and file.mkdirs on FSDirectory should be enough. A third option is to define clear semantics for dir.listAll(), to throw a NoSuchDirectoryException and then change IndexFileDeleter to ignore that exception if OpenMode of IW is CREATE*. It kind of makes sense - if the directory is empty, why bother looking for any index files. Lucene code today already expects that exception to be thrown in SegmentInfos.getCurrentSegmentGeneration – so we kind of say 'either you use RAMDirectory, or a sub-class of FSDirectory, and then that's what we expect'. So it's not so much of a backwards change ... While dir.prepare() or prepareForWrite() is very explicit ... it's not protective enough - one can still call listAll w/o calling prepareForWrite (why would you call it if you just want to list files) so I'm not sure which is the best option ... Maybe the last option is the best as at least the caller should not assume anything about the state of the directory. Just prepare to handle the NoSuchDirectoryException, vs. 'there is a directory but it's empty' case. I'll revert my commit until this is resolved.
        Hide
        Shai Erera added a comment -

        Committed revision 932917 for the revert.

        Show
        Shai Erera added a comment - Committed revision 932917 for the revert.
        Hide
        Mark Miller added a comment -

        Is this change worth it with all of its repercussions? What are the upsides? There do appear to be downsides...

        Show
        Mark Miller added a comment - Is this change worth it with all of its repercussions? What are the upsides? There do appear to be downsides...
        Hide
        Shai Erera added a comment -

        Fixes IndexFileDeleter, adds a proper test to TestIndexWriter. Haven't run all the tests yet though, but the added test passes now with the fix.

        Show
        Shai Erera added a comment - Fixes IndexFileDeleter, adds a proper test to TestIndexWriter. Haven't run all the tests yet though, but the added test passes now with the fix.
        Hide
        Michael McCandless added a comment -

        I like the fix (catching the NoSuchDirectoryException in IFD), but, can you shrink wrap that try/catch? Ie just have the try/catch around the Dir.listAll call.

        I do think this is a good change – IW was previously inconsistent, first that it would even make a commit when we no longer have an "autoCommit=true", and, second, that it would not make the commit for a directory that already had an index (we fixed this case a while back). So I like that this fix makes IW's init behavior more consistent / simpler.

        Show
        Michael McCandless added a comment - I like the fix (catching the NoSuchDirectoryException in IFD), but, can you shrink wrap that try/catch? Ie just have the try/catch around the Dir.listAll call. I do think this is a good change – IW was previously inconsistent, first that it would even make a commit when we no longer have an "autoCommit=true", and, second, that it would not make the commit for a directory that already had an index (we fixed this case a while back). So I like that this fix makes IW's init behavior more consistent / simpler.
        Hide
        Uwe Schindler added a comment -

        I dont understand the whole issue, too.

        For me it is perfectly fine, if I open an IndexWriter with create=true, that the index is created empty first. This has the big advantage, that IndexReaders can open it and will not fail with not found. OK this can be done by a commit directly after creating, but for such code like "create indexwriter with create=true if not exist else append", this is more work to do.

        The question is also, what happens if you call IndexWriter.getReader() without the initial commit? Does this work with your patch?

        For me this patch is to heavy for the small improvement, and its a behaviour change and no real bug.

        Show
        Uwe Schindler added a comment - I dont understand the whole issue, too. For me it is perfectly fine, if I open an IndexWriter with create=true, that the index is created empty first. This has the big advantage, that IndexReaders can open it and will not fail with not found. OK this can be done by a commit directly after creating, but for such code like "create indexwriter with create=true if not exist else append", this is more work to do. The question is also, what happens if you call IndexWriter.getReader() without the initial commit? Does this work with your patch? For me this patch is to heavy for the small improvement, and its a behaviour change and no real bug.
        Hide
        Mark Miller added a comment -

        I do think this is a good change - IW was previously inconsistent, first that it would even make a commit when we no longer have an "autoCommit=true", and, second, that it would not make the commit for a directory that already had an index (we fixed this case a while back). So I like that this fix makes IW's init behavior more consistent / simpler.

        Thats not a very strong argument for a back compat break on a minor release though...

        Show
        Mark Miller added a comment - I do think this is a good change - IW was previously inconsistent, first that it would even make a commit when we no longer have an "autoCommit=true", and, second, that it would not make the commit for a directory that already had an index (we fixed this case a while back). So I like that this fix makes IW's init behavior more consistent / simpler. Thats not a very strong argument for a back compat break on a minor release though...
        Hide
        Michael McCandless added a comment -

        Actually I consider this a bug in IW's ACID transactional semantics...

        The "I" in ACID is "isolation", meaning, a reader shouldn't ever see
        uncommitted changes from a writer.

        This bug breaks "isolation", ie, if you open with CREATE, today, IW
        may or may not slip in a commit on you, depending (rather
        unexpectedly, and likely the opposite of what you'd guess) on whether
        a prior index is already there.

        So IW sometimes will break ACID transactions, and sometimes not, if
        you use CREATE.

        Further, IW really shouldn't ever write a commit "automatically" – it
        used to do this (with autoCommit=true), but we stopped doing so
        (except for this bug), now that autoCommit is false.

        With this fix, IW will never sneak in a commit. Your app fully
        controls when the transaction (including CREATE, which in Lucene,
        unlike most RDMBSs, is part of the transaction) becomes visible.

        Thats not a very strong argument for a back compat break on a minor release though...

        Hmmm... I think the back compat break is very minor. Also, IW's
        now-gone autoCommit never "promised" when commits would be made, so
        this was really undocumented behavior.

        Solr's test failures were not due to this break; they we actually due
        to a sneaky bug that otherwise (had we not had Solr's tests) would
        likely have remained undiscovered for quite some time.

        And, much of the "noise" in the patch is from tests relying on exact
        file names, commit counts, etc. Plus some of the usual Shai-cleanups

        I guess if we really care to, we can emulate the "not quite ACID" bug
        when Version <= 3.1?

        The question is also, what happens if you call IndexWriter.getReader() without the initial commit? Does this work with your patch?

        This should be perfectly fine – you'll get a reader searching 0 docs.

        Shai can you add a test case confirming this?

        Shai a couple other things:

        • Please shrink-wrap the try/except in IFD
        • In Directory.listCommits, instead of catching
          IndexNotFoundException and returning empty list, I think we should
          throw it? You get this exception today, right?
        Show
        Michael McCandless added a comment - Actually I consider this a bug in IW's ACID transactional semantics... The "I" in ACID is "isolation", meaning, a reader shouldn't ever see uncommitted changes from a writer. This bug breaks "isolation", ie, if you open with CREATE, today, IW may or may not slip in a commit on you, depending (rather unexpectedly, and likely the opposite of what you'd guess) on whether a prior index is already there. So IW sometimes will break ACID transactions, and sometimes not, if you use CREATE. Further, IW really shouldn't ever write a commit "automatically" – it used to do this (with autoCommit=true), but we stopped doing so (except for this bug), now that autoCommit is false. With this fix, IW will never sneak in a commit. Your app fully controls when the transaction (including CREATE, which in Lucene, unlike most RDMBSs, is part of the transaction) becomes visible. Thats not a very strong argument for a back compat break on a minor release though... Hmmm... I think the back compat break is very minor. Also, IW's now-gone autoCommit never "promised" when commits would be made, so this was really undocumented behavior. Solr's test failures were not due to this break; they we actually due to a sneaky bug that otherwise (had we not had Solr's tests) would likely have remained undiscovered for quite some time. And, much of the "noise" in the patch is from tests relying on exact file names, commit counts, etc. Plus some of the usual Shai-cleanups I guess if we really care to, we can emulate the "not quite ACID" bug when Version <= 3.1? The question is also, what happens if you call IndexWriter.getReader() without the initial commit? Does this work with your patch? This should be perfectly fine – you'll get a reader searching 0 docs. Shai can you add a test case confirming this? Shai a couple other things: Please shrink-wrap the try/except in IFD In Directory.listCommits, instead of catching IndexNotFoundException and returning empty list, I think we should throw it? You get this exception today, right?
        Hide
        Mark Miller added a comment -

        Hmmm... I think the back compat break is very minor

        Yes - it is - but so was the argument for it IMO.

        Your extended argument is more compelling though.

        Show
        Mark Miller added a comment - Hmmm... I think the back compat break is very minor Yes - it is - but so was the argument for it IMO. Your extended argument is more compelling though.
        Hide
        Michael McCandless added a comment -

        Shai I think you should also check Solr's spellchecker index to make sure it won't be affected by this not-ACID bug? (And add an IW.commit to fix it, if so...).

        Show
        Michael McCandless added a comment - Shai I think you should also check Solr's spellchecker index to make sure it won't be affected by this not-ACID bug? (And add an IW.commit to fix it, if so...).
        Hide
        Yonik Seeley added a comment -

        I think Solr's spellchecker should be OK.
        It builds the index in one go, so there shouldn't be open IndexWriters hanging around.

        If the spellchecker does manage to break, I imagine it would be a bug in the spellchecker (concurrency related... like what happens when one thread is building the spellcheck index and another thread comes along and tries to do a spellcheck request, etc)

        Show
        Yonik Seeley added a comment - I think Solr's spellchecker should be OK. It builds the index in one go, so there shouldn't be open IndexWriters hanging around. If the spellchecker does manage to break, I imagine it would be a bug in the spellchecker (concurrency related... like what happens when one thread is building the spellcheck index and another thread comes along and tries to do a spellcheck request, etc)
        Hide
        Shai Erera added a comment -

        About IndexReader.listCommits ... the javadocs state this "There must be at least one commit in the Directory, else this method throws java.io.IOException.". So I'll change it to reflect the right exception type is thrown (IndexNotFoundException) and revert the change to DirReader.listCommits which returns an empty list.

        Show
        Shai Erera added a comment - About IndexReader.listCommits ... the javadocs state this "There must be at least one commit in the Directory, else this method throws java.io.IOException.". So I'll change it to reflect the right exception type is thrown (IndexNotFoundException) and revert the change to DirReader.listCommits which returns an empty list.
        Hide
        Shai Erera added a comment -

        Patch w/ proposed fixes. All tests pass, including Solr's .

        Show
        Shai Erera added a comment - Patch w/ proposed fixes. All tests pass, including Solr's .
        Hide
        Michael McCandless added a comment -

        Patch looks good... thanks Shai.

        I think the only open question is whether we should emulate the "not quite ACID" bug if Version < LUCENE_31?

        I don't think it's necessary (but we should advertise this in back-compat breaks in CHANGES).... anyone else?

        Show
        Michael McCandless added a comment - Patch looks good... thanks Shai. I think the only open question is whether we should emulate the "not quite ACID" bug if Version < LUCENE_31? I don't think it's necessary (but we should advertise this in back-compat breaks in CHANGES).... anyone else?
        Hide
        Earwin Burrfoot added a comment -

        I don't see why this should be "fixed".
        If IndexWriter slips in a commit on an empty directory for you, that's not a commit per ce, it's an initialization of directory to be recognized as a valid index (exactly what you are asking with Mode.CREATE parameter).

        What is now a correct way of opening IndexReader on a directory that may, or may not be there?
        With current behaviour I create an IW in CREATE_OR_APPEND mode, and it guarantees that consequent IR.open() will get something it recognizes as the index.

        Show
        Earwin Burrfoot added a comment - I don't see why this should be "fixed". If IndexWriter slips in a commit on an empty directory for you, that's not a commit per ce, it's an initialization of directory to be recognized as a valid index (exactly what you are asking with Mode.CREATE parameter). What is now a correct way of opening IndexReader on a directory that may, or may not be there? With current behaviour I create an IW in CREATE_OR_APPEND mode, and it guarantees that consequent IR.open() will get something it recognizes as the index.
        Hide
        Uwe Schindler added a comment -

        Thanks Earwin, thats exactly my opinion, too. For me the whole behaviour is defined and correct. The create param in the ctor is just an initialization of the directory to be a defined index (empty at the beginning).

        Maybe we should remove the create param from IndexWriter ctor/config at all, and just define a static utility method in IW, that "initializes" an empty directory. The standard ctors in IW then should thow IndexNotFound if the directory is not yet initialized. This way, we dont need those strange create params.

        Show
        Uwe Schindler added a comment - Thanks Earwin, thats exactly my opinion, too. For me the whole behaviour is defined and correct. The create param in the ctor is just an initialization of the directory to be a defined index (empty at the beginning). Maybe we should remove the create param from IndexWriter ctor/config at all, and just define a static utility method in IW, that "initializes" an empty directory. The standard ctors in IW then should thow IndexNotFound if the directory is not yet initialized. This way, we dont need those strange create params.
        Hide
        Earwin Burrfoot added a comment -

        The mode in ctor is perfectly fine. Remove it, and everyone's gonna end up writing the very code you remove, in seven different ways, five of them broken somehow.

        Show
        Earwin Burrfoot added a comment - The mode in ctor is perfectly fine. Remove it, and everyone's gonna end up writing the very code you remove, in seven different ways, five of them broken somehow.
        Hide
        Shai Erera added a comment -

        I'm not sure if we're arguing about the same thing here ... why when I open an IW on empty Directory I need an empty segment that's created, and from now on never changed, populated or even read? That just seems wrong to me ... when I fixed the tests to not rely on the buggy behavior, I noticed several which count the list of commits (especially the IDP ones) w/ a documentation like "1 for opening + N for committing" ...

        It just looks weird that when you open IW a commit happens, a set of empty files are created, but from now on they are never modified, until IDP kicks in, after the second commit ... it's nothing like initing the Directory to be able to receive input ..

        And I don't know what's the benefit of doing "new IW()" following by "IR.open()" ... that IR will always see 0 documents, until you call reopen (if commit happened in between). So what's the convenience here? that your code can call IR.open once, and from that point forward just 'reopen()'? That seems low advantage to me, really. Maybe what we should do is fix IR.open to return a null IR in case the directory hasn't been populated w/ anything yet. Then you can check easily if you should call open() (==null) or reopen (otherwise). Or create a blank stub of IR which emulates an empty Dir, and when reopen is called works well (if the Directory is not empty now) ...

        BTW, FWIW, Solr's code did not break from this change at all ... it was the combination of FSDir and NoLF/SingleInstanceLF that broke some tests that used it ... I don't know how many apps out there are using that combination, but I'd bet it's small? I use that combination, however in my case an IR is opened only after a commit signal/event is raised (so I don't check isCurrent often or attempt to reopen()). What I'm trying to say is that this combination is dangerous, and the application needs to ensure that only one IW is open at any given time, and I'm sure such apps are more sophisticated then opening IW and then IR just for the convenience of it.

        Show
        Shai Erera added a comment - I'm not sure if we're arguing about the same thing here ... why when I open an IW on empty Directory I need an empty segment that's created, and from now on never changed, populated or even read? That just seems wrong to me ... when I fixed the tests to not rely on the buggy behavior, I noticed several which count the list of commits (especially the IDP ones) w/ a documentation like "1 for opening + N for committing" ... It just looks weird that when you open IW a commit happens, a set of empty files are created, but from now on they are never modified, until IDP kicks in, after the second commit ... it's nothing like initing the Directory to be able to receive input .. And I don't know what's the benefit of doing "new IW()" following by "IR.open()" ... that IR will always see 0 documents, until you call reopen (if commit happened in between). So what's the convenience here? that your code can call IR.open once, and from that point forward just 'reopen()'? That seems low advantage to me, really. Maybe what we should do is fix IR.open to return a null IR in case the directory hasn't been populated w/ anything yet. Then you can check easily if you should call open() (==null) or reopen (otherwise). Or create a blank stub of IR which emulates an empty Dir, and when reopen is called works well (if the Directory is not empty now) ... BTW, FWIW, Solr's code did not break from this change at all ... it was the combination of FSDir and NoLF/SingleInstanceLF that broke some tests that used it ... I don't know how many apps out there are using that combination, but I'd bet it's small? I use that combination, however in my case an IR is opened only after a commit signal/event is raised (so I don't check isCurrent often or attempt to reopen()). What I'm trying to say is that this combination is dangerous, and the application needs to ensure that only one IW is open at any given time, and I'm sure such apps are more sophisticated then opening IW and then IR just for the convenience of it.
        Hide
        Earwin Burrfoot added a comment -

        Meh, that all is just a matter of perspective.

        It doesn't look weird for me that an empty commit happens. Go, create some SVN repository - it has initial #0 commit inside. I bet all version control systems and all databases have the concept of null initial commit.
        This empty commit is what discerns empty existing directory from empty lucene index. If you create some docs, index them, then delete and commit, you're going to get just the same picture - a directory with a single segments* file (only generation number will differ).

        And references to "but from now on they are never modified" are just weird - it's a Lucene index dammit, no files are ever modified here.

        But, what looks really weird to me is suggestions like

        Then you can check easily if you should call open() (==null) or reopen (otherwise). Or create a blank stub of IR which emulates an empty Dir, and when reopen is called works well (if the Directory is not empty now) ...

        adding complexity and workarounds to plug a hole that didn't exist before the "fix" that does essentially nothing of value.

        My question is still unanswered - what is the proper way (after this fix) to open an IR over possibly-empty directory? My app, for example, opens an IR and does scheduled reopens. The time interval is small, a reopen over unmodified index is a noop, so this is all beautifully simple and just as effective as waiting for a commit event (like I did in the past, just like you). It opens an IndexWriter with CREATE_OR_APPEND mode before opening the first reader, and is thus guaranteed to have a good index directory regardless of the situation.
        Immediate fix I see is to check whether the directory is empty and do a first empty commit myself - UGLY.

        Show
        Earwin Burrfoot added a comment - Meh, that all is just a matter of perspective. It doesn't look weird for me that an empty commit happens. Go, create some SVN repository - it has initial #0 commit inside. I bet all version control systems and all databases have the concept of null initial commit. This empty commit is what discerns empty existing directory from empty lucene index. If you create some docs, index them, then delete and commit, you're going to get just the same picture - a directory with a single segments* file (only generation number will differ). And references to "but from now on they are never modified" are just weird - it's a Lucene index dammit, no files are ever modified here. But, what looks really weird to me is suggestions like Then you can check easily if you should call open() (==null) or reopen (otherwise). Or create a blank stub of IR which emulates an empty Dir, and when reopen is called works well (if the Directory is not empty now) ... adding complexity and workarounds to plug a hole that didn't exist before the "fix" that does essentially nothing of value. My question is still unanswered - what is the proper way (after this fix) to open an IR over possibly-empty directory? My app, for example, opens an IR and does scheduled reopens. The time interval is small, a reopen over unmodified index is a noop, so this is all beautifully simple and just as effective as waiting for a commit event (like I did in the past, just like you). It opens an IndexWriter with CREATE_OR_APPEND mode before opening the first reader, and is thus guaranteed to have a good index directory regardless of the situation. Immediate fix I see is to check whether the directory is empty and do a first empty commit myself - UGLY.
        Hide
        Earwin Burrfoot added a comment -

        I'm sure such apps are more sophisticated then opening IW and then IR just for the convenience of it.

        Sophistication is a sin

        Show
        Earwin Burrfoot added a comment - I'm sure such apps are more sophisticated then opening IW and then IR just for the convenience of it. Sophistication is a sin
        Hide
        Shai Erera added a comment -

        what is the proper way (after this fix) to open an IR over possibly-empty directory?

        You can simply call commit() immediately after you open IW. If that's what you need then it will work for you.

        You're right that if I add docs, deletes and them commits, I'll get an empty segment. So is if you do "new IW()" and then "iw.close()" w/ no addDocument in between. The point here was that we should not create a commit unless the user has specifically asked for it. Calling close() means asking for a commit, per close semantics and contract. But if the app called new IW, add docs and crashed in the middle, the Directory will still remain empty ... which is sort of what, IMO, should happen.

        I agree it's a matter of perspective. I think that when autoCommit was removed, so should have been this code. I don't know if it was left behind for a good reason, or simply because when someone tried to do it, he found out it's not that simple (like I have ).

        Show
        Shai Erera added a comment - what is the proper way (after this fix) to open an IR over possibly-empty directory? You can simply call commit() immediately after you open IW. If that's what you need then it will work for you. You're right that if I add docs, deletes and them commits, I'll get an empty segment. So is if you do "new IW()" and then "iw.close()" w/ no addDocument in between. The point here was that we should not create a commit unless the user has specifically asked for it. Calling close() means asking for a commit, per close semantics and contract. But if the app called new IW, add docs and crashed in the middle, the Directory will still remain empty ... which is sort of what, IMO, should happen. I agree it's a matter of perspective. I think that when autoCommit was removed, so should have been this code. I don't know if it was left behind for a good reason, or simply because when someone tried to do it, he found out it's not that simple (like I have ).
        Hide
        Earwin Burrfoot added a comment -

        The point here was that we should not create a commit unless the user has specifically asked for it.

        Isn't opening IW with CREATE* mode called "specifically asking for"?
        If he doesn't need this functionality, he uses another mode, that simple.

        But if the app called new IW, add docs and crashed in the middle, the Directory will still remain empty ... which is sort of what, IMO, should happen.

        Why so? Please, read my reference to version control systems and databases - they all leave behind an initialized database if the first commit fails.

        My point is that the feature is immediately usable and saves people from writing (possibly buggy) emptiness-detection-then-commit code over and over again.

        Show
        Earwin Burrfoot added a comment - The point here was that we should not create a commit unless the user has specifically asked for it. Isn't opening IW with CREATE* mode called "specifically asking for"? If he doesn't need this functionality, he uses another mode, that simple. But if the app called new IW, add docs and crashed in the middle, the Directory will still remain empty ... which is sort of what, IMO, should happen. Why so? Please, read my reference to version control systems and databases - they all leave behind an initialized database if the first commit fails. My point is that the feature is immediately usable and saves people from writing (possibly buggy) emptiness-detection-then-commit code over and over again.
        Hide
        Shai Erera added a comment -

        I don't think that people need to write that "emptiness-detection-then-commit" code ... if they care, they can simply immediately call commit() after they open IW.

        Isn't opening IW with CREATE* mode called "specifically asking for"?

        It depends on how you interpret the mode ... for example, you cannot pass OpenMode.APPEND for an empty Directory, because IW throws an exception. The modes are just meant to tell IW how to behave:

        • APPEND - I know there is an index in the Directory, and I'd like to append to it.
        • CREATE - I don't care if there is an index in the Directory – create a new one, zeroing out all segments.
        • CREATE_OR_APPEND - If there is an index, open it, otherwise create a new one.

        So if you pass CREATE on an already populated index, IW doesn't do the implicit commit, until you call commit() yourself. But if you pass CREATE on an empty index, IW suddenly calls commit()? That's just an inconsistency that's meant to allow you to open an IR immediately after "new IW()" call, irregardless of what was there? And if you open that IR, then if the index was populated you see the previous set of documents, but if it wasn't you see nothing, even though you meant to say "override what's there"?

        I've checked what FileOutputStream does, using the following code:

        File file = new File("d:/temp/tmpfile");
        FileOutputStream fos = new FileOutputStream(file);
        fos.write(3);
        fos.close();
        	  
        fos = new FileOutputStream(file);
        FileInputStream fis = new FileInputStream(file);
        System.out.println(fis.read());
        
        • Second line creates an empty file immediately, not waiting for close() or flush() – which resembles the behavior that you're suggesting we should take w/ IW (which is the 'today's behavior')
        • Forth line closes the file, flushing and writing the content.
        • Fifth line recreates the file, empty, again, w/o calling close. So it zeros out the file content immediately, even before you wrote a single piece of byte to it.
        • Sixth+Seventh line proves it by attempting to read from the file, and the output printed is -1.

        I've wrapped the FOS w/ a BufferedOS and the behavior is still the same. So I'm trying to show is that we don't fully adhere to the CREATE mode, and rightfully if you ask me - we shouldn't zero out the segments until the application called commit(). But we choose to adhere differently to the CREATE* mode if the index is already populated. That's an inconsistent behavior, at least in my perspective. It's also harder to explain and document, e.g. "you should call commit() if you used CREATE, in case you want to zero out everything immediately, and the Directory is not empty, but you don't need to call commit() if the directory was empty, Lucene will do it for you." – so now how will the app know if it should call commit()? It will need to write a sort of "emptiness-detection-then-commit"?

        I am willing to consider the following semantics:

        • APPEND - assumes an index exists and open it.
        • CREATE - zeros out everything that's in the directory immediately, and also prepares an empty directory.
        • CREATE_OR_APPEND - either loads an existing index, or is able to work on the empty directory. No implicit commit is happening by IW if the index does not exist.

        But I think CREATE is too dangerous, and so I prefer to stick w/ the proposed change to the patch so far – if you open an index in CREATE*, you should call commit before you can read it. That will adhere to the semantics of what the application wanted, whether it meant to zero out an existing Directory, or create a new one from scratch.

        Show
        Shai Erera added a comment - I don't think that people need to write that "emptiness-detection-then-commit" code ... if they care, they can simply immediately call commit() after they open IW. Isn't opening IW with CREATE* mode called "specifically asking for"? It depends on how you interpret the mode ... for example, you cannot pass OpenMode.APPEND for an empty Directory, because IW throws an exception. The modes are just meant to tell IW how to behave: APPEND - I know there is an index in the Directory, and I'd like to append to it. CREATE - I don't care if there is an index in the Directory – create a new one, zeroing out all segments. CREATE_OR_APPEND - If there is an index, open it, otherwise create a new one. So if you pass CREATE on an already populated index, IW doesn't do the implicit commit, until you call commit() yourself. But if you pass CREATE on an empty index, IW suddenly calls commit()? That's just an inconsistency that's meant to allow you to open an IR immediately after "new IW()" call, irregardless of what was there? And if you open that IR, then if the index was populated you see the previous set of documents, but if it wasn't you see nothing, even though you meant to say "override what's there"? I've checked what FileOutputStream does, using the following code: File file = new File( "d:/temp/tmpfile" ); FileOutputStream fos = new FileOutputStream(file); fos.write(3); fos.close(); fos = new FileOutputStream(file); FileInputStream fis = new FileInputStream(file); System .out.println(fis.read()); Second line creates an empty file immediately, not waiting for close() or flush() – which resembles the behavior that you're suggesting we should take w/ IW (which is the 'today's behavior') Forth line closes the file, flushing and writing the content. Fifth line recreates the file, empty, again, w/o calling close. So it zeros out the file content immediately, even before you wrote a single piece of byte to it. Sixth+Seventh line proves it by attempting to read from the file, and the output printed is -1. I've wrapped the FOS w/ a BufferedOS and the behavior is still the same. So I'm trying to show is that we don't fully adhere to the CREATE mode, and rightfully if you ask me - we shouldn't zero out the segments until the application called commit(). But we choose to adhere differently to the CREATE* mode if the index is already populated. That's an inconsistent behavior, at least in my perspective. It's also harder to explain and document, e.g. "you should call commit() if you used CREATE, in case you want to zero out everything immediately, and the Directory is not empty, but you don't need to call commit() if the directory was empty, Lucene will do it for you." – so now how will the app know if it should call commit()? It will need to write a sort of "emptiness-detection-then-commit"? I am willing to consider the following semantics: APPEND - assumes an index exists and open it. CREATE - zeros out everything that's in the directory immediately , and also prepares an empty directory. CREATE_OR_APPEND - either loads an existing index, or is able to work on the empty directory. No implicit commit is happening by IW if the index does not exist. But I think CREATE is too dangerous, and so I prefer to stick w/ the proposed change to the patch so far – if you open an index in CREATE*, you should call commit before you can read it. That will adhere to the semantics of what the application wanted, whether it meant to zero out an existing Directory, or create a new one from scratch.
        Hide
        Earwin Burrfoot added a comment -

        So if you pass CREATE on an already populated index, IW doesn't do the implicit commit, until you call commit() yourself. But if you pass CREATE on an empty index, IW suddenly calls commit()?

        Empty directory is not an empty index. It is just an empty directory. That first commit initializes empty directory to an empty index. Seems this is the core of our misagreement.

        Show
        Earwin Burrfoot added a comment - So if you pass CREATE on an already populated index, IW doesn't do the implicit commit, until you call commit() yourself. But if you pass CREATE on an empty index, IW suddenly calls commit()? Empty directory is not an empty index. It is just an empty directory. That first commit initializes empty directory to an empty index. Seems this is the core of our misagreement.
        Hide
        Earwin Burrfoot added a comment -

        I'm surrendering the issue, any outcome doesn't warrant the time spent discussing it.
        My conclusions are:

        • I see no profit in this change. That may be different for others.
        • I see some harm done, but it is limited. new IW().close() approach used by many people continues to work. People who open both IR and IW upfront and keep them during app lifetime (like me) have to add a single line to their code. Being an avid enemy of backwards compatibility, I don't care
        • The core of misagreement behind the issue rests on whether to treat "create index" as a transaction or not. It can go both ways.
        Show
        Earwin Burrfoot added a comment - I'm surrendering the issue, any outcome doesn't warrant the time spent discussing it. My conclusions are: I see no profit in this change. That may be different for others. I see some harm done, but it is limited. new IW().close() approach used by many people continues to work. People who open both IR and IW upfront and keep them during app lifetime (like me) have to add a single line to their code. Being an avid enemy of backwards compatibility, I don't care The core of misagreement behind the issue rests on whether to treat "create index" as a transaction or not. It can go both ways.
        Hide
        Michael McCandless added a comment -

        Shai, can you also test CREATE on an empty index and then
        IW.rollback()? IW should remove all files it created, never having
        made anything visible to an outside reader.

        Show
        Michael McCandless added a comment - Shai, can you also test CREATE on an empty index and then IW.rollback()? IW should remove all files it created, never having made anything visible to an outside reader.
        Hide
        Shai Erera added a comment -

        So just call "new IW()", then rollback and ensure dir.listAll() returns an empty list? Or also index stuff, making sure a flush occurs and then rollback? I'm not sure that the latter is related to that issue ...

        Show
        Shai Erera added a comment - So just call "new IW()", then rollback and ensure dir.listAll() returns an empty list? Or also index stuff, making sure a flush occurs and then rollback? I'm not sure that the latter is related to that issue ...
        Hide
        Michael McCandless added a comment -

        Yeah I think new IW(), set maxBufferedDocs to 2, index 2 docs so a flush happens, then rollback, then confirm dir is empty?

        Show
        Michael McCandless added a comment - Yeah I think new IW(), set maxBufferedDocs to 2, index 2 docs so a flush happens, then rollback, then confirm dir is empty?
        Hide
        Shai Erera added a comment -

        Patch includes the proposed test in TestIndexWriter. I think this is ready for commit, if there are no more objections.

        Show
        Shai Erera added a comment - Patch includes the proposed test in TestIndexWriter. I think this is ready for commit, if there are no more objections.
        Hide
        Michael McCandless added a comment -

        Patch looks good Shai!

        Show
        Michael McCandless added a comment - Patch looks good Shai!
        Hide
        Shai Erera added a comment -

        Committed revision 933613. (take #2)

        Show
        Shai Erera added a comment - Committed revision 933613. (take #2)
        Hide
        Shai Erera added a comment -

        Backport to 3.1

        Show
        Shai Erera added a comment - Backport to 3.1
        Hide
        Shai Erera added a comment -

        Committed revision 941607.

        Show
        Shai Erera added a comment - Committed revision 941607.
        Hide
        Grant Ingersoll added a comment -

        Bulk close for 3.1

        Show
        Grant Ingersoll added a comment - Bulk close for 3.1

          People

          • Assignee:
            Shai Erera
            Reporter:
            Shai Erera
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development