Lucene - Core
  1. Lucene - Core
  2. LUCENE-5377

IW.addIndexes(Dir[]) causes silent index corruption

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 4.6
    • Fix Version/s: 4.6.1, 4.7, 6.0
    • Component/s: core/index
    • Labels:
      None
    • Environment:

      windows/linux

    • Lucene Fields:
      New

      Description

      my old facet index create by Lucene version=4.2
      use indexChecker ok.

      now I upgrade to Lucene 4.6 and put some new records to index.
      then reopen index, some files in indexdir missing....
      no .si files.

      I debug into it, new version format of segments.gen(segments_N) record bad segments info.

      1. LUCENE-5377.patch
        23 kB
        Robert Muir
      2. LUCENE-5377.patch
        14 kB
        Robert Muir
      3. TestCore_45.java
        2 kB
        Littlestar
      4. TestCore_46.java
        2 kB
        Littlestar

        Issue Links

          Activity

          Hide
          Littlestar added a comment - - edited

          Lucene 4.5/4.5.1 is ok.
          but failed in 4.6.0

          Show
          Littlestar added a comment - - edited Lucene 4.5/4.5.1 is ok. but failed in 4.6.0
          Hide
          Littlestar added a comment - - edited

          reproduce steps:
          1. run TestCore_45 in luence 4.5.1, it will generate a index TestCore_45/data/taxo.
          2. copy TestCore_45/data/taxo into TestCore_46/data/taxo.
          3. run TestCore_46 in luence 4.6.0.
          you will see the following errors.
          Exception in thread "main" java.io.FileNotFoundException: xception in thread "main" java.io.FileNotFoundException: G:\JavaWorkspace\bug_test\Lucene_46\data\taxo_2.si (.........)
          at java.io.RandomAccessFile.open(Native Method)
          at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
          at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:388)
          at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:103)
          at org.apache.lucene.store.SimpleFSDirectory.openInput(SimpleFSDirectory.java:58)
          at org.apache.lucene.codecs.lucene40.Lucene40SegmentInfoReader.read(Lucene40SegmentInfoReader.java:51)
          at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:340)
          at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:404)
          at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:843)
          at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:694)
          at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:400)
          at org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter.readCommitData(DirectoryTaxonomyWriter.java:138)
          at org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter.<init>(DirectoryTaxonomyWriter.java:222)
          at org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter.<init>(DirectoryTaxonomyWriter.java:331)
          at TestCore_46$1.<init>(TestCore_46.java:27)
          at TestCore_46.open(TestCore_46.java:27)
          at TestCore_46.main(TestCore_46.java:75)

          ==============================

          Show
          Littlestar added a comment - - edited reproduce steps: 1. run TestCore_45 in luence 4.5.1, it will generate a index TestCore_45/data/taxo. 2. copy TestCore_45/data/taxo into TestCore_46/data/taxo. 3. run TestCore_46 in luence 4.6.0. you will see the following errors. Exception in thread "main" java.io.FileNotFoundException: xception in thread "main" java.io.FileNotFoundException: G:\JavaWorkspace\bug_test\Lucene_46\data\taxo_2.si (.........) at java.io.RandomAccessFile.open(Native Method) at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233) at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:388) at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.<init>(SimpleFSDirectory.java:103) at org.apache.lucene.store.SimpleFSDirectory.openInput(SimpleFSDirectory.java:58) at org.apache.lucene.codecs.lucene40.Lucene40SegmentInfoReader.read(Lucene40SegmentInfoReader.java:51) at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:340) at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:404) at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:843) at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:694) at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:400) at org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter.readCommitData(DirectoryTaxonomyWriter.java:138) at org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter.<init>(DirectoryTaxonomyWriter.java:222) at org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter.<init>(DirectoryTaxonomyWriter.java:331) at TestCore_46$1.<init>(TestCore_46.java:27) at TestCore_46.open(TestCore_46.java:27) at TestCore_46.main(TestCore_46.java:75) ==============================
          Hide
          Littlestar added a comment -

          I debug into it, Lucene 4.6's new format of segments.gen(segments_N) record bad segments info.

          Show
          Littlestar added a comment - I debug into it, Lucene 4.6's new format of segments.gen(segments_N) record bad segments info.
          Hide
          Michael McCandless added a comment -

          Indeed, something is wrong here.

          If I fix DirectoryTaxonomyWriter to use IW.addIndexes(IR[]) instead of IW.addIndexes(Directory[]) then the test "passes". The bug is due to this code in IndexWriter.addIndces(Dir[]):

              TrackingDirectoryWrapper trackingDir = new TrackingDirectoryWrapper(directory);
              try {
                newInfo.getCodec().segmentInfoFormat().getSegmentInfoWriter().write(trackingDir, newInfo, fis, context);
              } catch (UnsupportedOperationException uoe) {
                // OK: 3x codec cannot write a new SI file;
                // SegmentInfos will write this on commit
              }
          

          The problem is ... this logic is wrong: the segment was written with Lucene40SegmentInfoFormat, which works in 4.5.x but in 4.6.x we upgraded this to Lucene46SIFormat...

          Show
          Michael McCandless added a comment - Indeed, something is wrong here. If I fix DirectoryTaxonomyWriter to use IW.addIndexes(IR[]) instead of IW.addIndexes(Directory[]) then the test "passes". The bug is due to this code in IndexWriter.addIndces(Dir[]): TrackingDirectoryWrapper trackingDir = new TrackingDirectoryWrapper(directory); try { newInfo.getCodec().segmentInfoFormat().getSegmentInfoWriter().write(trackingDir, newInfo, fis, context); } catch (UnsupportedOperationException uoe) { // OK: 3x codec cannot write a new SI file; // SegmentInfos will write this on commit } The problem is ... this logic is wrong: the segment was written with Lucene40SegmentInfoFormat, which works in 4.5.x but in 4.6.x we upgraded this to Lucene46SIFormat...
          Hide
          Shai Erera added a comment -

          Isn't it then a problem of anyone that will try to addIndexes a pre-4.6 index to a 4.6+?

          I looked at TestBackCompat.addOldIndexes - it doesn't do anything with the added index to verify it's OK. I added DirectoryReader.open(targetDir).close(); to the test after w.close(), so at least we verify the index can be opened, but the test doesn't fail, not sure why. The newest "oldNames" is 4.2 - shouldn't a 4.5 have been created when we released 4.6, since we changed the index format? But then a 4.0/4.2 index should equally fail to be added to a 4.6, right? So maybe this doesn't explain the failure to reproduce this error...

          Show
          Shai Erera added a comment - Isn't it then a problem of anyone that will try to addIndexes a pre-4.6 index to a 4.6+? I looked at TestBackCompat.addOldIndexes - it doesn't do anything with the added index to verify it's OK. I added DirectoryReader.open(targetDir).close(); to the test after w.close() , so at least we verify the index can be opened, but the test doesn't fail, not sure why. The newest "oldNames" is 4.2 - shouldn't a 4.5 have been created when we released 4.6, since we changed the index format? But then a 4.0/4.2 index should equally fail to be added to a 4.6, right? So maybe this doesn't explain the failure to reproduce this error...
          Hide
          Robert Muir added a comment -

          First of all, some test should fail.

          IndexWriter.addIndexes(Dir) is so dangerous...

          To fix the bug, i think we have to support write logic in the old .si writer? And only allow the UOE if its really 3.x codec...

          Long term this is bogus. We should either remove IndexWriter.addIndexes(Dir) or change things so it doesnt need to write .si at all here (e.g. put partial filenames in the .si that don't include segment prefix)

          Show
          Robert Muir added a comment - First of all, some test should fail. IndexWriter.addIndexes(Dir) is so dangerous... To fix the bug, i think we have to support write logic in the old .si writer? And only allow the UOE if its really 3.x codec... Long term this is bogus. We should either remove IndexWriter.addIndexes(Dir) or change things so it doesnt need to write .si at all here (e.g. put partial filenames in the .si that don't include segment prefix)
          Hide
          Shai Erera added a comment -

          First of all, some test should fail.

          I modified TestBackCompatibility.testAddIndexes to do this:

          boolean orig = OLD_FORMAT_IMPERSONATION_IS_ACTIVE;
          OLD_FORMAT_IMPERSONATION_IS_ACTIVE = false;
          // test code goes here
          OLD_FORMAT_IMPERSONATION_IS_ACTIVE = orig;
          

          And now it fails, because writing with old Codecs is not supported. Maybe it cannot be done like that for 4x because we need to support 3x indexes, but I haven't tried (because in 4x we catch and ignore the UOE, so I don't know yet how it will affect the test).

          or change things so it doesnt need to write .si at all here (e.g. put partial filenames in the .si that don't include segment prefix)

          That's a good idea - if we strip the segment name from the files .si records (which is redundant anyway!), we won't need to rewrite .si. I believe though that we should fix this in 4.6.1?

          Show
          Shai Erera added a comment - First of all, some test should fail. I modified TestBackCompatibility.testAddIndexes to do this: boolean orig = OLD_FORMAT_IMPERSONATION_IS_ACTIVE; OLD_FORMAT_IMPERSONATION_IS_ACTIVE = false ; // test code goes here OLD_FORMAT_IMPERSONATION_IS_ACTIVE = orig; And now it fails, because writing with old Codecs is not supported. Maybe it cannot be done like that for 4x because we need to support 3x indexes, but I haven't tried (because in 4x we catch and ignore the UOE, so I don't know yet how it will affect the test). or change things so it doesnt need to write .si at all here (e.g. put partial filenames in the .si that don't include segment prefix) That's a good idea - if we strip the segment name from the files .si records (which is redundant anyway!), we won't need to rewrite .si. I believe though that we should fix this in 4.6.1?
          Hide
          Robert Muir added a comment -

          And now it fails, because writing with old Codecs is not supported. Maybe it cannot be done like that for 4x because we need to support 3x indexes, but I haven't tried (because in 4x we catch and ignore the UOE, so I don't know yet how it will affect the test).

          Oh, thanks for investigating. Now i understand how it got thru the tests. Impersonation masked the bug!!!!

          That's a good idea - if we strip the segment name from the files .si records (which is redundant anyway!), we won't need to rewrite .si. I believe though that we should fix this in 4.6.1?

          But thats the long-term fix. For the short term i think we should just allow the old .si format to be written. Very simple, very clean and conservative solution, just not elegant at all.

          Yes, this is a corruption issue. I really think we should respin 4.6.1 for this.

          Show
          Robert Muir added a comment - And now it fails, because writing with old Codecs is not supported. Maybe it cannot be done like that for 4x because we need to support 3x indexes, but I haven't tried (because in 4x we catch and ignore the UOE, so I don't know yet how it will affect the test). Oh, thanks for investigating. Now i understand how it got thru the tests. Impersonation masked the bug!!!! That's a good idea - if we strip the segment name from the files .si records (which is redundant anyway!), we won't need to rewrite .si. I believe though that we should fix this in 4.6.1? But thats the long-term fix. For the short term i think we should just allow the old .si format to be written. Very simple, very clean and conservative solution, just not elegant at all. Yes, this is a corruption issue. I really think we should respin 4.6.1 for this.
          Hide
          Shai Erera added a comment -

          For the short term i think we should just allow the old .si format to be written.

          You mean to temporarily restore Lucene40SIWriter? That will also fix it

          Show
          Shai Erera added a comment - For the short term i think we should just allow the old .si format to be written. You mean to temporarily restore Lucene40SIWriter? That will also fix it
          Hide
          Robert Muir added a comment -

          Exactly. I think this (and fixing the test bugs here), is all we should do for 4.6.1 here, lets do the segment-stripping separately?

          Show
          Robert Muir added a comment - Exactly. I think this (and fixing the test bugs here), is all we should do for 4.6.1 here, lets do the segment-stripping separately?
          Hide
          Michael McCandless added a comment -

          I think this (and fixing the test bugs here), is all we should do for 4.6.1 here, lets do the segment-stripping separately?

          +1

          Show
          Michael McCandless added a comment - I think this (and fixing the test bugs here), is all we should do for 4.6.1 here, lets do the segment-stripping separately? +1
          Hide
          Shai Erera added a comment -

          OK let's fix it separately. A new Codec ... just the thought ...

          Show
          Shai Erera added a comment - OK let's fix it separately. A new Codec ... just the thought ...
          Hide
          Robert Muir added a comment -

          the test infra is really screwed up here! There might be more hidden bugs!

          Show
          Robert Muir added a comment - the test infra is really screwed up here! There might be more hidden bugs!
          Hide
          Robert Muir added a comment -

          Here is a patch.

          Somehow, this impersonation flag dangerously got turned on by default.
          Thats the root cause of the test issue.

          Some tests that explicitly create old codecs for writing will fail with the patch. We have to fix them to use the boolean.

          Show
          Robert Muir added a comment - Here is a patch. Somehow, this impersonation flag dangerously got turned on by default. Thats the root cause of the test issue. Some tests that explicitly create old codecs for writing will fail with the patch. We have to fix them to use the boolean.
          Hide
          Shai Erera added a comment -

          I guess I messed this up when working on LUCENE-5196, turning impersonation to true by default. Sorry about that, hope it won't impact a lot of tests.

          Show
          Shai Erera added a comment - I guess I messed this up when working on LUCENE-5196 , turning impersonation to true by default. Sorry about that, hope it won't impact a lot of tests.
          Hide
          Robert Muir added a comment -

          I fixed all these tests here.

          So this patch fixes the disease (in the test infra), and fixes 4.0 SIFormat to be able to write .si just for this purpose.

          The bug does not really impact trunk, as it didnt have the dangerous 'catch UOE', on trunk the bug is the user would get an actual UOE. But this is fixed, as now the SIFormat can write.

          I think this patch is ready: for the backport we have to ensure we only "hide" that UOE if its really Lucene3x, so an instanceof is really needed there.

          Show
          Robert Muir added a comment - I fixed all these tests here. So this patch fixes the disease (in the test infra), and fixes 4.0 SIFormat to be able to write .si just for this purpose. The bug does not really impact trunk, as it didnt have the dangerous 'catch UOE', on trunk the bug is the user would get an actual UOE. But this is fixed, as now the SIFormat can write. I think this patch is ready: for the backport we have to ensure we only "hide" that UOE if its really Lucene3x, so an instanceof is really needed there.
          Hide
          Michael McCandless added a comment -

          +1, looks good.

          Show
          Michael McCandless added a comment - +1, looks good.
          Hide
          Shai Erera added a comment -

          +1, looks good!

          Show
          Shai Erera added a comment - +1, looks good!
          Hide
          Robert Muir added a comment -

          I opened a followup issue (LUCENE-5412) for the "correct" fix. But this one is too dangerous IMO for a bugfix release.

          Show
          Robert Muir added a comment - I opened a followup issue ( LUCENE-5412 ) for the "correct" fix. But this one is too dangerous IMO for a bugfix release.
          Hide
          ASF subversion and git services added a comment -

          Commit 1560401 from Robert Muir in branch 'dev/trunk'
          [ https://svn.apache.org/r1560401 ]

          LUCENE-5377: IW.addIndexes(Dir[]) causes silent index corruption

          Show
          ASF subversion and git services added a comment - Commit 1560401 from Robert Muir in branch 'dev/trunk' [ https://svn.apache.org/r1560401 ] LUCENE-5377 : IW.addIndexes(Dir[]) causes silent index corruption
          Hide
          ASF subversion and git services added a comment -

          Commit 1560405 from Robert Muir in branch 'dev/trunk'
          [ https://svn.apache.org/r1560405 ]

          LUCENE-5377: remove dead code

          Show
          ASF subversion and git services added a comment - Commit 1560405 from Robert Muir in branch 'dev/trunk' [ https://svn.apache.org/r1560405 ] LUCENE-5377 : remove dead code
          Hide
          Robert Muir added a comment -

          Give me some time with the backport, i dont want to introduce another bug here. Will get this all resolved in a few hours.

          Show
          Robert Muir added a comment - Give me some time with the backport, i dont want to introduce another bug here. Will get this all resolved in a few hours.
          Hide
          ASF subversion and git services added a comment -

          Commit 1560421 from Robert Muir in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1560421 ]

          LUCENE-5377: IW.addIndexes(Dir[]) causes silent index corruption

          Show
          ASF subversion and git services added a comment - Commit 1560421 from Robert Muir in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1560421 ] LUCENE-5377 : IW.addIndexes(Dir[]) causes silent index corruption
          Hide
          Robert Muir added a comment -

          I'll let Jenkins play with the 4.x backport for just a bit before merging back to the release branch.

          Show
          Robert Muir added a comment - I'll let Jenkins play with the 4.x backport for just a bit before merging back to the release branch.
          Hide
          Robert Muir added a comment -

          Here is the 4.x backport for review: http://svn.apache.org/r1560421

          I'm gonna run the tests a few times more and merge to 4.6 branch

          Show
          Robert Muir added a comment - Here is the 4.x backport for review: http://svn.apache.org/r1560421 I'm gonna run the tests a few times more and merge to 4.6 branch
          Hide
          ASF subversion and git services added a comment -

          Commit 1560477 from Robert Muir in branch 'dev/branches/lucene_solr_4_6'
          [ https://svn.apache.org/r1560477 ]

          LUCENE-5377: IW.addIndexes(Dir[]) causes silent index corruption

          Show
          ASF subversion and git services added a comment - Commit 1560477 from Robert Muir in branch 'dev/branches/lucene_solr_4_6' [ https://svn.apache.org/r1560477 ] LUCENE-5377 : IW.addIndexes(Dir[]) causes silent index corruption
          Hide
          Robert Muir added a comment -

          Thank you very much Littlestar. What a terrible bug!

          Show
          Robert Muir added a comment - Thank you very much Littlestar. What a terrible bug!
          Hide
          Littlestar added a comment -

          The patch works well for me, Thanks.

          Show
          Littlestar added a comment - The patch works well for me, Thanks.

            People

            • Assignee:
              Robert Muir
              Reporter:
              Littlestar
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development