Lucene - Core / LUCENE-3188

The contrib class org.apache.lucene.index.IndexSplitter creates an incorrect index

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.0, 3.2
    • Fix Version/s: 3.3, 4.0-ALPHA
    • Component/s: modules/other
    • Labels:
      None
    • Environment:

      The bug is present in all environments.
      In this case I used Windows Server 2003 with the Java HotSpot Virtual Machine.

    • Lucene Fields:
      New, Patch Available

      Description

      When the method IndexSplitter.split(File destDir, String[] segs) from the Lucene contrib directory (contrib/misc/src/java/org/apache/lucene/index) is used, it creates an index whose segments descriptor file contains wrong data. Specifically, the number representing the name of the segment that would be created next in this index is wrong.
      If one of the index's segments already has this name, the result is either that a new segment cannot be created or that a corrupted segment is created.
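
      The collision described above can be illustrated with a small stand-alone model. This is only a sketch and not Lucene's code: the class and method names are hypothetical, and it assumes that segment names are formed as "_" followed by the counter value in base 36.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of segment naming (hypothetical names; not Lucene's actual code). */
class SegmentCounterSketch {

    /** Builds a segment name from a counter value, assuming a "_" prefix and base-36 digits. */
    static String segmentName(int counter) {
        return "_" + Integer.toString(counter, Character.MAX_RADIX);
    }

    /** Returns true if the next name derived from the counter is already taken. */
    static boolean nextNameCollides(int counter, List<String> existingSegments) {
        return existingSegments.contains(segmentName(counter));
    }

    public static void main(String[] args) {
        // Segments copied into the split index.
        List<String> copied = Arrays.asList("_0", "_1");

        // A counter that restarts at 0 would hand out "_0" again.
        System.out.println(nextNameCollides(0, copied)); // prints "true"

        // A counter that stays above all copied segments avoids the clash.
        System.out.println(nextNameCollides(2, copied)); // prints "false"
    }
}
```

      In this model the clash appears only when the stored counter is not larger than the number of every segment already present, which matches the symptom reported here.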

      1. LUCENE-3188-fix1.patch
        0.5 kB
        Uwe Schindler
      2. LUCENE-3188.patch
        6 kB
        Steve Rowe
      3. LUCENE-3188.patch
        4 kB
        Steve Rowe
      4. LUCENE-3188.patch
        0.9 kB
        Ivan Dimitrov Vasilev
      5. IndexSplitter.java
        6 kB
        Ivan Dimitrov Vasilev
      6. TestIndexSplitter.java
        3 kB
        Ivan Dimitrov Vasilev

        Activity

        Ivan Dimitrov Vasilev added a comment -

        The attached file TestIndexSplitter.java contains a test that shows the bug (when run against the IndexSplitter from contrib) and the fix (when run against the IndexSplitter attached here).

        Steve Rowe added a comment -

        Hi Ivan,

        Your submissions should be in the form of a patch (for an explanation see e.g. http://en.wikipedia.org/wiki/Patch_%28computing%29). To generate a patch, after you make modifications in a locally checked-out Subversion working copy, use the shell command "svn diff" at the top level, and redirect its output to a file named for the issue you want to attach to, with extension ".patch", e.g.: svn diff > ../LUCENE-3188.patch.

        Also, when you attached the two files to this issue, you did not click on the radio button next to the text "Grant license to ASF for inclusion in ASF works (as per the Apache License §5)". You must do this for the Lucene project to be able to use code you contribute. When you attach your patch, please click on the radio button indicating you grant license to the ASF. (I haven't looked at your code yet for this reason.)

        Steve

        Steve Rowe added a comment -

        This is a better Wikipedia article on (source code) patching than the one I gave above: http://en.wikipedia.org/wiki/Patch_%28Unix%29

        Ivan Dimitrov Vasilev added a comment -

        The file LUCENE-3188.patch contains the needed changes to IndexSplitter to fix this issue.

        Ivan Dimitrov Vasilev added a comment -

        Hi Steve,

        I attached the patch to this issue as Apache requires (or at least I think so).
        I do not have much time now to read the Apache docs on contribution procedures in depth, but I saw on the wiki that I should also provide test cases that show the bug, and that they should be in the form of unit tests. I provided a test case that is not in JUnit form but still works. As far as I saw when submitting the patch, tests do not need a license grant, so you can use it. I do not have much time now because we are close to a release and this was one of the bugs that I had to fix (and I did).

        I guess you are the one who discovered this Splitter. Thank you very much for this; you saved me a lot of hard work, because in our previous releases we used a class that generated the segments descriptor file from given segments, and determining the content of this file was very difficult.

        Steve Rowe added a comment -

        > I attached the patch to this issue as Apache requires (or at least I think so).
        > I do not have much time now to read the Apache docs on contribution procedures in depth, but I saw on the wiki that I should also provide test cases that show the bug, and that they should be in the form of unit tests. I provided a test case that is not in JUnit form but still works. As far as I saw when submitting the patch, tests do not need a license grant, so you can use it. I do not have much time now because we are close to a release and this was one of the bugs that I had to fix (and I did).

        Thanks for reporting and providing a patch. I can take it from here.

        > I guess you are the one who discovered this Splitter.

        I think you have me confused with someone else - Jason Rutherglen wrote it: LUCENE-1959.

        Steve Rowe added a comment -

        Patch against branch_3x.

        I converted Ivan's test class into a unit test. Without Ivan's patch, the test fails, and with the patch, it succeeds.

        Here's the test failure I got without Ivan's patch:

        org.apache.lucene.index.TestIndexSplitter,testDeleteThenOptimize
        NOTE: reproduce with: ant test -Dtestcase=TestIndexSplitter -Dtestmethod=testDeleteThenOptimize -Dtests.seed=5250008618328265481:-4070453331991284264
        WARNING: test class left thread running: merge thread: _0(3.3):c2/1 into _0 [optimize]
        RESOURCE LEAK: test class left 1 thread(s) running
        Exception in thread "Lucene Merge Thread #0" NOTE: test params are: locale=es_BO, timezone=Australia/Tasmania
        org.apache.lucene.util.ThreadInterruptedException: java.lang.InterruptedException: sleep interrupted
        	at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:515)
        	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
        Caused by: java.lang.InterruptedException: sleep interrupted
        	at java.lang.Thread.sleep(Native Method)
        	at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:513)
        	... 1 more
        NOTE: all tests run in this JVM:
        [TestIndexSplitter]
        NOTE: Windows 7 6.1 amd64/Sun Microsystems Inc. 1.5.0_22 (64-bit)/cpus=4,threads=2,free=99874080,total=128057344
        
        java.io.IOException: background merge hit exception: _0(3.3):c2/1 into _0 [optimize]
        	at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2536)
        	at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2474)
        	at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2444)
        	at org.apache.lucene.index.TestIndexSplitter.testDeleteThenOptimize(TestIndexSplitter.java:145)
        	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
        	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
        	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
        	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
        	at org.junit.rules.TestWatchman$1.evaluate(TestWatchman.java:48)
        	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
        	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
        	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
        	at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1268)
        	at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1186)
        	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
        	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
        	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
        	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
        	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
        	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
        	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
        	at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
        	at org.junit.runner.JUnitCore.run(JUnitCore.java:157)
        	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:94)
        	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:192)
        	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:64)
        	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:115)
        Caused by: java.io.IOException: MockDirectoryWrapper: file "_0.cfs" is still open: cannot overwrite
        	at org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:360)
        	at org.apache.lucene.index.CompoundFileWriter.close(CompoundFileWriter.java:167)
        	at org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java:137)
        	at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4242)
        	at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3853)
        	at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:388)
        	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:456)
        
        Steve Rowe added a comment -

        Mike McCandless reviewed for me; see the IRC log.

        In summary: This is a bug, and the fix is appropriate. The problem is that the split index is created with an incorrect next-segment-name counter (always 0). A simpler fix than Ivan's would be to copy the counter over from the source index. Also, it would be good to add a check for this problem to the CheckIndex tool:

        mikemccand: ie it'd just verify sis.counter is > all segs in the index
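
        The check Mike describes could look roughly like the following. This is a stand-alone sketch, not the code that went into CheckIndex: the class and method names are hypothetical, and it assumes that segment names are "_" followed by a base-36 number.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Toy model of the proposed consistency check (hypothetical; not the actual CheckIndex code). */
class CounterCheckSketch {

    /** Parses the numeric part of a segment name, assuming a "_" prefix and base-36 digits. */
    static int segmentNumber(String segmentName) {
        return Integer.parseInt(segmentName.substring(1), Character.MAX_RADIX);
    }

    /** Returns true if the counter is greater than the number of every existing segment. */
    static boolean counterIsValid(int counter, List<String> segmentNames) {
        int max = -1; // start below 0 so an index with no segments passes with counter 0
        for (String name : segmentNames) {
            max = Math.max(max, segmentNumber(name));
        }
        return counter > max;
    }

    public static void main(String[] args) {
        List<String> segments = Arrays.asList("_0", "_1", "_a");

        System.out.println(counterIsValid(0, segments));  // prints "false"
        System.out.println(counterIsValid(11, segments)); // prints "true" ("_a" is 10 in base 36)
        System.out.println(counterIsValid(0, Collections.<String>emptyList())); // prints "true"
    }
}
```

        Starting the running maximum at -1 is a choice made for this sketch, so that an index without any segments is not flagged.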

        I'll make a patch with these changes.

        Steve Rowe added a comment -

        Attachment with above-described changes.

        I plan on committing shortly, then porting to trunk.

        Steve Rowe added a comment -

        Committed:

        • r1134823: branch_3x
        • r1134829: trunk

        Thanks Ivan!

        Uwe Schindler added a comment -

        Reopening: the CheckIndex patch reports a broken index if you have an empty index without segments.

        Uwe Schindler added a comment -

        Fix for the CheckIndex problem with empty indexes.

        Uwe Schindler added a comment -

        Committed fixes in revisions: 1134895 (trunk), 1134896 (3.x)

        Steve Rowe added a comment -

        Thanks Uwe. I didn't see the problem because I didn't run all tests before committing.

        Robert Muir added a comment -

        bulk close for 3.3


          People

          • Assignee:
            Steve Rowe
            Reporter:
            Ivan Dimitrov Vasilev
          • Votes:
            0
            Watchers:
            1