Details

    • Type: Wish
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.8, 6.0
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      LUCENE-5189 was a great move forward, and I wish to continue it. The reason for having this feature is to build a "join index": writing children docnums into the parent's binary DocValues. I can try to proceed with the implementation, but I'm not so experienced in such deep Lucene internals. Shai Erera, any hint on where to begin is much appreciated.
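As a sketch of the "join index" idea (all names here are illustrative, not part of any Lucene API), the child docnums of a parent could be delta-encoded as variable-length ints into the byte[] stored in the parent's binary DocValues field:

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: delta-encode a sorted list of child docnums as VInts,
// producing the byte[] one might store in the parent's binary DocValues field.
public class JoinIndexSketch {
  static byte[] encodeChildren(int[] childDocs) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int prev = 0;
    for (int doc : childDocs) {
      int delta = doc - prev; // docnums are sorted, so deltas stay small
      while ((delta & ~0x7F) != 0) {
        out.write((delta & 0x7F) | 0x80);
        delta >>>= 7;
      }
      out.write(delta);
      prev = doc;
    }
    return out.toByteArray();
  }

  static List<Integer> decodeChildren(byte[] bytes) {
    List<Integer> docs = new ArrayList<>();
    int prev = 0, i = 0;
    while (i < bytes.length) {
      int delta = 0, shift = 0;
      byte b;
      do {
        b = bytes[i++];
        delta |= (b & 0x7F) << shift;
        shift += 7;
      } while ((b & 0x80) != 0);
      prev += delta;
      docs.add(prev);
    }
    return docs;
  }

  public static void main(String[] args) {
    byte[] encoded = encodeChildren(new int[] {3, 4, 5, 130});
    System.out.println(decodeChildren(encoded)); // [3, 4, 5, 130]
  }
}
```

With updatable binary DocValues, such a byte[] could be rewritten whenever a parent's set of children changes, without reindexing the parent document.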

      1. LUCENE-5513.patch
        166 kB
        Shai Erera
      2. LUCENE-5513.patch
        161 kB
        Shai Erera
      3. LUCENE-5513.patch
        161 kB
        Shai Erera
      4. LUCENE-5513.patch
        101 kB
        Shai Erera

        Activity

        Shai Erera added a comment -

        Mikhail Khludnev, great that you want to take a crack at that. I will help as best as I can. I would start with the high level IW.updateNumericDV API and follow the breadcrumb trail to add the BDV support. The main classes are IW, DW, BufferedDeletes, BufferedDeleteStream, NumericFieldUpdates and ReaderAndUpdates (and of course all the tests: TestNumericDocValuesUpdates and TestIndexWriterExceptions.testNoLostDeletesOrUpdates).

        At first I think it's best to just follow the NumericDV approach (which copies the entire NDV to the update file, altering the affected documents' values). We can then consider other approaches (as BinaryDV is more expensive than NumericDV to just copy around). But I'm fine if we do it in incremental steps.

        Shai Erera added a comment -

        Mikhail Khludnev, I started working on this and made some progress. But I've identified a need to do some refactoring to how the updates are represented internally today in order to keep the code (and more importantly, me!) sane. So let me know if you've started to work on it as well, so we can sync.

        Mikhail Khludnev added a comment -

        Shai Erera, I started too. However, I can't spend much time on it, and I have no deep understanding of the core.
        For a start I copied testSimple() from the Numeric DV update tests and checked that it's red. Now I'm working through the layers, mostly copying the Numeric DV update logic into the Binary DV logic. So far, not much progress.

        Shai Erera added a comment -

        I'll upload a patch a bit later – I've made some good progress last night and already started to cut over tests. The only thing I don't yet handle is updates that come in while a merge is in flight, as this requires some refactoring of the code. I will handle that too, but will upload a patch before that to checkpoint progress.

        Shai Erera added a comment -

        Patch:

        • Add IW.updateBinaryDocValue
        • Makes necessary changes to DW, BufferedUpdates(Stream), ReaderAndUpdates
        • Add new BinaryUpdate and BinaryFieldUpdates
        • Copied TestNumericDocValuesUpdates and changed to add BDV fields:
          • I still add numbers as it makes asserting easy, but I encode them as VLongs, so we get different lengths of byte[]
          • There are some tests still disabled, see below
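To make the VLong point concrete, here is a small self-contained sketch of the 7-bits-per-byte variable-length encoding scheme (the same idea as Lucene's DataOutput.writeVLong, though this code is illustrative and not the Lucene implementation): small values encode to 1 byte and larger values to more, which is how encoding test numbers as VLongs yields byte[] values of varying length.

```java
import java.io.ByteArrayOutputStream;

// Sketch of variable-length long encoding: 7 payload bits per byte, high bit
// set on all but the last byte. Small values take 1 byte, larger ones more.
public class VLongSketch {
  static byte[] writeVLong(long value) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    while ((value & ~0x7FL) != 0) {
      out.write((int) ((value & 0x7F) | 0x80)); // low 7 bits, continuation flag
      value >>>= 7;
    }
    out.write((int) value); // final byte, high bit clear
    return out.toByteArray();
  }

  public static void main(String[] args) {
    System.out.println(writeVLong(5L).length);       // 1 byte
    System.out.println(writeVLong(300L).length);     // 2 bytes
    System.out.println(writeVLong(1L << 40).length); // 6 bytes
  }
}
```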

        Patch still doesn't handle updates that came in while a merge was in flight. The reason is that the code in IW.commitMergedDeletes is hairy and adding BinaryDV updates will make it even worse. So I want to refactor how the updates are represented internally, such that there is a single class DocValuesUpdates which captures all DV updates. Since the DV fields cannot overlap (a DV field cannot be both numeric and binary), I think it will allow us to use a single UpdatesIterator from IW.commitMergedDeletes. I'll take a look at that next and re-enable the tests after this has been resolved.

        There's one thing to note – binary DV updates are more expensive to apply than NDV updates, depending on the length of the BDV. That is, when we rewrite the DV file, for NDV we know we write at most 8 bytes per document (compressed), but for BDV we may write a large number of bytes per document. I prefer to leave that as an optimization for later. One idea I have (which applies to NDV as well) is to do that in a sparse DV, or to add "stacked" DV files. Either will make the producing code more complex, and therefore I prefer to explore that later.

        Michael McCandless added a comment -

        Looks good Shai!

        I agree we need a refactoring of commitMergedDeletes, and we shouldn't worry about stacking/optimizing now.

        Shai Erera added a comment -

        Patch makes the following refactoring changes (all internal API):

        • DocValuesUpdate abstract class w/ common implementation for NumericDocValuesUpdate and BinaryDocValuesUpdate.
        • DocValuesFieldUpdates holds the doc+updates for a single field. It mostly defines the API for the Numeric* and Binary* implementations.
        • DocValuesFieldUpdates.Container holds numeric+binary updates for a set of fields. It is as its name says – a container of updates used by ReaderAndUpdates.
          • It helps avoid bloating the API with more maps being passed around, and it simplified BufferedUpdatesStream and IndexWriter.commitMergedDeletes.
          • It also serves as a factory based on the update's Type.
        • Finished TestBinaryDVUpdates
        • Added TestMixedDVUpdates which ports some of the 'big' tests from both TestNDV/BDVUpdates and mixes some NDV and BDV updates.
          • I'll beast it some to make sure all edge cases are covered.

        I may take a crack at simplifying IW.commitMergedDeletes even more by pulling a lot of duplicate code into a method. This is impossible now because those sections modify more than one state variable, but I'll try to stuff these variables into a container to make this method more sane to read.

        Otherwise, I think it's ready.

        Shai Erera added a comment - edited

        Fixed silly bug in BinaryDocValuesFieldUpdates.merge().

        Michael McCandless added a comment -

        Hmm, on beasting (TestSortingMergePolicy.testSortingMP -seed 2B748835E48BB14A -jvms 6 -noc) I hit this failure:

        mar 17, 2014 5:41:41 EM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
        VARNING: Uncaught exception in thread: Thread[Lucene Merge Thread #8,6,TGRP-TestSortingMergePolicy]
        org.apache.lucene.index.MergePolicy$MergeException: java.lang.NullPointerException
        	at __randomizedtesting.SeedInfo.seed([2B748835E48BB14A]:0)
        	at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:545)
        	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:518)
        Caused by: java.lang.NullPointerException
        	at org.apache.lucene.index.IndexWriter.commitMergedDeletesAndUpdates(IndexWriter.java:3468)
        	at org.apache.lucene.index.IndexWriter.commitMerge(IndexWriter.java:3530)
        	at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4222)
        	at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3679)
        	at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
        	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
        
        NOTE: download the large Jenkins line-docs file by running 'ant get-jenkins-line-docs' in the lucene directory.
        NOTE: reproduce with: ant test  -Dtestcase=TestSortingMergePolicy -Dtests.method=testSortingMP -Dtests.seed=2B748835E48BB14A -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt -Dtests.locale=sv_SE -Dtests.timezone=Africa/Malabo -Dtests.file.encoding=UTF-8
        NOTE: test params are: codec=Lucene46: {s=PostingsFormat(name=FSTPulsing41)}, docValues:{ndv=DocValuesFormat(name=SimpleText)}, sim=RandomSimilarityProvider(queryNorm=true,coord=yes): {}, locale=sv_SE, timezone=Africa/Malabo
        NOTE: Linux 3.5.0-47-generic amd64/Oracle Corporation 1.7.0_60-ea (64-bit)/cpus=8,threads=1,free=401611472,total=515375104
        NOTE: All tests run in this JVM: [TestSortingMergePolicy]
        
        Time: 2.051
        There were 2 failures:
        1) testSortingMP(org.apache.lucene.index.sorter.TestSortingMergePolicy)
        java.lang.AssertionError: ndv(89)=8960834324998763998,ndv(90)=-6235091358187467651
        	at org.junit.Assert.fail(Assert.java:93)
        	at org.junit.Assert.assertTrue(Assert.java:43)
        	at org.apache.lucene.index.sorter.TestSortingMergePolicy.assertSorted(TestSortingMergePolicy.java:162)
        	at org.apache.lucene.index.sorter.TestSortingMergePolicy.testSortingMP(TestSortingMergePolicy.java:171)
        	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        	at java.lang.reflect.Method.invoke(Method.java:606)
        	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1617)
        	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:826)
        	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:862)
        	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:876)
        	at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
        	at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
        	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
        	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
        	at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
        	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
        	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
        	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:359)
        	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:783)
        	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:443)
        	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:835)
        	at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:737)
        	at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:771)
        	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:782)
        	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
        	at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
        	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
        	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
        	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
        	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        	at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
        	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
        	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
        	at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
        	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:359)
        	at java.lang.Thread.run(Thread.java:744)
        2) testSortingMP(org.apache.lucene.index.sorter.TestSortingMergePolicy)
        com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=21, name=Lucene Merge Thread #8, state=RUNNABLE, group=TGRP-TestSortingMergePolicy]
        Caused by: org.apache.lucene.index.MergePolicy$MergeException: java.lang.NullPointerException
        	at __randomizedtesting.SeedInfo.seed([2B748835E48BB14A]:0)
        	at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:545)
        	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:518)
        Caused by: java.lang.NullPointerException
        	at org.apache.lucene.index.IndexWriter.commitMergedDeletesAndUpdates(IndexWriter.java:3468)
        	at org.apache.lucene.index.IndexWriter.commitMerge(IndexWriter.java:3530)
        	at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4222)
        	at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3679)
        	at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
        	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
        
        Shai Erera added a comment -

        Thanks Mike. In an edge case where there are field updates but also deletes, such that all of the updated documents were deleted, I created the DVFUpdates instances prematurely, leading to the NPE. The patch fixes this, and also integrates BDV updates into TestIWExceptions.testNoLostDeletesOrUpdates.

        Michael McCandless added a comment -

        +1 to commit... patch looks great. And beasting isn't uncovering any more failures...

        Shai Erera added a comment -

        Thanks Mike, will wrap up and commit.

        One thing I wanted to note, and specifically emphasized in the javadocs, is that IW.updateBinaryDocValue replaces the existing byte[] value of all affected documents. We could also easily implement an append type of update, where the given bytes are appended to all affected documents. It's only a matter of defining that on the update itself; then in ReaderAndUpdates, instead of overriding a document's value, we would read its current value from the reader and append the new bytes.

        Unlike NDV updates, append has more value for Binary (and SortedSet), since it lets you add values to documents whose existing values may not be identical, whereas the current implementation ignores all existing values and makes all affected documents identical. Perhaps that's acceptable, depending on the nature of the update (e.g. update by PK), but I think we should explore adding update capabilities to Binary and SortedSet DV, and also extending the IW.update API to allow updating by more than just a Term, e.g. this thread: http://markmail.org/message/2wmpvksuwc5t57pg.

        These are all for separate issues though.
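To make the replace-vs-append distinction above concrete, here is a small illustrative sketch in plain Java (none of this is actual Lucene API): a replace-style update discards the stored bytes, while an append-style update would read the document's current value and concatenate the new bytes onto it.

```java
import java.util.Arrays;

// Illustrative contrast between replace and append semantics for a binary
// DocValues update; hypothetical helpers, not Lucene API.
public class UpdateSemanticsSketch {
  static byte[] replaceUpdate(byte[] current, byte[] update) {
    return update; // replace: the existing value is discarded
  }

  static byte[] appendUpdate(byte[] current, byte[] update) {
    byte[] merged = Arrays.copyOf(current, current.length + update.length);
    System.arraycopy(update, 0, merged, current.length, update.length);
    return merged; // append: the existing value is kept, new bytes follow it
  }

  public static void main(String[] args) {
    byte[] current = {1, 2};
    byte[] update = {3};
    System.out.println(Arrays.toString(replaceUpdate(current, update))); // [3]
    System.out.println(Arrays.toString(appendUpdate(current, update)));  // [1, 2, 3]
  }
}
```

Under replace semantics all documents matching the update term end up with identical values; append would preserve their per-document differences, which is why it is more interesting for Binary and SortedSet than for Numeric.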

        ASF subversion and git services added a comment -

        Commit 1578784 from Shai Erera in branch 'dev/trunk'
        [ https://svn.apache.org/r1578784 ]

        LUCENE-5513: add IndexWriter.updateBinaryDocValue

        ASF subversion and git services added a comment -

        Commit 1578790 from Shai Erera in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1578790 ]

        LUCENE-5513: add IndexWriter.updateBinaryDocValue

        Shai Erera added a comment -

        Committed to trunk and 4x. Thanks Mike for the review!

        ASF subversion and git services added a comment -

        Commit 1578803 from Shai Erera in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1578803 ]

        LUCENE-5513: suppress codecs in test

        ASF subversion and git services added a comment -

        Commit 1578831 from Shai Erera in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1578831 ]

        LUCENE-5513: fix bad svn merge

        Uwe Schindler added a comment -

        Close issue after release of 4.8.0


          People

          • Assignee:
            Shai Erera
            Reporter:
            Mikhail Khludnev
          • Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue
