Uploaded image for project: 'Jackrabbit Oak'
  1. Jackrabbit Oak
  2. OAK-1104

SegmentNodeStore rebase operation assumes wrong child node order

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 0.10
    • 0.11
    • core, segmentmk
    • None

    Description

      This popped up during the async merge process. The merge first does a rebase which can fail, making some index files look like they disappeared [0], wrapping the actual root cause.

      The problem is that the rebase failed and removed the missing file. This can be seen by analyzing the ':conflict' marker info:

      addExistingNode {_b_Lucene41_0.doc, _b.fdx, _b.fdt, _b_4.del, }

      so it points to something trying to add some index related files twice, almost like a concurrent commit exception.

      Digging even deeper I found that the rebase operation during the state comparison phase assumes a certain order of child nodes [1], and based on that tries to read the mentioned nodes again, thinking that they are new ones, when if fact they are already present in the list [2].
      This causes a conflict which fails the entire async update process, but also any lucene search, as the index files are now gone and the index is in a corrupted state.

      [0]

      *WARN* [pool-5-thread-2] org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate Index update async failed org.apache.jackrabbit.oak.api.CommitFailedException: OakLucene0004: Failed to close the Lucene index
      	at org.apache.jackrabbit.oak.plugins.index.lucene.LuceneIndexEditor.leave(LuceneIndexEditor.java:122)
      	at org.apache.jackrabbit.oak.spi.commit.VisibleEditor.leave(VisibleEditor.java:64)
      	at org.apache.jackrabbit.oak.spi.commit.VisibleEditor.leave(VisibleEditor.java:64)
      	at org.apache.jackrabbit.oak.plugins.index.IndexUpdate.leave(IndexUpdate.java:129)
      	at org.apache.jackrabbit.oak.spi.commit.EditorDiff.process(EditorDiff.java:56)
      	at org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate.run(AsyncIndexUpdate.java:100)
      	at org.apache.sling.commons.scheduler.impl.QuartzJobExecutor.execute(QuartzJobExecutor.java:105)
      	at org.quartz.core.JobRunShell.run(JobRunShell.java:207)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:724)
      Caused by: java.io.FileNotFoundException: _b_Lucene41_0.doc at org.apache.jackrabbit.oak.plugins.index.lucene.OakDirectory.openInput(OakDirectory.java:145)
      

      [1] http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/segment/MapRecord.java?view=markup#l329

      [2]
      before child list

      [_b_Lucene41_0.doc, _b.fdx, _b.fdt, segments_34, _b_4.del, _b_Lucene41_0.pos, _b.nvm, _b.nvd, _b.fnm, _3n.si, _b_Lucene41_0.tip, _b_Lucene41_0.tim, _3n.cfe, segments.gen, _3n.cfs, _b.si]
      

      after list

      _b_Lucene41_0.pos, _3k.cfs, _3j_1.del, _b.nvm, _b.nvd, _3d.cfe, _3d.cfs, _b.fnm, _3j.si, _3h.si, _3i.cfe, _3i.cfs, _3e_2.del, _3f.si, _b_Lucene41_0.tip, _b_Lucene41_0.tim, segments.gen, _3e.cfe, _3e.cfs, _b.si,_3g.si, _3l.si, _3i_1.del, _3d_3.del, _3e.si, _3d.si, _b_Lucene41_0.doc, _3h_2.del, _3i.si, _3k_1.del, _3j.cfe, _3j.cfs, _b.fdx, _b.fdt, _3g_1.del, _3k.si, _3l.cfe, _3l.cfs, segments_33, _3f_1.del, _3h.cfe, _3h.cfs, _b_4.del, _3f.cfe, _3f.cfs, _3g.cfe, _3g.cfs
      

      Attachments

        Activity

          People

            jukkaz Jukka Zitting
            stillalex Alex Deparvu
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: