Lucene - Core
LUCENE-6205

DV updates can hit FileNotFoundException due to concurrency bug

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.10.4, 5.0, 5.1, 6.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      Jenkins has hit this a few times recently, e.g.:

       [junit4] Suite: org.apache.lucene.index.TestBinaryDocValuesUpdates
         [junit4]   2> Jan 28, 2015 11:49:24 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
         [junit4]   2> WARNUNG: Uncaught exception in thread: Thread[Lucene Merge Thread #1,5,TGRP-TestBinaryDocValuesUpdates]
         [junit4]   2> org.apache.lucene.index.MergePolicy$MergeException: java.nio.file.NoSuchFileException: _4_1.fnm in dir=RAMDirectory@5dcf7f8a lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@ccb4148
         [junit4]   2> 	at __randomizedtesting.SeedInfo.seed([5EC20FA2CD1E68B8]:0)
         [junit4]   2> 	at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:641)
         [junit4]   2> 	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:609)
         [junit4]   2> Caused by: java.nio.file.NoSuchFileException: _4_1.fnm in dir=RAMDirectory@5dcf7f8a lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@ccb4148
         [junit4]   2> 	at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:655)
         [junit4]   2> 	at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:110)
         [junit4]   2> 	at org.apache.lucene.codecs.lucene50.Lucene50FieldInfosFormat.read(Lucene50FieldInfosFormat.java:113)
         [junit4]   2> 	at org.apache.lucene.index.SegmentReader.initFieldInfos(SegmentReader.java:155)
         [junit4]   2> 	at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:119)
         [junit4]   2> 	at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3935)
         [junit4]   2> 	at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3559)
         [junit4]   2> 	at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:549)
         [junit4]   2> 	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:587)
         [junit4]   2> 
         [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestBinaryDocValuesUpdates -Dtests.method=testManyReopensAndFields -Dtests.seed=5EC20FA2CD1E68B8 -Dtests.slow=true -Dtests.locale=de_DE -Dtests.timezone=Europe/Samara -Dtests.asserts=true -Dtests.file.encoding=UTF-8
      

      It reproduces only after substantial beasting (running the test many times over).

      It's a concurrency issue between one thread kicking off a merge, and another thread resolving doc values updates.

      1. LUCENE-6205.patch
        1.0 kB
        Michael McCandless

        Activity

        Michael McCandless added a comment -

        Simple patch.

        In this part of IW, the merge thread is opening a new SegmentReader, just carrying over new in-RAM deletions that happened since the reader was last refreshed.

        However, that ctor in SegmentReader also opens any new doc values, which BufferedUpdatesStream may be in the process of writing (from another thread).

        So opening this new SR must also hold IW's monitor lock.

        I beasted over 5K iterations with this and no failure; without the patch it fails after a few hundred iterations usually ...
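
        The locking idea can be illustrated with a toy model. This is not Lucene code: ToyWriter, its methods, and the file-name pattern are hypothetical stand-ins, and the two-step update is an assumption made only to show why a reader that runs between the steps can ask for a file that is not there yet. With both methods holding the same monitor, the merge thread never observes the half-finished update.

        ```java
        import java.util.HashSet;
        import java.util.Set;

        /** Toy model of the race; this is not Lucene code and every name is hypothetical. */
        final class ToyWriter {
            private final Set<String> files = new HashSet<>(); // stand-in for the directory
            private long gen = 0;                               // stand-in for a field-infos generation

            /** Updater thread: a two-step write, done entirely under the writer's monitor. */
            synchronized void applyDocValuesUpdate() {
                gen++;                            // step 1: the new generation is recorded
                files.add("_4_" + gen + ".fnm");  // step 2: the file for that generation is written
            }

            /** Merge thread: opens its "reader" under the same monitor, so it cannot run between steps 1 and 2. */
            synchronized String openReaderForMerge() {
                if (gen == 0) {
                    return null; // no updates yet, nothing to open
                }
                String name = "_4_" + gen + ".fnm";
                if (!files.contains(name)) {
                    throw new IllegalStateException("no such file: " + name);
                }
                return name;
            }

            /** Runs one updater and one merger concurrently; returns how many opens failed. */
            static int race(int iterations) {
                ToyWriter writer = new ToyWriter();
                int[] failures = {0}; // written only by the merger thread, read after join()
                Thread updater = new Thread(() -> {
                    for (int i = 0; i < iterations; i++) {
                        writer.applyDocValuesUpdate();
                    }
                });
                Thread merger = new Thread(() -> {
                    for (int i = 0; i < iterations; i++) {
                        try {
                            writer.openReaderForMerge();
                        } catch (IllegalStateException e) {
                            failures[0]++;
                        }
                    }
                });
                updater.start();
                merger.start();
                try {
                    updater.join();
                    merger.join();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for threads", e);
                }
                return failures[0];
            }
        }
        ```

        Calling ToyWriter.race(10000) should report no failed opens. Dropping synchronized from both methods makes the bad interleaving possible, though, as with the real bug, any single run may still pass.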

        Robert Muir added a comment -

        +1

        ASF subversion and git services added a comment -

        Commit 1655423 from Michael McCandless in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1655423 ]

        LUCENE-6205: don't let doc values updates write in one thread at the same time as a merge kicking off in another

        ASF subversion and git services added a comment -

        Commit 1655424 from Michael McCandless in branch 'dev/branches/lucene_solr_5_0'
        [ https://svn.apache.org/r1655424 ]

        LUCENE-6205: don't let doc values updates write in one thread at the same time as a merge kicking off in another

        ASF subversion and git services added a comment -

        Commit 1655426 from Michael McCandless in branch 'dev/trunk'
        [ https://svn.apache.org/r1655426 ]

        LUCENE-6205: don't let doc values updates write in one thread at the same time as a merge kicking off in another

        Anshum Gupta added a comment -

        Bulk close after 5.0 release.

        Michael McCandless added a comment -

        Reopen for backport to 4.10.4

        ASF subversion and git services added a comment -

        Commit 1662188 from Michael McCandless in branch 'dev/branches/lucene_solr_4_10'
        [ https://svn.apache.org/r1662188 ]

        LUCENE-6205: don't let doc values updates write in one thread at the same time as a merge kicking off in another

        Michael McCandless added a comment -

        Bulk close for 4.10.4 release


          People

          • Assignee: Michael McCandless
          • Reporter: Michael McCandless
          • Votes: 0
          • Watchers: 3
