Lucene - Core
  1. Lucene - Core
  2. LUCENE-2584

Concurrency issues in SegmentInfo.files() could lead to ConcurrentModificationException

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 2.9, 2.9.1, 2.9.2, 2.9.3, 3.0, 3.0.1, 3.0.2
    • Fix Version/s: 2.9.5, 3.0.4, 3.1
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      The multi-threaded call of the files() in SegmentInfo could lead to the ConcurrentModificationException if one thread is not finished additions to the ArrayList (files) yet while the other thread already obtained it as cached (see below). This is a rare exception, but it would be nice to fix. I see the code is no longer problematic in the trunk (and others ported from flex_1458), looks it was fixed while implementing post 3.x features. The fix to 3.x and 2.9.x branches could be the same - create the files set first and populate it, and then assign to the member variable at the end of the method. This will resolve the issue. I could prepare the patch for 2.9.4 and 3.x, if needed.

      INFO: [19] webapp= path=/replication params=

      {command=fetchindex&wt=javabin}

      status=0 QTime=1
      Jul 30, 2010 9:13:05 AM org.apache.solr.core.SolrCore execute
      INFO: [19] webapp= path=/replication params=

      {command=details&wt=javabin}

      status=0 QTime=24
      Jul 30, 2010 9:13:05 AM org.apache.solr.handler.ReplicationHandler doFetch
      SEVERE: SnapPull failed
      java.util.ConcurrentModificationException
      at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
      at java.util.AbstractList$Itr.next(AbstractList.java:343)
      at java.util.AbstractCollection.addAll(AbstractCollection.java:305)
      at org.apache.lucene.index.SegmentInfos.files(SegmentInfos.java:826)
      at org.apache.lucene.index.DirectoryReader$ReaderCommit.<init>(DirectoryReader.java:916)
      at org.apache.lucene.index.DirectoryReader.getIndexCommit(DirectoryReader.java:856)
      at org.apache.solr.search.SolrIndexReader.getIndexCommit(SolrIndexReader.java:454)
      at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:261)
      at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:264)
      at org.apache.solr.handler.ReplicationHandler$1.run(ReplicationHandler.java:146)

      1. LUCENE-2584.patch
        9 kB
        Shai Erera
      2. LUCENE-2584-branch_3x.patch
        4 kB
        Alexander Kanarsky
      3. LUCENE-2584-lucene-2_9.patch
        4 kB
        Alexander Kanarsky
      4. LUCENE-2584-lucene-3_0.patch
        4 kB
        Alexander Kanarsky

        Activity

        Hide
        Michael McCandless added a comment -

        Ahh great catch! Yes please create a patch? Thanks!

        Show
        Michael McCandless added a comment - Ahh great catch! Yes please create a patch? Thanks!
        Hide
        Alexander Kanarsky added a comment -

        OK, I'll create patches tomorrow.

        Show
        Alexander Kanarsky added a comment - OK, I'll create patches tomorrow.
        Hide
        Alexander Kanarsky added a comment -

        Mike, the patches are attached. One note, I noticed (in flex_1458/trunk) that you are using the HashSet to collect the file names and then convert it to ArrayList; while this is a good idea to guarantee the uniqueness of the file names, it also could mask any code that accidentally adds the same file more than once (something that I would prefer to notice rather than mask). So I used an assertion to ensure both the uniqueness of the names and correctness of the code that adds names. Also, with assertions disabled, this approach adds no additional performance overhead at all. But if you'd like to use the HashSet to collect the file names, let me know so I will redo the patches.

        Show
        Alexander Kanarsky added a comment - Mike, the patches are attached. One note, I noticed (in flex_1458/trunk) that you are using the HashSet to collect the file names and then convert it to ArrayList; while this is a good idea to guarantee the uniqueness of the file names, it also could mask any code that accidentally adds the same file more than once (something that I would prefer to notice rather than mask). So I used an assertion to ensure both the uniqueness of the names and correctness of the code that adds names. Also, with assertions disabled, this approach adds no additional performance overhead at all. But if you'd like to use the HashSet to collect the file names, let me know so I will redo the patches.
        Hide
        Shai Erera added a comment -

        On one hand, it's good to add the files to a Set, so that we can be sure they are added uniquely. On the other hand though, if we expect files are added properly, then adding to the set is redundant. Since this code is executed once per SI instance, I think explicitly adding to a Set is better.

        Note that while the assert you added will work, if someone runs without assertions he may get duplicate file names, if indeed they are added twice. I think that it's not so crucial to know that the same files was added twice, it's a very unlikely bug, but it is crucial that files() return unique names.

        So can you please use a Set in the method instead of the assert (like it's done on trunk). Also, while you're at it, the method doesn't have javadocs - they appear in regular comments. Can you convert them to javadocs (there is a warning there about not modifying the returned List, but it's not visible as javadocs .

        Show
        Shai Erera added a comment - On one hand, it's good to add the files to a Set, so that we can be sure they are added uniquely. On the other hand though, if we expect files are added properly, then adding to the set is redundant. Since this code is executed once per SI instance, I think explicitly adding to a Set is better. Note that while the assert you added will work, if someone runs without assertions he may get duplicate file names, if indeed they are added twice. I think that it's not so crucial to know that the same files was added twice, it's a very unlikely bug, but it is crucial that files() return unique names. So can you please use a Set in the method instead of the assert (like it's done on trunk). Also, while you're at it, the method doesn't have javadocs - they appear in regular comments. Can you convert them to javadocs (there is a warning there about not modifying the returned List, but it's not visible as javadocs .
        Hide
        Shai Erera added a comment -

        Patch against 3x - fixes the bug according to Alexander's other patch (but uses HashSet all the way), and I added a CHANGES entry and test case to TestSegmentInfo. I plan to commit this soon and also backport to 3.0 and 2.9

        Show
        Shai Erera added a comment - Patch against 3x - fixes the bug according to Alexander's other patch (but uses HashSet all the way), and I added a CHANGES entry and test case to TestSegmentInfo. I plan to commit this soon and also backport to 3.0 and 2.9
        Hide
        Michael McCandless added a comment -

        Patch looks good Shai!

        I don't think you need to backport to 2.9/3.0 immediately (unless you really want to!)? We can backport if/when we do another release...

        Show
        Michael McCandless added a comment - Patch looks good Shai! I don't think you need to backport to 2.9/3.0 immediately (unless you really want to!)? We can backport if/when we do another release...
        Hide
        Shai Erera added a comment -

        Committed revision 1060358 (3x).
        Committed revision 1060391 (3.0).
        Committed revision 1060398 (2.9).

        Thanks Alexander !

        Show
        Shai Erera added a comment - Committed revision 1060358 (3x). Committed revision 1060391 (3.0). Committed revision 1060398 (2.9). Thanks Alexander !
        Hide
        Alexander Kanarsky added a comment -

        Thanks Shai and Michael!

        Show
        Alexander Kanarsky added a comment - Thanks Shai and Michael!
        Hide
        Grant Ingersoll added a comment -

        Bulk close for 3.1

        Show
        Grant Ingersoll added a comment - Bulk close for 3.1

          People

          • Assignee:
            Shai Erera
            Reporter:
            Alexander Kanarsky
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development