Lucene - Core / LUCENE-6098

IndexWriter changeCount assertion failure with G1GC

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      This failed on Java 1.8.0_25 with G1GC. I was able to reproduce it at least once with G1, but never without it.

      svn co -r 1643097 https://svn.eu.apache.org/repos/asf/lucene/dev/trunk
      cd trunk/lucene/core
      ant beast  -Dtestcase=TestStressDeletes -Dtests.method=test -Dtests.seed=C8F513C39231BFA2 -Dtests.slow=true -Dtests.locale=sk -Dtests.timezone=Singapore -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1 -Dbeast.iters=1000 -Dargs="-XX:+UseG1GC" -Dtests.dups=8
      

      I am not sure this is possible to debug, but the exception is scary:

         [junit4] Suite: org.apache.lucene.index.TestStressDeletes
         [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestStressDeletes -Dtests.method=test -Dtests.seed=C8F513C39231BFA2 -Dtests.slow=true -Dtests.locale=sk -Dtests.timezone=Singapore -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
         [junit4] ERROR   0.62s J5 | TestStressDeletes.test <<<
         [junit4]    > Throwable #1: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=68, name=Thread-30, state=RUNNABLE, group=TGRP-TestStressDeletes]
         [junit4]    > 	at __randomizedtesting.SeedInfo.seed([C8F513C39231BFA2:40A12C193CCDD25A]:0)
         [junit4]    > Caused by: java.lang.AssertionError: lastCommitChangeCount=130 changeCount=128
         [junit4]    > 	at __randomizedtesting.SeedInfo.seed([C8F513C39231BFA2]:0)
         [junit4]    > 	at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:4256)
         [junit4]    > 	at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2778)
         [junit4]    > 	at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2881)
         [junit4]    > 	at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2848)
         [junit4]    > 	at org.apache.lucene.index.TestStressDeletes$1.run(TestStressDeletes.java:75)
         [junit4]   2> NOTE: test params are: codec=Asserting(Lucene50): {id=PostingsFormat(name=Direct)}, docValues:{}, sim=RandomSimilarityProvider(queryNorm=false,coord=no): {id=DFR I(ne)B1}, locale=sk, timezone=Singapore
         [junit4]   2> NOTE: Linux 3.13.0-35-generic amd64/Oracle Corporation 1.8.0_25 (64-bit)/cpus=8,threads=1,free=428019792,total=499646464
         [junit4]   2> NOTE: All tests run in this JVM: [TestOmitNorms, Test2BPositions, Nested2, Nested3, Nested1, Test2BPagedBytes, TestTermsEnum2, TestDoc, TestTwoPhaseCommitTool, TestTimSorter, TestDocTermOrdsRangeFilter, TestWorstCaseTestBehavior, TestBasics, TestToken, TestStressDeletes]
         [junit4] Completed on J5 in 0.70s, 1 test, 1 error <<< FAILURES!
      
      
      
      1. LUCENE-6098.patch
        0.8 kB
        Robert Muir

        Activity

        rcmuir Robert Muir added a comment -

        I would like to add some defense and change the assert to a real check, like this patch.
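
A minimal sketch of what changing the assert to a real check could look like, assuming the invariant is that lastCommitChangeCount never exceeds changeCount (the failure message above, 130 vs. 128, suggests this). The class CommitCounters, its methods, and the choice of IllegalStateException are hypothetical stand-ins for illustration; this is not Lucene's IndexWriter and not the contents of the attached patch.

```java
// Hypothetical stand-in for the writer's commit bookkeeping (not Lucene code).
final class CommitCounters {
    private long changeCount;            // bumped on every index change
    private long lastCommitChangeCount;  // value of changeCount at the last commit

    void recordChange() {
        changeCount++;
    }

    long lastCommitChangeCount() {
        return lastCommitChangeCount;
    }

    // Assert-style check: evaluated only when the JVM runs with assertions enabled (-ea).
    void startCommitWithAssert() {
        assert lastCommitChangeCount <= changeCount
            : "lastCommitChangeCount=" + lastCommitChangeCount + " changeCount=" + changeCount;
        lastCommitChangeCount = changeCount;
    }

    // Hard check: always evaluated, so production JVMs (normally run without -ea) fail too.
    void startCommit() {
        if (lastCommitChangeCount > changeCount) {
            throw new IllegalStateException(
                "lastCommitChangeCount=" + lastCommitChangeCount + " changeCount=" + changeCount);
        }
        lastCommitChangeCount = changeCount;
    }

    // Demo-only hook that forces an inconsistent state like the one in the report.
    void corruptForDemo(long lastCommit, long change) {
        this.lastCommitChangeCount = lastCommit;
        this.changeCount = change;
    }

    public static void main(String[] args) {
        CommitCounters counters = new CommitCounters();
        counters.corruptForDemo(130, 128);
        try {
            counters.startCommit();
        } catch (IllegalStateException e) {
            System.out.println("hard check fired: " + e.getMessage());
        }
    }
}
```

The point of the hard check is that an inconsistent counter stops the commit even when assertions are disabled, instead of silently proceeding.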

        mikemccand Michael McCandless added a comment -

        +1

        jira-bot ASF subversion and git services added a comment -

        Commit 1643591 from Robert Muir in branch 'dev/trunk'
        [ https://svn.apache.org/r1643591 ]

        LUCENE-6098: change to a hard check

        jira-bot ASF subversion and git services added a comment -

        Commit 1643592 from Robert Muir in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1643592 ]

        LUCENE-6098: change to a hard check

        jira-bot ASF subversion and git services added a comment -

        Commit 1643593 from Robert Muir in branch 'dev/branches/lucene_solr_4_10'
        [ https://svn.apache.org/r1643593 ]

        LUCENE-6098: change to a hard check

        jhump Joshua Humphries added a comment -

        This ticket hasn't been touched in over 2 years. Any updates? I saw that Oracle has fixed https://bugs.openjdk.java.net/browse/JDK-8038348, which seems to be a primary cause of corruption with G1. I also noticed that https://issues.apache.org/jira/browse/LUCENE-5168, which may be related, is now resolved.

        Do Lucene tests pass with G1 yet? We have a cluster where most of our machines have pretty large heaps, and we are hoping G1 could help with the 99th percentile STW pause times. But we are obviously scared to try G1 if there are legitimate concerns about index corruption.
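
Since the failure above was only reproduced with -XX:+UseG1GC, it can help to confirm which collector a given JVM is actually running before comparing results. The following is a rough sketch using the standard java.lang.management API; the class name ActiveCollectors is hypothetical, and the "G1 " name prefix is an assumption based on how HotSpot names its G1 collector beans (other JVMs may differ).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for reporting the garbage collectors of the current JVM.
final class ActiveCollectors {

    // Names of the collector beans registered in this JVM.
    static List<String> names() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    // Assumption: HotSpot's G1 beans have names starting with "G1 ".
    static boolean looksLikeG1(List<String> names) {
        for (String name : names) {
            if (name.startsWith("G1 ")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> names = names();
        System.out.println("collectors=" + names + " g1=" + looksLikeG1(names));
    }
}
```

Running this with and without -XX:+UseG1GC shows whether the flag took effect; it says nothing about whether G1 is safe for a particular index.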


          People

          • Assignee: Unassigned
          • Reporter: rcmuir Robert Muir
          • Votes: 0
          • Watchers: 4
