Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Let's keep an issue where we report on G1 issues, with a list of the crashes we see.

      I filed an issue up on bug parade. Let's see if it becomes an actual bug. Below I note the version of the VM and the type of crash (Internal Error (nmethod.cpp:1981), pid=32319..... It's a 'fatal error'). It happens for me after 5-10 minutes when running a load test. Same thing each time.

      G1 in 1.6 seems plain broken; it crashes on use of stuff in the concurrent utils package.

      Date Created: Thu Dec 10 13:33:04 MST 2009
      ..
      Synopsis: Running G1 GC, crashes with " Internal Error (nmethod.cpp:1981), pid=32319..."
      Description:
      FULL PRODUCT VERSION :
      java version "1.7.0-ea"
      Java(TM) SE Runtime Environment (build 1.7.0-ea-b77)
      Java HotSpot(TM) 64-Bit Server VM (build 17.0-b05, mixed mode)

      FULL OS VERSION :
      Fedora Core release 6 (Zod)

      EXTRA RELEVANT SYSTEM CONFIGURATION :
      Here are the JVM args:

      -XX:+HeapDumpOnOutOfMemoryError -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC

        Issue Links

          Activity

          Andrew Purtell added a comment -

          7u4 was the release where G1 was officially marked as stable. Sematext is running with G1 enabled: http://blog.sematext.com/2013/06/24/g1-cms-java-garbage-collector. I've been running cluster tests with G1 and haven't observed issues. Resolving as 'Cannot Reproduce'. Reopen if G1 crashes are observed post 7u4.

          Liang Xie added a comment -

          Yes, it's not recommended to enable G1 before jdk7u4.
          We need to reevaluate this issue with jdk7u4+, IMHO.

          stack added a comment -

          A post by Jonathan Payne on G1 not working for him: http://printfdebugger.tumblr.com/post/19142660766/how-i-learned-to-love-cms-and-had-my-heart-broken-by-g1

          stack added a comment -

          Up on the list Friso says u21 stays up for him.

          Also, here is a link to Todd's thread with the GC team trying to get to the bottom of stop-the-world pauses in G1 and in CMS: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2010-July/000653.html

          stack added a comment -

          I ran more load tests overnight and it crashed again with the above "Internal Error" after a while. It's better, but it still crashes too much.

          The thread cited above had more added to it overnight. A few interesting points made:

          + G1 uses way more CPU and it's much spikier than CMS
          + Pauses seem to be longer than those in CMS

          Looks like it has a ways to go yet.

          The thread also talks of work being done in jdk7, not yet available, to help with fragmentation when running CMS (currently only available with a support contract): http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6631166

          stack added a comment -

          I removed -XX:-ReduceInitialCardMarks from list of JVM args when running with G1 and got the below pretty quickly.

          #
          # A fatal error has been detected by the Java Runtime Environment:
          #
          # Internal Error (nmethod.cpp:2013), pid=31341, tid=1093945664
          # Error: guarantee(cont_offset != 0,"unhandled implicit exception in compiled code")
          #
          # JRE version: 7.0-b83
          # Java VM: Java HotSpot(TM) 64-Bit Server VM (17.0-b09 mixed mode linux-amd64 )
          # An error report file with more information is saved as:
          # /home/stack/0.20-head/hs_err_pid31341.log
          #
          # If you would like to submit a bug report, please visit:
          # http://java.sun.com/webapps/bugreport/crash.jsp
          #

          I reported it.

          Adding the option back in, I seem to get pretty good stability again.

          I got the above option from this email: http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2010-February/000512.html
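
          When toggling an option like this between runs, it can help to confirm which setting the running JVM actually received. The following is only a minimal sketch, not code from HBase or from the linked email; the class name CardMarkFlagCheck and the helper flagState are hypothetical. It inspects only the -XX:+Name / -XX:-Name arguments passed on the command line, so it returns null when a flag was left at its default.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class CardMarkFlagCheck {
    /**
     * Returns TRUE for -XX:+name, FALSE for -XX:-name, or null when the
     * boolean flag does not appear in args. If the flag is repeated, the
     * last occurrence is used (an assumption about how the JVM resolves
     * repeated flags, not something stated in this issue).
     */
    static Boolean flagState(List<String> args, String name) {
        Boolean state = null;
        for (String arg : args) {
            if (arg.equals("-XX:+" + name)) {
                state = Boolean.TRUE;
            } else if (arg.equals("-XX:-" + name)) {
                state = Boolean.FALSE;
            }
        }
        return state;
    }

    public static void main(String[] unused) {
        // Arguments the JVM was started with, excluding the main class and its args.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("UseG1GC: " + flagState(jvmArgs, "UseG1GC"));
        System.out.println("ReduceInitialCardMarks: "
                + flagState(jvmArgs, "ReduceInitialCardMarks"));
    }
}
```

          Logging this at startup would make it easier to match a crash report to the exact flag combination that produced it.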

          stack added a comment -

          6u18 running CMS crashes on me, though adding -XX:-ReduceInitialCardMarks might help if I read a recent note up on the gc list properly. It's the datanode that crashes on me.

          1.7.0b83 is best I've run so far. It ran the longest. It OOME'd and then SIGSEGV'd dumping the heap. I reported it. Still not there it seems.

          Andrew Purtell added a comment -

          6u18 is out with a number of G1 fixes listed in the release notes. Might be worth trying out.

          stack added a comment -

          The G1 in 1.6 fails regularly for me in the concurrent utils package, something it doesn't do in 1.7. I get further when I use jdk7. Maybe try it? I did a search on 'SIGSEGV (0xb) at pc=0x00007f05f888f201, pid=19186'. It turned up a few hits. Were you running on Xen? There is also this: http://icedtea.classpath.org/bugzilla/show_bug.cgi?id=109

          Andrew Purtell added a comment -

          I thought I would try G1 out again tonight. Simple all-localhost config, using file:/// rootdir on /tmp, running PE randomWrite 1:

          #  SIGSEGV (0xb) at pc=0x00007f05f888f201, pid=19186, tid=139663553251600
          #
          # JRE version: 6.0_16-b01
          # Java VM: Java HotSpot(TM) 64-Bit Server VM (14.2-b01 mixed mode linux-amd64 )
          # Problematic frame:
          # V  [libjvm.so+0x31d201]
          
          Current thread (0x00000000419b6000):  GCTaskThread 
              [stack: 0x0000000000000000,0x0000000000000000] [id=19199]
          siginfo:si_signo=SIGSEGV: si_errno=0, si_code=128 (), si_addr=0x0000000000000000
          
          Registers:
          RAX=0x0000000000000001, RBX=0x00007f05d3e52b00, RCX=0x00007f05f8e26d70, 
          RDX=0x4141414158585858, RSP=0x00007f05f47a6fc0, RBP=0x00007f05f47a7010, 
          RSI=0x00007f05d3e52b00, RDI=0x4141414158585868,R8 =0x00007f05ef100000,
          R9 =0x0000000007d00000, R10=0x00000000419da520, R11=0x0000000000000235,
          R12=0x00007f05f47ab5c0, R13=0x00007f05d3e52b00, R14=0x00007f05f47ab5c0,
          R15=0x00000000419b9830, RIP=0x00007f05f888f201, EFL=0x0000000000010246,
          CSGSFS=0x0000000000000033, ERR=0x0000000000000000
          TRAPNO=0x000000000000000d
          
          ryan rawson added a comment -

          I've seen this before... the solution is to set Xms=Xmx. Not ideal but a workaround for now.
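
          To verify that a process really was launched with matching -Xms and -Xmx values, the JVM's own argument list can be checked. This is a rough sketch under stated assumptions: the class HeapFlagCheck and its helpers are hypothetical, and the size parsing only handles the plain k/m/g suffixes.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class HeapFlagCheck {
    /** Parses a size such as "512m", "4g", "2048k" or "1048576" into bytes. */
    static long parseSize(String text) {
        String s = text.trim();
        char last = Character.toLowerCase(s.charAt(s.length() - 1));
        long multiplier = 1L;
        if (last == 'k') {
            multiplier = 1L << 10;
        } else if (last == 'm') {
            multiplier = 1L << 20;
        } else if (last == 'g') {
            multiplier = 1L << 30;
        }
        if (multiplier != 1L) {
            s = s.substring(0, s.length() - 1);
        }
        return Long.parseLong(s) * multiplier;
    }

    /** Byte value of the last argument starting with prefix, or -1 if absent. */
    static long sizeOf(List<String> args, String prefix) {
        long value = -1L;
        for (String arg : args) {
            if (arg.startsWith(prefix)) {
                value = parseSize(arg.substring(prefix.length()));
            }
        }
        return value;
    }

    /** True only when both -Xms and -Xmx are present and equal in bytes. */
    static boolean initialEqualsMax(List<String> args) {
        long xms = sizeOf(args, "-Xms");
        long xmx = sizeOf(args, "-Xmx");
        return xms > 0 && xms == xmx;
    }

    public static void main(String[] unused) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Xms equals Xmx: " + initialEqualsMax(jvmArgs));
    }
}
```

          Comparing byte values rather than strings means that -Xms4g and -Xmx4096m are treated as equal.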

          stack added a comment -

          The way we get heap size doesn't seem to work when G1 is in operation:

          2009-12-10 22:50:50,216 [IPC Server handler 8 on 60020] INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Forced flushing of TestTable,0012079792,1260483094491 because global memstore limit of 3.2m exceeded; currently 19.1m and flushing till 2.0m
          

          Heap in above is actually 3.4G
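
          A quick way to investigate is to print what the JVM's standard APIs report for the maximum heap under each collector. The sketch below is an assumption about how to probe this, not the code HBase uses. The class HeapSizeProbe, the helper limitFor, and the 0.4 fraction are all hypothetical and only illustrate a limit derived as a fraction of the reported maximum.

```java
import java.lang.management.ManagementFactory;

class HeapSizeProbe {
    /** Max heap according to the memory MXBean; -1 means undefined. */
    static long mxBeanMaxHeap() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getMax();
    }

    /** Max heap according to Runtime. */
    static long runtimeMaxHeap() {
        return Runtime.getRuntime().maxMemory();
    }

    /** Hypothetical limit: a fraction of the reported max heap, in bytes. */
    static long limitFor(long maxHeapBytes, double fraction) {
        if (maxHeapBytes <= 0) {
            throw new IllegalArgumentException("max heap is undefined: " + maxHeapBytes);
        }
        return (long) (maxHeapBytes * fraction);
    }

    public static void main(String[] unused) {
        long fromMxBean = mxBeanMaxHeap();
        long fromRuntime = runtimeMaxHeap();
        System.out.println("MemoryMXBean max heap: " + fromMxBean);
        System.out.println("Runtime max heap:      " + fromRuntime);
        if (fromMxBean > 0) {
            // 0.4 is an illustrative fraction, not a value taken from this issue.
            System.out.println("Limit at 0.4 of MXBean max: " + limitFor(fromMxBean, 0.4));
        }
    }
}
```

          Running this once with -XX:+UseG1GC and once without, using the same -Xmx, would show whether the reported maximum itself changes or whether the bad value comes from elsewhere.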


            People

            • Assignee: Unassigned
            • Reporter: stack
            • Votes: 1
            • Watchers: 11
