Derby / DERBY-4173

"java/lang/OutOfMemoryError" "native memory exhausted" received on z/OS for process launched to shut down network server and launch the replication run in Suites.All

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 10.5.1.1
    • Fix Version/s: None
    • Component/s: Network Server
    • Labels:
      None
    • Environment:
    • Bug behavior facts:
      Crash

      Description

      With pmz3160sr2ifix-20081021_01 (SR2+IZ32776+IZ33456), the test run produced 6 javacore files, from JVM crashes while shutting down the network server.

      1TISIGINFO Dump Event "systhrow" (00040000) Detail "java/lang/OutOfMemoryError" "native memory exhausted" received
      1TIDATETIME Date: 2009/04/16 at 23:04:20

      In two of the javacores, the command line was the one that starts the replication run:

      1CICMDLINE /u/vanlm/pmz3160sr2ifx-20081021_01/J6.0/bin/java -Dderby.tests.trace=true -Dtest.serverHost=localhost -Dtest.serverPort=1527 -Dtest.inserts=10000 -Dtest.commitFreq=1000 -classpath .:/u/vanlm/10.5/jars/derbyrun.jar:/u/vanlm/10.5/jars/junit.jar:/u/vanlm/10.5/jars/derbyTesting.jar:/u/vanlm/10.5/jars/jakarta-oro-2.0.8.jar junit.textui.TestRunner org.apache.derbyTesting.functionTests.tests.replicationTests.ReplicationTestRun

      I never knew the replication run was launched in another process like this.

      The others were just simple server shutdowns:
      1CICMDLINE /u/vanlm/pmz3160sr2ifx-20081021_01/J6.0/bin/java -Dderby.infolog.append=true -cp .:/u/vanlm/10.5/jars/derbyrun.jar:/u/vanlm/10.5/jars/junit.jar:/u/vanlm/10.5/jars/derbyTesting.jar:/u/vanlm/10.5/jars/jakarta-oro-2.0.8.jar org.apache.derby.drda.NetworkServerControl shutdown -h localhost -p 4527

      It is not clear to me how a simple shutdown could run out of native memory, since it simply starts Java and writes a few bytes to a socket to shut down the server.

      Then, 40 minutes later, the system shut down, complaining that it was out of auxiliary (AUX) storage:
      09106 23:40:09.87 STC20757 00000010 ACTR001I VANLM9 STEP1 BPXPRFC 0001
      09106 23:40:11.79 STC20746 00000201 IEA794I SVC DUMP HAS CAPTURED: 379
      379 00000201 DUMPID=004 REQUESTED BY JOB (VANLM )
      379 00000201 DUMP TITLE=COMPON=ISG,COMPID=SCSDS,ISSUER=ISGRREC ,MODULE=ISGGR
      379 00000201 T +????,ABEND=S0738,REASON=023E0004
      09106 23:40:11.83 00000201 IEF196I IGD100I 2030 ALLOCATED TO DDNAME SYS00007 DATACLAS ( )
      09106 23:40:30.76 00000010 IRA205I 50% AUXILIARY STORAGE ALLOCATED
      09106 23:40:36.76 00000010 *IRA200E AUXILIARY STORAGE SHORTAGE

      It appears the AUX storage shortage occurred because there was not enough space to write the dumps from the earlier events, but it is not clear why it took 40 minutes to happen.

      The crashes did not cause any tests to fail and the run completed reporting only:
      There was 1 failure:
      1) testMkdirsInvalidAbsolute(org.apache.derbyTesting.unitTests.junit.VirtualFileTest)junit.framework.AssertionFailedError
      at org.apache.derbyTesting.unitTests.junit.VirtualFileTest.testMkdirsInvalidAbsolute(VirtualFileTest.java:94)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:45)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
      at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:105)

      FAILURES!!!
      Tests run: 10939, Failures: 1, Errors: 0

      $
      I did not notice this happening with the 64-bit JVM when I ran against 10.5.1.0 RC1, but it may have occurred without my noticing, because for some reason it does not actually cause tests to fail.

      Attachments: crashdumps.zip (1.03 MB, Kathey Marsden)

        Activity

        Kathey Marsden added a comment -

        This happened again, and we were able to identify the problem: the child and parent processes were sharing the same address space. An environment variable on z/OS UNIX controls this. Before running the tests, you need to run:

        export _BPX_SHAREAS="NO"

        This can also be set in /etc/profile or $HOME/.profile.
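        Where a test harness spawns helper JVMs itself, the same setting could in principle be applied per child. The following is a minimal sketch, assuming the helper is launched through java.lang.ProcessBuilder (the issue does not say how Derby's test code spawns these processes) and assuming z/OS UNIX honors _BPX_SHAREAS from the environment handed to the new process. ShutdownLauncher and shutdownCommand are hypothetical names; the command mirrors the shutdown command line from the javacore above. Exporting the variable in the shell remains the simpler fix.

```java
import java.util.ArrayList;
import java.util.List;

public class ShutdownLauncher {

    // Hypothetical helper: builds, but does not start, a child-JVM command
    // mirroring the NetworkServerControl shutdown command line in the javacore.
    static ProcessBuilder shutdownCommand(String javaExe, String classpath,
                                          String host, int port) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaExe);
        cmd.add("-Dderby.infolog.append=true");
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add("org.apache.derby.drda.NetworkServerControl");
        cmd.add("shutdown");
        cmd.add("-h");
        cmd.add(host);
        cmd.add("-p");
        cmd.add(Integer.toString(port));

        ProcessBuilder pb = new ProcessBuilder(cmd);
        // environment() starts as a copy of this JVM's environment; adding the
        // variable here affects only the child. Assumption: z/OS UNIX reads
        // _BPX_SHAREAS from the environment given to the new process.
        // On other platforms nothing reads it, so it is harmless.
        pb.environment().put("_BPX_SHAREAS", "NO");
        return pb;
    }

    public static void main(String[] args) {
        ProcessBuilder pb = shutdownCommand("java", ".:derbyrun.jar", "localhost", 4527);
        // Print the command and the override instead of launching it, so the
        // sketch runs without Derby on the classpath.
        System.out.println(String.join(" ", pb.command()));
        System.out.println("_BPX_SHAREAS=" + pb.environment().get("_BPX_SHAREAS"));
        // To actually run it: Process p = pb.inheritIO().start(); p.waitFor();
    }
}
```

        Setting the variable on the ProcessBuilder rather than in the parent keeps the override scoped to the spawned child, but whether that is enough on a particular z/OS configuration would need to be verified.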

        Kathey Marsden added a comment -

        Well, since the machine was restarted, I have not been able to make this occur. The JVM developer who analyzed the dumps seems convinced that the JVM was behaving correctly and that there must have been some unknown problem with the state of the machine, which remains a mystery.

        Closing Invalid.

        Kathey Marsden added a comment -

        After the machine was restarted (due to an air conditioning failure), this problem no longer reproduces. The system programmer says it may be a "storage creep" issue which could be either an application or OS problem.

        I will loop the tests for a while and ask the JVM team to look at the dumps. If that leads nowhere, I may need to close this as Cannot Reproduce.

        Kathey Marsden added a comment -

        Here are the dumps from the test run.


          People

          • Assignee: Kathey Marsden
          • Reporter: Kathey Marsden
          • Votes: 0
          • Watchers: 0
