HARMONY-3117

[db2] IBM DB2 JDBC "sample apps" crash on exit

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: 5.0M4
    • Labels:
      None
    • Environment:
      EM64T -- RedHat Enterprise Linux 4 - U4
      IBM DB2 Express-C version 9.1
      Latest Harmony JRE binary download (vn = r487452, (Dec 15 2006), Linux/em64t/gcc 4.0.3, release build)
    • Estimated Complexity:
      Unknown

      Description

      Putting critical because critical is defined as "Crashes, loss of data, severe memory leak."

      I was experimenting with whether a DB2 JDBC connection will work with Harmony. I am using the sample apps that come with DB2. The JDBC layer appears to connect to the database successfully (which is good for Harmony) and queries appear to work (data comes through). However, during shutdown of the sample apps, the process regularly segfaults when using Harmony, while it exits cleanly using the BEA JRE and Sun JRE.

      The crash behavior is consistent with both the "java DbConn" (basic connection test) and "java TbSel" (basic SQL select test) sample apps that come with the "free" version of DB2.

      Unfortunately, the core file provides little insight.
      (gdb) bt
      #0 0x0000002aaf5898fa in ?? ()
      #1 0x0000000000000000 in ?? ()
      (gdb) info threads

      * 1 process 22262 0x0000002aaf5898fa in ?? ()

      Attaching with debugger gives a possible hint:
      Program received signal SIGSEGV, Segmentation fault.
      0x0000002aaf5898fa in OSSHLibrary::unload ()
      from /home/db2inst/sqllib/lib64/libdb2osse.so.1
      (gdb) bt
      #0 0x0000002aaf5898fa in OSSHLibrary::unload ()
      from /home/db2inst/sqllib/lib64/libdb2osse.so.1
      #1 0x0000002aacce93de in sqlexPluginUnload ()
      from /home/db2inst/sqllib/lib64/libdb2.so.1
      #2 0x0000002aad1dd080 in sqlexAppLibTerm ()
      from /home/db2inst/sqllib/lib64/libdb2.so.1
      #3 0x0000002aacc41afa in sqlmStreamFlagsAction ()
      from /home/db2inst/sqllib/lib64/libdb2.so.1
      #4 0x0000002aacc41b83 in _ZN10appLibInitD9Ev ()
      from /home/db2inst/sqllib/lib64/libdb2.so.1
      #5 0x0000002aacc41b73 in appLibInit::~appLibInit ()
      from /home/db2inst/sqllib/lib64/libdb2.so.1
      #6 0x000000380df30c45 in exit () from /lib64/tls/libc.so.6
      #7 0x000000380df1c402 in __libc_start_main () from /lib64/tls/libc.so.6
      #8 0x000000000040096a in _start () at ../sysdeps/x86_64/elf/start.S:113

      It looks to me as though the C++ destructors registered by some components (presumably JNI) are being invoked by the C runtime at process exit. At this time there are no other threads remaining (i.e., Java looks like it is done and gone) and presumably something gets out of control during the cleanup process.

      In contrast, with the Sun Java5 JRE there are 13 other threads remaining when the destructor runs, and 12 other threads with the BEA Java5 JRE.

      I'm not sure if this is a compatibility issue with the reference implementation or if it is simply a hole in the JNI support that Harmony currently provides. It appears to be 100% reproducible.

      1. db2-setup.zip
        131 kB
        Chris Elford
      2. db2admin.png
        19 kB
        Gregory Shimansky

          Activity

          Gregory Shimansky added a comment -

          Since DRLVM is the only VM which works on x86_64 I assume that you were using it. So I changed bug category to DRLVM.

          Chris Elford added a comment -

          Yes, I was definitely running DRLVM. I would agree that the problem is most likely in the VM, but I wouldn't rule out a core class library allowing the VM to exit before it should have.

          Tim Ellison added a comment -

          Chris, can you try to reproduce the failure using the IBM VME with the Harmony classlib? It would help us narrow down the cause of the failure.

          Chris Elford added a comment -

          I ran into a few problems trying this.

          (1) Downloaded IBM VME from http://www-128.ibm.com/developerworks/java/jdk/harmony/index.html
          (2) Downloaded the latest binary x86 version of the JRE
          (3) replaced harmony-jre-487452/bin/default with contents of the VME
          Ran hello world. It segfaults during VM exit (looks like a recursion issue in hythread). After replacing default with the DRLVM one, hello world works again.

          I thought it might be an issue with running a legacy app on an EM64T OS, so I booted into Fedora Core 6 (x86 version) and it showed the same behavior.

          I thought maybe it would crash with the DB2 test before the hythread failure so I tho

          Chris Elford added a comment -

          continuing the comment above after the browser went weird on me...

          The segfault in hello world for me was:
          #0 0xf7bfc035 in hythread_tls_get () from /opt/x86-hre5-ibm/bin/libhythr.so
          #1 0xf6f840da in jsig_handler () from /opt/x86-hre5-ibm/bin/default/libjsig.so
          #2 0xf6fa14d6 in masterSynchSignalHandler ()
          from /opt/x86-hre5-ibm/bin/default/libj9prt23.so
          #3 <signal handler called>
          #4 0xf7bfc035 in hythread_tls_get () from /opt/x86-hre5-ibm/bin/libhythr.so
          #5 0xf6f840da in jsig_handler () from /opt/x86-hre5-ibm/bin/default/libjsig.so
          #6 0xf6fa14d6 in masterSynchSignalHandler ()
          from /opt/x86-hre5-ibm/bin/default/libj9prt23.so

          I'm not sure if I need another libhythr.so (older) that matches the VME or not.

          In any event, I can't run the DB2 test because of the JNI part of JDBC in DB2. An x86 JVM can only load x86 JNI libraries, and since I'm running the x86_64 version of DB2 those libraries aren't there.
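To check which word size a given JVM reports before pointing it at the DB2 native libraries, a minimal sketch along these lines may help. The class name JvmArchReport is hypothetical, and sun.arch.data.model is a vendor-specific property that a JVM such as Harmony or the IBM VME may not set, in which case the sketch prints "unknown".

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Harmony or DB2.
final class JvmArchReport {

    // Collects the system properties that usually reveal the JVM's architecture.
    static Map<String, String> collect() {
        Map<String, String> report = new LinkedHashMap<String, String>();
        report.put("java.vm.name", System.getProperty("java.vm.name", "unknown"));
        report.put("os.arch", System.getProperty("os.arch", "unknown"));
        // Assumption: this property is vendor-specific and may be unset.
        report.put("sun.arch.data.model",
                System.getProperty("sun.arch.data.model", "unknown"));
        return report;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> entry : collect().entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}
```

Running it once with each JVM on the PATH shows whether the JVM matches the architecture of the installed DB2 libraries.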

          Chris Elford added a comment -

          Gregory,

          You mentioned to me that you were hoping for a quick path to repro...

          If it will help, I can provide you temporary vnc access to a box that has DB2 express installed and working to a point that the failure can be exhibited.

          The basic steps I took are:

          • Started with an EL4 (EM64T) system
          • downloaded http://www.ibm.com/developerworks/downloads/im/udbexp/?S_TACT=105AGX01&S_CMP=HP
          • ran db2setup as root and installed all components pretty much choosing defaults
          • logged in as db2inst1
          • ran db2cc and made sure the instance was started
          • ran db2fs and set up sample database
          • copied /opt/ibm/db2/V9.1/samples/java/jdbc directory to ~/samples/jdbc
          • set PATH to $PATH:/opt/ibm/db2/V9.1/java/jdk64/bin:.
          • make DbInfo
          • java DbInfo – successfully runs with IBM VM
          • from another window, with PATH pointing to Harmony instead, "java DbInfo" segfaults on exit
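Since the failure only shows up at process exit, the last two steps can be compared across JVMs by looking at the exit status of the child process. The following is a rough sketch, not a tested harness: the ExitCheck class is hypothetical, and it assumes that the launching JVM on Linux reports a child killed by signal N as exit value 128 + N (so SIGSEGV, signal 11, would appear as 139).

```java
import java.io.IOException;
import java.util.Arrays;

// Hypothetical launcher for illustration; run it with any working JVM.
final class ExitCheck {

    // Interprets an exit value. Assumption: values above 128 mean
    // "terminated by signal (value - 128)", as is common on Linux.
    static String describe(int exitValue) {
        if (exitValue == 0) {
            return "clean exit";
        }
        if (exitValue > 128) {
            return "probably killed by signal " + (exitValue - 128);
        }
        return "exited with error code " + exitValue;
    }

    // Starts the command, lets it share this process's stdin/stdout/stderr,
    // and waits for it to finish. inheritIO() requires Java 7 or later.
    static int run(String... command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.inheritIO();
        return builder.start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java ExitCheck <java-binary> <main-class> [args...]");
            return;
        }
        int exitValue = run(args);
        System.out.println(Arrays.toString(args) + ": " + describe(exitValue));
    }
}
```

For example, "java ExitCheck /path/to/harmony/bin/java DbInfo" (the path is a placeholder) run from ~/samples/jdbc should report a clean exit on a JVM that shuts down correctly and a signal on one that segfaults.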

          Thx,

          Chris

          Gregory Shimansky added a comment -

          I am a newbie in this DB stuff so I need some more assistance. I installed DB2 and when I run db2cc and try to start an instance I get an error message:

          ----------------------------------------------------------
          The admin node "MSTMRTD1" does not exist in the DB2 node
          directory.

          Explanation:

          The admin node "<node-name>" is invalid. The node name does not
          exist in the DB2 node directory.

          User Response:

          Verify that the node name "<node-name>" is cataloged in the
          admin node directory using the LIST ADMIN NODE DIRECTORY command.
          If the admin node is not listed in the admin node directory,
          submit a CATALOG ADMIN ... NODE command to catalog the admin
          node. If you continue to receive this error message after
          attempting the suggested response, please contact IBM Support.
          ----------------------------------------------------------

          Could it be caused by the shortened name of the host? The host name is mstmrtd106 while DB2 mentions MSTMRTD1...

          Chris Elford added a comment -

          I've not done too much w/ DB2 administration (the defaults worked pretty well for me) but...

          (1) When you installed db2 did you choose the default option to "create the instance now"?
          (2) After you installed as root, did you log out and then log back in as the db2 user (the default is to create a new user called db2inst1)?

          During install, the default is to create 3 users, a DAS user, an instance user, and a fenced user. After installing as root, you should log in as the instance user to do the remaining steps.

          I uninstalled and reinstalled this morning and took a few screenshots.
          step1 is the first run of db2cc (default/advanced mode) as db2inst1. It shows the instance and I can successfully right click "start". Note that I did a right click "start admin" first but I don't think you have to. At some point I got a message that the instance had already been started.

          step2 is the run of db2fs (again as db2inst1 user) showing where I clicked to create the sample database.

          step3 is where I went back to the db2cc window and did a refresh on the instance to see the database that had been created.

          There are cmd lines for starting up/managing the database as well as the gui but I don't know enough about db2 to really tell you the set of cmd line calls.

          How does your screen differ from these screenshots?

          thx,

          chris

          Chris Elford added a comment -

          three screenshots

          Gregory Shimansky added a comment -

          When I installed DB2 in the work environment it had quite a lot of trouble with NIS. Creating new users and getting DB2 to recognize them was a pain. So that is probably the cause of the failures. I'll try at home on my bleeding edge Gentoo x86_64 Linux installation.

          Gregory Shimansky added a comment -

          Hmm, I couldn't reproduce the crash with either debug or release builds of DRLVM. I've run the DbInfo, DbConn and TbSel samples and they finished successfully with exit code 0. It could be that my build differs (most packages are very fresh on Gentoo, for example gcc is 4.1.2, glibc 2.5 and kernel 2.6.20) from the binary snapshots that you were using. Could you give me an exact URL to the last snapshot that you've unpacked?

          I'll also try DB2 on SLES9 at work, but there I'll have to figure out how to set up DB2 correctly so that it recognizes locally created users in an NIS environment.

          Chris Elford added a comment -

          It wouldn't surprise me too much if there were glibc to glibc variations here. This is after all a failure related to signal handling during the running of destructors during the libc exit() handler.

          harmony-jre-r542118 is the build I was using most recently.

          The link is no longer live on the snapshots page since it doesn't maintain a history.

          Chris

          Gregory Shimansky added a comment -

          I got the same result on SLES9 x86_64. Sample database was created successfully and TbSel, DbInfo and DbConn samples worked successfully.

          Could you please give more specifics about the sample database? Did you create it with XML support or not? I used the default mode with just SQL.

          Chris Elford added a comment -

          no XML support. Just a plain, default sample database.

          Note I have tested with latest snapshot 545255 and see the same behavior.

          /usr/lib64/libstdc++.so.5 = version 5.0.7
          libc.so is version 2.3.4
          (This is a normal RHEL 4 update 4 install – 2.6.9-42)

          Note that the segfault does happen as the process exits (i.e., it does the work of the test then gives a segfault instead of a clean exit).

          I'll send you instructions on how to log into a RHEL4 system.

          On a semi-unrelated note, what type of Linux system are the snapshots created on? A RHEL4 system or a SuSE 9 system? It looks like a SuSE system of some sort to me.

          Thx,

          Chris

          Gregory Shimansky added a comment -

          The bug still requires evaluation to find out which component is to be blamed.

          Gregory Shimansky added a comment -

          I think I know the reason for this crash, and it is likely to be fixed now.

          The reason is the same as in the HARMONY-5019 and HARMONY-3581 bugs. It happens when some library is unloaded on shutdown (the stack trace suggests that something is being unloaded) while some thread is interrupted or throws a C++ exception.

          In this case the gcc code that unwinds the stack using C++ rules traverses the whole thread stack, even if it finds a handler for the C++ exception (thread interruption on Linux x86_64 is done in the same way as a C++ exception). When it reaches the code of the unloaded library, it crashes because that memory is no longer mapped.

          For HARMONY-5019 it was necessary to fix JVMTI agent shutdown; for HARMONY-3581 it was necessary to fix shutdown of all currently running threads. So now I am quite sure that this bug should be fixed as well.

          If you have a chance to check DB2 on RHEL4, it would be great to verify my guess.

          Chris Elford added a comment -

          I'll see if I can still boot that box and try it out.... That boot partition may still exist

          Which binary should I use? M4? it looks like 5019 was fixed before M4(r603534)
          Thx,
          Chris

          Gregory Shimansky added a comment -

          Yes, actually the main fix should exist in HARMONY-3581 since HARMONY-5019 is a fix just for JVMTI agent. It was fixed in M4, so you can try this release.

          Chris Elford added a comment -

          sigh...

          The boot disk where I had this installed is long gone now. I took my FC6-x86 box and installed the latest DB2 on it (9.5)...

          Differences from before:
          x86 instead of EM64T
          FC6 instead of RHEL4
          DB2 9.5 instead of DB2 9.1

          The good news is that it looks like DB2 has gotten rid of this bug (going back all the way to M2 does not exhibit the exit condition anymore)... Unfortunately I'm not going to be able to repro. So we may as well close this one out on the assumption that it is indeed fixed (sounds reasonable).

          The bad news is that it looks like the same demo cannot run completely because of a missing character converter error:
          [jcc][t4][10199][10462][3.50.152] Required character converter is not available. ERRORCODE=-4220, SQLSTATE=null

          It works with the IBM JVM and BEA JVM, and fails with Harmony and gcj.

          This is clearly a different bug.... Presumably something classlib related.
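One way to narrow down a converter problem like this is to ask each JVM which charsets it supports. The sketch below is only an assumption about where to look: the thread does not say which converter the driver requested, so the candidate names are illustrative guesses, and the CharsetProbe class is hypothetical. It uses only the standard java.nio.charset.Charset API.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical probe for illustration; run it under each JVM being compared.
final class CharsetProbe {

    // Illustrative guesses only; the actual converter the driver
    // asked for is not identified in this report.
    static final String[] CANDIDATES = { "UTF-8", "ISO-8859-1", "Cp1252", "Cp037" };

    // Maps each name to whether this JVM can supply a charset for it.
    static Map<String, Boolean> probe(String... names) {
        Map<String, Boolean> result = new LinkedHashMap<String, Boolean>();
        for (String name : names) {
            boolean supported;
            try {
                supported = Charset.isSupported(name);
            } catch (IllegalArgumentException e) {
                // Thrown for syntactically illegal (or null) charset names.
                supported = false;
            }
            result.put(name, Boolean.valueOf(supported));
        }
        return result;
    }

    public static void main(String[] args) {
        String[] names = args.length > 0 ? args : CANDIDATES;
        for (Map.Entry<String, Boolean> entry : probe(names).entrySet()) {
            System.out.println(entry.getKey() + " supported: " + entry.getValue());
        }
    }
}
```

A name that is supported on the IBM or BEA JVM but not on Harmony would be a good starting point for a separate classlib bug report.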

          Gregory Shimansky added a comment -

          Well, if I understood the cause of the bug correctly, it has to be Linux x86_64 specific, so testing on x86 doesn't help. It is on x86_64 that stack unwinding at thread interruption or on a C++ exception is done in a way that causes the gcc code in libunwind to traverse the whole stack of the thread, including possibly unmapped code. On x86 this doesn't happen, probably because the stack is not traversed all the way to the bottom, or maybe because stack unwinding at a thread interruption point (this usually happens at VM shutdown when it kills running threads) doesn't use C++ unwinding at all. This is why problems with unclean shutdown happened only on Linux x86_64.

          Also, since I could not reproduce this bug on SLES10 x86_64 or Gentoo x86_64, it seems to be RHEL specific. I don't mean that they behave differently, just that the bug probably happened more often on RHEL for unknown reasons, such as a different pthreads version. I am talking in guesses because the libunwind code is GPLed, and I read only as much of it as was necessary to understand the cause of this type of crash in HARMONY-5019.

          As far as I know, this library unmapping problem should be fixed by now, so I am closing this bug. If something like this happens again, I'll make a closer inspection, now that I know the cause of such crashes.

          As for character conversion, I don't know for sure. Recently the class library migrated from ICU v4 to ICU v6 and there were many changes in this area. I wonder if DB2 keeps some logs of Java exceptions so that it would be possible to understand which Java API it is using. In any case a bug report would be welcome.

          Chris Elford added a comment -

          I had destroyed all my EM64T OSes a few months back. I've started an install of one (FC5, as I had the CDs handy). If I'm lucky, I'll be able to give this a try on EM64T.

          Thx,
          Chris

          Chris Elford added a comment -

          Hi Gregory,

          I've installed DB2 9.1 (the original DB2) on an EM64T linux and tried with M4. It fails with this stack. Similar behavior to before (it is at the end during application finish).

          (gdb) bt full
          #0 0x00002aaaaabeeaac in Properties::get ()
          from /opt/java/harmony-jre-603534/bin/default/libharmonyvm.so
          No symbol table info available.
          #1 0x00002aaaaab0b13a in get_boolean_property ()
          from /opt/java/harmony-jre-603534/bin/default/libharmonyvm.so
          No symbol table info available.
          #2 0x00002aaaaab73e9c in is_gdb_crash_handler_enabled ()
          from /opt/java/harmony-jre-603534/bin/default/libharmonyvm.so
          No symbol table info available.
          #3 0x00002aaaaabf1426 in null_java_reference_handler ()
          from /opt/java/harmony-jre-603534/bin/default/libharmonyvm.so
          No symbol table info available.
          #4 <signal handler called>
          No symbol table info available.
          #5 0x00002aaac0fb78fa in ?? ()
          No symbol table info available.
          #6 0x00002aaac0e331c8 in ?? ()
          No symbol table info available.
          #7 0x00002b4c31129be0 in _dl_argv_internal () from /lib64/ld-linux-x86-64.so.2
          No symbol table info available.
          #8 0x0000000000000000 in ?? ()
          No symbol table info available.

          I'll try to download/install DB2 9.5.

          Chris Elford added a comment -

          Note that DB2 9.5 works correctly (i.e., does not crash at exit) on EM64T OS.

          I'm not sure what IBM did to their JDBC drivers between 9.1 and 9.5.

          It's still a bug in Harmony (since the IBM VM works with DB2 9.1), but probably not critical anymore since the most recently released DB2 works.

          Note that the format conversion error with Harmony occurs with DB2 9.5 on both x86 and EM64T.

          Gregory Shimansky added a comment -

          Yes, I see that a crash has happened, and the crash handler that would have printed the stack crashed too while trying to get a property. Maybe the crash happened very late in the shutdown stage, when properties are no longer available. I'll try to reproduce the crash myself.

          Chris Elford added a comment -

          Let me know if you need the right DB2 or if you want me to give you vnc rights to the box that I did the repro on. I have not been using that system too often lately so it is available for a bit if desired.

          Thx,
          Chris

          Gregory Shimansky added a comment -

          I have DB2 9.1 left installed on my Gentoo box at home. I tried both debug and release builds but still got no crashes, so it would really be good to have access to the system where the problem actually happens.

          Gregory Shimansky added a comment -

          I reproduced the crash on Fedora Core x86_64 on Harmony M4. From what I can tell, the bug is again with some library unloading, although it doesn't seem to be related to stack unwinding as I thought. Running the sample under gdb gives the following:

          Program received signal SIGSEGV, Segmentation fault.
          0x00002aaac0a838fa in ?? ()
          (gdb) bt
          #0 0x00002aaac0a838fa in ?? ()
          #1 0x00002aaac08ff1c8 in ?? ()
          #2 0x00000037a5219be0 in _dl_argv_internal () from /lib64/ld-linux-x86-64.so.2
          #3 0x0000000000000000 in ?? ()
          (gdb) x/1i $rip
          0x2aaac0a838fa: Cannot access memory at address 0x2aaac0a838fa

          The value of RIP seems to be OK, but gdb cannot show the memory it points to. I suppose some library was mapped there a moment before the crash. I'm building Harmony now to check the memory map just before shutdown in JNI_DestroyJavaVM. This way I'll know which library resided at the crash address.

          Chris Elford added a comment -

          If my reading is correct, this is in ibm/db2/V9.1/lib64/libdb2osse.so.1

          I pressed Ctrl-Z right before the crash and grabbed the contents of /proc/pid/maps
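
          The lookup described here, matching a crash address against the contents of /proc/pid/maps, can be sketched in Python. This is only an illustration under stated assumptions: the function name find_mapping and the sample mapping lines (paths, ranges, inode numbers) are hypothetical and are not taken from the actual run.

```python
def find_mapping(maps_text, address):
    """Return the pathname of the mapping that contains `address`, or None.

    `maps_text` is the text of /proc/<pid>/maps. Each line has the form
    "start-end perms offset dev inode [pathname]".
    """
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start_hex, end_hex = fields[0].split("-")
        start, end = int(start_hex, 16), int(end_hex, 16)
        if start <= address < end:
            # Anonymous mappings have no pathname field.
            return fields[5].strip() if len(fields) > 5 else ""
    return None


# Hypothetical sample data, NOT the real memory map from this bug.
SAMPLE_MAPS = """\
00400000-00401000 r-xp 00000000 08:01 1234 /opt/example/bin/java
2aaac0a00000-2aaac0b00000 r-xp 00000000 08:01 5678 /opt/example/lib64/libexample.so.1
7fffff946000-7fffff95b000 rw-p 00000000 00:00 0 [stack]
"""

print(find_mapping(SAMPLE_MAPS, 0x2AAAC0A838FA))
```

          On a live system the text would come from reading /proc/<pid>/maps while the process is stopped, before the library in question has been unmapped.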

          Gregory Shimansky added a comment -

          Interesting how it works: it looks like DB2 crashes itself. I set a breakpoint on DestroyJavaVM, and then on dlclose to see each unloaded library. After many API libraries the breakpoint was hit with the following stack:

          Breakpoint 3, 0x00000037a5601240 in dlclose () from /lib64/libdl.so.2
          (gdb) bt
          #0 0x00000037a5601240 in dlclose () from /lib64/libdl.so.2
          #1 0x00002aaac070e8fa in OSSHLibrary::unload ()
          from /home/db2inst1/sqllib/lib64/libdb2osse.so.1
          #2 0x00002aaabde743de in sqlexPluginUnload ()
          from /home/db2inst1/sqllib/lib64/libdb2.so.1
          #3 0x00002aaabe368080 in sqlexAppLibTerm ()
          from /home/db2inst1/sqllib/lib64/libdb2.so.1
          #4 0x00002aaabddccafa in sqlmStreamFlagsAction ()
          from /home/db2inst1/sqllib/lib64/libdb2.so.1
          #5 0x00002aaabddccb83 in _ZN10appLibInitD9Ev ()
          from /home/db2inst1/sqllib/lib64/libdb2.so.1
          #6 0x00002aaabddccb73 in appLibInit::~appLibInit ()
          from /home/db2inst1/sqllib/lib64/libdb2.so.1
          #7 0x00000037a5332405 in exit () from /lib64/libc.so.6
          #8 0x00000037a531d08b in __libc_start_main () from /lib64/libc.so.6
          #9 0x0000000000400999 in _start ()
          #10 0x00007fffff95a008 in ?? ()
          #11 0x0000000000000000 in ?? ()

          The return address from dlclose is 0x00002aaac070e8fa, which is the crash address. I set a breakpoint on the instruction after the call, but libdb2osse.so.1 is unloaded by this very dlclose. So it appears that the library unloads itself and then, after returning from dlclose, tries to execute code that is no longer mapped, which cannot work.

          I suppose that the bug may be in DB2 9.1. Maybe in some circumstances this library is opened twice, so dlclose doesn't actually unload it, or maybe the final exit() is not executed. DRLVM's kernel class implementation of System.exit() runs _exit(), which doesn't execute C++ destructors, and due to some race on shutdown it may be executed instead of the final exit() after the main() function has finished.

          Chris Elford added a comment -

          However, it is important to note that it does not crash with:

          the IBM vm included in 9.1 (1.5.0 pxa64dev-20060222 SR1)
          BEA 1.5 update 12
          BEA 1.6 update 2
          Sun 1.5 update 14
          Sun 1.6 update 4

          It is certainly possible that it is purely a DB2 issue, but I'm not fully convinced one way or the other. Something in the Harmony shutdown sequence does seem to interact with the JDBC JNI shutdown sequence in a way that does not happen with the other VMs.

          However, this issue does not manifest with DB2 version 9.5. Perhaps we should give this a disposition of "will not fix" at this time because it works with DB2 9.5. If the issue recurs with another workload, we can worry about it then, perhaps?

          Thanks,
          Chris

          Gregory Shimansky added a comment -

          It also doesn't crash on many other Linux distros that I tried, so it is quite specific to RedHat distributions. I want to check on Gentoo how the process is terminated, to find out whether the destructor is executed there as well or the process is shut down in some different way.

          One simple difference may change the whole picture on RedHat too. If I place an _exit() call at the end of the main() function in the launcher, the crash would go away. This is because _exit() doesn't call C++ destructors, it just terminates the process, while exit(), which is called by glibc after main() returns, does execute destructors. This is one possible difference that could play a deciding role in this bug.

          But the bug with DRLVM doesn't appear on non-RedHat platforms either, so the difference is probably somewhere else. I am not sure I can find it, but I'll try to.
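
          The exit() versus _exit() distinction above can be illustrated with a small Python analogue. This is a sketch, not Harmony or DB2 code: Python's atexit handlers stand in for C++ static destructors, sys.exit() stands in for the normal exit() path, and os._exit() calls the underlying _exit(). The names CHILD and run_child are made up for the example.

```python
import subprocess
import sys

# Child program: registers a cleanup handler, then terminates either via
# sys.exit() (normal shutdown) or os._exit() (immediate termination).
CHILD = """
import atexit, os, sys
atexit.register(lambda: print("cleanup ran", flush=True))
print("main done", flush=True)
if sys.argv[1] == "fast":
    os._exit(0)
sys.exit(0)
"""


def run_child(mode):
    """Run the child in the given mode and return its stdout lines."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, mode],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


print(run_child("normal"))  # cleanup handler runs during normal shutdown
print(run_child("fast"))    # cleanup handler is skipped by os._exit()
```

          In the "fast" mode the exit-time handler never runs, which mirrors why an _exit() at the end of main() would skip the appLibInit destructor and with it the crashing unload path.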

          Gregory Shimansky added a comment -

          I checked DB2 on Gentoo again with the new knowledge, and it looks like the destructor appLibInit::~appLibInit is actually executed there too. The key is somewhere inside the function sqlexPluginUnload. It doesn't seem to call OSSHLibrary::unload, although this call appears 3 times in the function's assembly code. Each call is jumped over because some condition is not met. I'll check tomorrow how sqlexPluginUnload works on RedHat to find the difference.

          Chris Elford added a comment -

          Goodness, don't you sleep?

          I'm a bit curious how this would behave on RHEL instead of a Fedora Core base. While Fedora is an interesting exercise, having a top-tier database like DB2 working is more important on a production-class OS like RHEL than on a more engineering-quality release like Fedora.

          On this system, HM4 loads both libstdc++.so.6.0.8 (presumably by HM4) and libstdc++.so.5.0.7 (presumably by the IBM JDBC driver). I also wonder whether it would make a difference if Harmony were compiled with the same g++ as the JDBC driver (or at least one that depended on libstdc++.so.5 instead of .6). I see this from stopping the run (Ctrl-Z) and then doing grep ++ /proc/pid/maps

          I'm curious which libstdc++ versions you are using on your Gentoo.
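
          The grep over /proc/pid/maps mentioned above can also be done programmatically. A minimal Python sketch follows; the helper name loaded_libraries and the sample map lines (directories, address ranges, inode numbers) are hypothetical, and only the two libstdc++ version suffixes come from the observation above.

```python
def loaded_libraries(maps_text, needle):
    """Return sorted, de-duplicated pathnames in maps_text that contain needle."""
    paths = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        # The sixth field, when present, is the pathname of the mapping.
        if len(fields) == 6 and needle in fields[5]:
            paths.add(fields[5].strip())
    return sorted(paths)


# Hypothetical sample data; directories and addresses are made up.
SAMPLE_MAPS = """\
2aaaab000000-2aaaab0e0000 r-xp 00000000 08:01 100 /usr/lib64/libstdc++.so.6.0.8
2aaaab0e0000-2aaaab1e0000 ---p 000e0000 08:01 100 /usr/lib64/libstdc++.so.6.0.8
2aaac2000000-2aaac20b0000 r-xp 00000000 08:01 200 /usr/lib64/libstdc++.so.5.0.7
2aaac3000000-2aaac3100000 r-xp 00000000 08:01 300 /lib64/libc.so.6
"""

for path in loaded_libraries(SAMPLE_MAPS, "libstdc++"):
    print(path)
```

          Two distinct libstdc++ paths in the result would confirm that two C++ runtimes are mapped into the same process.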

          Thx,
          Chris

          Chris Elford added a comment -

          Hm..... sqlexPluginUnload

          Now this one is interesting.... Google shows the following:

          http://www-1.ibm.com/support/docview.wss?uid=swg1JR25828

          That page does indicate an interesting timing anomaly that can result in hangs around sqlexPluginUnload and that was fixed in DB2 9.1 update 3.

          Chris

          Chris Elford added a comment -

          I know that ours is not a hang (not the same symptom), but it's interesting to hear that there are/were some known issues with this part of DB2.

          Gregory Shimansky added a comment -

          A thought has occurred to me that the difference between your runs and mine may be not in the Linux distros but in the installation settings. This sqlexPluginUnload function has some quite complicated logic and checks a lot of DB2 internals, so its behavior may depend on how DB2 is set up. So possibly neither Fedora nor Gentoo is to blame, and the difference lies in the DB2 setup.

          Speaking of Enterprise systems, how about checking DB2 on SLES10? I've tried it on SLES9 some time ago, now we're using SLES10 and it is an enterprise version too.

          My constant failure to reproduce the crash makes me think that I am doing something wrong with setting DB2 up...

          Chris Elford added a comment -

          I don't know if it would be a DB2 installation difference, but I don't discount the possibility. However, there really aren't many options in the install process.

          Gregory Shimansky added a comment -

          For some reason I always get this error message when I start the instance of DB2. This is why I am worried about configuration problems. Something is not working completely right although all samples are executed as on your RH system.

          Gregory Shimansky added a comment -

          In addition to the previous comment: the variable DB2COMM is not set for user db2inst1, so I don't know which of the communication protocols could be failing to start, not to mention why.

          Chris Elford added a comment -

          In db2cc, can you go to

          allsystems->yoursystem->instances->db2inst1

          then right click and choose setup->communications?

          On my box only TCP/IP is detected/configured, and it is set to hostname=localhost.localdomain, service db2c_db2inst1 and port 50000.

          I suspect that you're running some NetBIOS or pipe helper that isn't configured completely but is present enough for db2setup to detect it. It could also be that you are running a firewall or similar program (I disable them because I'm already behind a firewall). I have the firewall and SELinux both disabled.

          I'm assuming that you are getting "real" data when running with the IBM JVM. If so, that should indicate that the server is set up okay even if some transport layers don't start.

          Gregory Shimansky added a comment -

          To check whether C++ destructors are called on shutdown I created a test that loaded two C++ native libraries that contained static objects. The test shows that all Java implementations execute C++ destructors both in libraries loaded as native, and those that are loaded as their linking dependencies.

          Gregory Shimansky added a comment -

          Not sure if this is still a problem. I think we're done with this bug.


            People

            • Assignee:
              Gregory Shimansky
              Reporter:
              Chris Elford
            • Votes:
              0
              Watchers:
              3
