Derby
  1. Derby
  2. DERBY-5283

Crash / process termination during SYSCS_DISABLE_LOG_ARCHIVE_MODE can leave service.properties broken

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 10.6.1.0, 10.8.1.2
    • Fix Version/s: 10.8.3.3, 10.9.1.0
    • Component/s: Miscellaneous
    • Labels:
      None
    • Environment:
    • Bug behavior facts:
      Data corruption, Seen in production

      Description

      If Derby is terminated while SYSCS_DISABLE_LOG_ARCHIVE_MODE is being called service.properties can be left in a broken state.

      Depending on timing either of the two cases can happen:
      Case 1:
      "ERROR XBM0S: Unable to rename file 'C:\PATH_TO_DB\service.properties' to 'C:\PATH_TO_DB\service.propertiesold' " is thrown when next trying to call
      SYSCS_DISABLE_LOG_ARCHIVE_MODE(1). Both "service.properties" and "service.propertiesold" are present in the database directory. Removing "service.propertiesold" corrects the problem.

      Case 2:
      "SQLException: Database 'C:\PATH_TO_DB' not found" is thrown when booting the database. The file service.properties does not exist in the database directory but service.propertiesold does exist. Renaming the file back to "service.properties" corrects the problem.

      As mentioned above both cases have workarounds but they require manual intervention which is a problem for applications installed to a customer site. It would be great if a more reliable method to update the file could be found.

      The following sample code will reproduce the issue fairly reliably by terminating the java process:

      import java.sql.*;
      import org.apache.derby.jdbc.*;

      public class DerbyLogArchiveModeTest {

      public static void main(String[] args) {
      final EmbeddedDataSource ds = new EmbeddedDataSource();
      ds.setDatabaseName("derbyTest");
      ds.setCreateDatabase("create");

      try {
      final Connection conn = ds.getConnection();

      try {
      final Statement stmt = conn.createStatement();

      try {
      while (true)

      { stmt.execute("call SYSCS_UTIL.SYSCS_DISABLE_LOG_ARCHIVE_MODE(1)"); }

      } finally

      { stmt.close(); }

      } finally

      { conn.close(); }

      } catch (SQLException e)

      { e.printStackTrace(); }

      }
      }

      1. derby-5283_10_8_diff.txt
        27 kB
        Kathey Marsden
      2. derby-5283-1a-recover.diff
        19 kB
        Kristian Waagan
      3. derby-5283-1b-recover.diff
        34 kB
        Kristian Waagan
      4. derby-5283-1c-recover.diff
        34 kB
        Kristian Waagan
      5. derby-5283-2a-update_test_with_errorcodes.diff
        1 kB
        Kristian Waagan

        Issue Links

          Activity

          Hide
          ASF subversion and git services added a comment -

          Commit 1511979 from Myrna van Lunteren in branch 'code/branches/10.8'
          [ https://svn.apache.org/r1511979 ]

          DERBY-5283; Crash / process termination during SYSCS_DISABLE_LOG_ARCHIVE_MODE can leave service.properties broken
          fixing a javadoc warning by merging the fix from trunk.
          merge command: svn merge -c 1188144 https://svn.apache.org/repos/asf/db/derby/code/trunk

          Show
          ASF subversion and git services added a comment - Commit 1511979 from Myrna van Lunteren in branch 'code/branches/10.8' [ https://svn.apache.org/r1511979 ] DERBY-5283 ; Crash / process termination during SYSCS_DISABLE_LOG_ARCHIVE_MODE can leave service.properties broken fixing a javadoc warning by merging the fix from trunk. merge command: svn merge -c 1188144 https://svn.apache.org/repos/asf/db/derby/code/trunk
          Hide
          ASF subversion and git services added a comment -

          Commit 1504199 from Kathey Marsden in branch 'code/branches/10'
          [ https://svn.apache.org/r1504199 ]

          DERBY-5283 Crash / process termination during SYSCS_DISABLE_LOG_ARCHIVE_MODE can leave service.properties broken

          Merge from trunk also includes the fix for DERBY-5816 which was a test failure caused by the original change. The message files had to be manually merged.

          revisions merged were:
          DERBY-5283 1188109 1188828

          DERBY-5816 1350361 1353764

          Show
          ASF subversion and git services added a comment - Commit 1504199 from Kathey Marsden in branch 'code/branches/10' [ https://svn.apache.org/r1504199 ] DERBY-5283 Crash / process termination during SYSCS_DISABLE_LOG_ARCHIVE_MODE can leave service.properties broken Merge from trunk also includes the fix for DERBY-5816 which was a test failure caused by the original change. The message files had to be manually merged. revisions merged were: DERBY-5283 1188109 1188828 DERBY-5816 1350361 1353764
          Hide
          ASF subversion and git services added a comment -

          Commit 1504199 from Kathey Marsden in branch 'code/branches/10'
          [ https://svn.apache.org/r1504199 ]

          DERBY-5283 Crash / process termination during SYSCS_DISABLE_LOG_ARCHIVE_MODE can leave service.properties broken

          Merge from trunk also includes the fix for DERBY-5816 which was a test failure caused by the original change. The message files had to be manually merged.

          revisions merged were:
          DERBY-5283 1188109 1188828

          DERBY-5816 1350361 1353764

          Show
          ASF subversion and git services added a comment - Commit 1504199 from Kathey Marsden in branch 'code/branches/10' [ https://svn.apache.org/r1504199 ] DERBY-5283 Crash / process termination during SYSCS_DISABLE_LOG_ARCHIVE_MODE can leave service.properties broken Merge from trunk also includes the fix for DERBY-5816 which was a test failure caused by the original change. The message files had to be manually merged. revisions merged were: DERBY-5283 1188109 1188828 DERBY-5816 1350361 1353764
          Hide
          Kathey Marsden added a comment -

          Attached is a patch for 10.8 for this issue. It also includes the fix for DERBY-5816 which was a test failure caused by the original change. The message files had to be manually merged.

          revisions merged were:
          DERBY-5283 1188109 1188828

          DERBY-5816 1350361 1353764

          Show
          Kathey Marsden added a comment - Attached is a patch for 10.8 for this issue. It also includes the fix for DERBY-5816 which was a test failure caused by the original change. The message files had to be manually merged. revisions merged were: DERBY-5283 1188109 1188828 DERBY-5816 1350361 1353764
          Hide
          Kathey Marsden added a comment -

          Assign to myself temporarily for backport

          Show
          Kathey Marsden added a comment - Assign to myself temporarily for backport
          Hide
          Knut Anders Hatlen added a comment -

          [bulk update] Close all resolved issues that haven't been updated for more than one year.

          Show
          Knut Anders Hatlen added a comment - [bulk update] Close all resolved issues that haven't been updated for more than one year.
          Hide
          Kristian Waagan added a comment -

          Attaching patch 2a, which updates lang.ErrorCodeTest with the two new error messages added by the fix for this issue.

          Committed to trunk with revision 1188828.

          Show
          Kristian Waagan added a comment - Attaching patch 2a, which updates lang.ErrorCodeTest with the two new error messages added by the fix for this issue. Committed to trunk with revision 1188828.
          Hide
          Kristian Waagan added a comment -

          Committed patch 1c to trunk with revision 1188109.

          Show
          Kristian Waagan added a comment - Committed patch 1c to trunk with revision 1188109.
          Hide
          Kristian Waagan added a comment -

          Attaching patch 1c, which renames the getSFPath method and fixed a bug with the use of BufferedWriter.

          Show
          Kristian Waagan added a comment - Attaching patch 1c, which renames the getSFPath method and fixed a bug with the use of BufferedWriter.
          Hide
          Kristian Waagan added a comment -

          Thanks, Knut Anders.

          I'll rename that method.
          While writing it, I started thinking about whether we actually want to show the full path of the file we're having trouble with - for security reasons. For instance, with the network server it could reveal where the database lives and that may reveal where derby.system.home is. Not a big problem per se, but it would allow a potential attacker to target his/her discovery attacks (be it at the OS level or by exploiting weaknesses in Derby).

          What do people think?
          Should we aim at only printing full names to the log stream (i.e. derby.log by default), and hide or print them relative to the database directory [1] when throwing exceptions?

          [1] 'service.properties' instead of '/export/home/MyFancyApp/dbs/database1/service.properties', and 'seg0/c431.dat' instead of '/.../database1/seg0/c431.dat'.

          Show
          Kristian Waagan added a comment - Thanks, Knut Anders. I'll rename that method. While writing it, I started thinking about whether we actually want to show the full path of the file we're having trouble with - for security reasons. For instance, with the network server it could reveal where the database lives and that may reveal where derby.system.home is. Not a big problem per se, but it would allow a potential attacker to target his/her discovery attacks (be it at the OS level or by exploiting weaknesses in Derby). What do people think? Should we aim at only printing full names to the log stream (i.e. derby.log by default), and hide or print them relative to the database directory [1] when throwing exceptions? [1] 'service.properties' instead of '/export/home/MyFancyApp/dbs/database1/service.properties', and 'seg0/c431.dat' instead of '/.../database1/seg0/c431.dat'.
          Hide
          Knut Anders Hatlen added a comment -

          Patch 1b looks good to me, and the tests look complete. One minor nit: I didn't immediately understand the abbreviation in the new helper method getSFPath(). Perhaps spell it out or rename it to getMostAccuratePath()?

          Show
          Knut Anders Hatlen added a comment - Patch 1b looks good to me, and the tests look complete. One minor nit: I didn't immediately understand the abbreviation in the new helper method getSFPath(). Perhaps spell it out or rename it to getMostAccuratePath()?
          Hide
          Kristian Waagan added a comment -

          Attaching patch 1b.

          Thanks, Knut Anders.
          I have added a basic test, where I start with a pristine database, modify the two relevant files (i.e. copy or delete), and then boot the database and assert what happened to the two files.
          The test doesn't verify what is written to derby.log.

          There are many things that can be changed slightly, i.e. logging of messages and so on.

          Patch 1b ready for review.

          Show
          Kristian Waagan added a comment - Attaching patch 1b. Thanks, Knut Anders. I have added a basic test, where I start with a pristine database, modify the two relevant files (i.e. copy or delete), and then boot the database and assert what happened to the two files. The test doesn't verify what is written to derby.log. There are many things that can be changed slightly, i.e. logging of messages and so on. Patch 1b ready for review.
          Hide
          Knut Anders Hatlen added a comment -

          The suggested approach sounds good to me.

          One possible simplification:

          Instead of materializing the entire service.properties file in prepareServicePropertiesContents() outside of the try block in saveServiceProperties(), we could move it into the existing try block:

          try

          { os = servicePropertiesFile.getOutputStream(); properties.store( os, serviceName + MessageService.getTextMessage(MessageId.SERVICE_PROPERTIES_DONT_EDIT)); storageFactory.sync( os, false); os.close(); os = null; }

          and call something like os.write(SERVICE_PROPERTIES_END_TOKEN.getBytes("US-ASCII")) between properties.store() and storageFactory.sync(). That would allow us to remove the error handling in prepareServicePropertiesContents() (well, actually, we could remove the entire method), since the same error handling seems to be performed by the try/catch in saveServiceProperties().

          What would be a good way to test this fix? Creating a database, shutting it down, and then modifying service.properties/service.propertiesold before booting it?

          It might also be useful to link to the original exception in the cases where we re-throw SecurityException and IOException as StandardException (in handleSecPrivException() and resolveServicePropertiesFiles()).

          Show
          Knut Anders Hatlen added a comment - The suggested approach sounds good to me. One possible simplification: Instead of materializing the entire service.properties file in prepareServicePropertiesContents() outside of the try block in saveServiceProperties(), we could move it into the existing try block: try { os = servicePropertiesFile.getOutputStream(); properties.store( os, serviceName + MessageService.getTextMessage(MessageId.SERVICE_PROPERTIES_DONT_EDIT)); storageFactory.sync( os, false); os.close(); os = null; } and call something like os.write(SERVICE_PROPERTIES_END_TOKEN.getBytes("US-ASCII")) between properties.store() and storageFactory.sync(). That would allow us to remove the error handling in prepareServicePropertiesContents() (well, actually, we could remove the entire method), since the same error handling seems to be performed by the try/catch in saveServiceProperties(). What would be a good way to test this fix? Creating a database, shutting it down, and then modifying service.properties/service.propertiesold before booting it? It might also be useful to link to the original exception in the cases where we re-throw SecurityException and IOException as StandardException (in handleSecPrivException() and resolveServicePropertiesFiles()).
          Hide
          Kristian Waagan added a comment -

          Attaching patch 1a, an improvement to how the service.properties is dealt with:
          o add an EOF-marker when writing a new service.properties file
          o add code to recover from error conditions, in short (CUR = current/new service.properties, OLD = backed up service.properties):
          a) if !CUR and OLD rename OLD to CUR
          b) if CUR and OLD check CUR and rename/delete OLD as appropriate

          This more code changes that I had hoped more - much of it due to error handling. I haven't added localization code for the various error messages.

          Patch ready for review.

          Show
          Kristian Waagan added a comment - Attaching patch 1a, an improvement to how the service.properties is dealt with: o add an EOF-marker when writing a new service.properties file o add code to recover from error conditions, in short (CUR = current/new service.properties, OLD = backed up service.properties): a) if !CUR and OLD rename OLD to CUR b) if CUR and OLD check CUR and rename/delete OLD as appropriate This more code changes that I had hoped more - much of it due to error handling. I haven't added localization code for the various error messages. Patch ready for review.
          Hide
          Kristian Waagan added a comment -

          I think we can improve the situation here, and I have written a prototype patch that fixes this.
          However, before moving on to the code, I have a few questions.

          Is it safe to assume that we can always undo the change to the properties file?
          Undo would be to continue using the old properties file, and redo would be to start using the new properties file.
          I understand that some, if not most, of the properties are only set at database creation time.

          There are two types of error conditions:
          a) The VM is killed while updating the service properties.
          b) Updating the service properties is aborted due to a problem (most likely file privileges problem) - the VM stays up

          In (a) recovery must happen on the next boot, whereas for (b) actions must be taken to correct the situation immediately. Both of these cases would involve a set of file operations; verifying, deleting, or renaming a file.
          As a further improvement, I would suggest that we print an entry to derby.log when Derby automatically recovers from a problem with the service.properties file.

          When it comes to the content of the files on disk, are the following assumptions valid?
          1) Writing to streams from Java may produce incomplete files.
          2) Low level file operations invoked from Java, like delete and rename, will always result in valid files on disk.
          Note that this does not imply that rename(old,new) is atomic, but, assuming new didn't already exist, new will be a complete copy of old if it exists after the crash.

          I started out with a rather complex solution for this (using checksums), but I think I'll take the time to simplify it - we can add to it if required.

          Show
          Kristian Waagan added a comment - I think we can improve the situation here, and I have written a prototype patch that fixes this. However, before moving on to the code, I have a few questions. Is it safe to assume that we can always undo the change to the properties file? Undo would be to continue using the old properties file, and redo would be to start using the new properties file. I understand that some, if not most, of the properties are only set at database creation time. There are two types of error conditions: a) The VM is killed while updating the service properties. b) Updating the service properties is aborted due to a problem (most likely file privileges problem) - the VM stays up In (a) recovery must happen on the next boot, whereas for (b) actions must be taken to correct the situation immediately. Both of these cases would involve a set of file operations; verifying, deleting, or renaming a file. As a further improvement, I would suggest that we print an entry to derby.log when Derby automatically recovers from a problem with the service.properties file. When it comes to the content of the files on disk, are the following assumptions valid? 1) Writing to streams from Java may produce incomplete files. 2) Low level file operations invoked from Java, like delete and rename, will always result in valid files on disk. Note that this does not imply that rename(old,new) is atomic, but, assuming new didn't already exist, new will be a complete copy of old if it exists after the crash. I started out with a rather complex solution for this (using checksums), but I think I'll take the time to simplify it - we can add to it if required.
          Hide
          Brett Mason added a comment -

          We have worked around this issue in our application by removing the call to disable log archive mode since it was no longer needed. However we would be particularly interested in any other scenarios which would cause service.properties to be updated under "normal" usage.

          Show
          Brett Mason added a comment - We have worked around this issue in our application by removing the call to disable log archive mode since it was no longer needed. However we would be particularly interested in any other scenarios which would cause service.properties to be updated under "normal" usage.

            People

            • Assignee:
              Kristian Waagan
              Reporter:
              Brett Mason
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development