Derby / DERBY-5691

Document that Write Caching must be disabled to avoid possible database corruption

    Details

    • Urgency:
      Normal
    • Issue & fix info:
      High Value Fix, Workaround attached
    • Bug behavior facts:
      Data corruption

      Description

      Suggestion that we document a recommendation that Windows Write Caching be disabled on machines using Derby.

      The following article warns about Write Caching on Windows as a possible source of database corruption:
      http://support.microsoft.com/kb/281672
      It is possible that this could be the cause of some unexplained Derby corruptions identified after a power failure or other system interrupt.

      Links explaining how to disable Write Caching:
      Win 2K: http://support.microsoft.com/kb/259716
      Win 2008: http://support.microsoft.com/kb/324805

      From the Windows 2008 article:
      With some third-party programs, disk write caching has to be turned on or off. Additionally, turning disk write caching on may increase operating system performance; however, it may also result in the loss of information if a power failure, equipment failure, or software failure occurs. This article describes how to turn disk write caching on or off.

      1. cadmindbintegrity.html
        7 kB
        Kim Haase
      2. DERBY-5691-6.stat
        0.0 kB
        Kim Haase
      3. DERBY-5691-6.diff
        1 kB
        Kim Haase
      4. DERBY-5691-5.zip
        9 kB
        Kim Haase
      5. DERBY-5691-5.diff
        8 kB
        Kim Haase
      6. DERBY-5691-4.zip
        9 kB
        Kim Haase
      7. DERBY-5691-4.diff
        8 kB
        Kim Haase
      8. DERBY-5691-3.zip
        9 kB
        Kim Haase
      9. DERBY-5691-3.diff
        8 kB
        Kim Haase
      10. DERBY-5691-2.zip
        9 kB
        Kim Haase
      11. DERBY-5691-2.stat
        0.2 kB
        Kim Haase
      12. DERBY-5691-2.diff
        8 kB
        Kim Haase
      13. DERBY-5691.zip
        5 kB
        Kim Haase
      14. DERBY-5691.stat
        0.1 kB
        Kim Haase
      15. DERBY-5691.diff
        5 kB
        Kim Haase

        Activity

        Kathey Marsden added a comment -

        Actually, the general requirement that write caching be disabled on the drive holding the Derby database, in order to ensure data integrity, should be documented as a requirement for all platforms. The documentation should also state that on many platforms, e.g. Windows, write caching is enabled by default.

        Other potential sync issues should be documented as possible causes of corruption:

        • Running Derby on an NFS-mounted disk.
        • Using runtime compression, which can cause sync to fail on disk full.

        In general, Derby needs to know that the sync has absolutely written the data to disk in order to ensure recovery.
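
        A minimal sketch of what a "sync" looks like at the Java level, using the standard java.nio API. This is not Derby's own I/O code; the class name SyncSketch and the method appendAndSync are hypothetical, chosen only for illustration. The point is that the application can only request a flush: if the drive acknowledges the request while the data still sits in a volatile write cache, the application has no way to tell.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SyncSketch {

    /** Appends a record to the file and asks the OS to flush it to the storage device. */
    public static void appendAndSync(Path file, byte[] record) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            ByteBuffer buf = ByteBuffer.wrap(record);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            // force(true) requests that both file content and metadata be written
            // to the underlying device. With drive-level write caching enabled,
            // the device may report completion before the data is on stable media.
            ch.force(true);
        }
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("sync-sketch", ".log");
        appendAndSync(log, "commit 1\n".getBytes(StandardCharsets.UTF_8));
        System.out.println("bytes in file after sync: " + Files.size(log));
        Files.delete(log);
    }
}
```

        The Javadoc for FileChannel.force only guarantees the flush for files on a local storage device, which is consistent with the advice in this issue to avoid network-mounted disks.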

        Kim Haase added a comment -

        It sounds as if we need a topic called something like "Avoiding database corruption". Maybe it belongs at the beginning of part 2 of the Admin Guide, just before "Checking database consistency", a related topic. We can then add to it as need be. Any other thoughts?

        This topic would include the problem with Windows write caching as well as the other two mentioned by Kathey.

        I thought the caching problem could be a Tuning Derby topic, since caching helps performance, but it turns out that it is often enabled by default on Windows systems. It is enabled on my ancient Windows XP system, for instance. People may well not know they are using it.

        I am guessing that "using runtime compression" means calling the SYSCS_UTIL.SYSCS_COMPRESS_TABLE or SYSCS_UTIL.SYSCS_INPLACE_COMPRESS_TABLE procedure – let me know if I'm off track here.

        Kim Haase added a comment -

        Attaching DERBY-5691.diff, DERBY-5691.stat, and DERBY-5691.zip, with the following changes:

        M src/adminguide/cadminpreface23947.dita
        M src/adminguide/derbyadmin.ditamap
        A src/adminguide/cadmindbintegrity.dita

        I decided to give the new topic a more positive title, "Maintaining database integrity".

        I also included some of the more general caveats from DERBY-5508 that didn't quite fit in that patch or that bear repeating. Hope that's okay.

        Edits and additions are welcome – thanks!

        Kathey Marsden added a comment -

        Thank you Kim for adding this important documentation.

        Some specific corrections and changes:

        1) To provide an overview of the overall problem, I would add a high-level description before "To avoid ...", e.g.

        Derby must be able to sync to disk. Some machine, disk, or operating system settings can prevent a proper sync and cause unrecoverable database corruption on a power failure or a system or software crash. To avoid database corruption, you can do the following.

        2) I would add a sentence after: Do not enable disk write caching on the hard drive that holds the database.
        Disable write caching if it is on by default.

        3) Take out "If possible," before run Derby on a local drive.

        4) For runtime compression, I was not referring to the stored procedures. Rather, sometimes there is an operating system setting that allows for runtime compression of the files.

        Maybe add a catch-all: Disable any other options that might prevent a proper sync to disk when Derby is writing its transaction logs or other data.

        In general, it would be good for the Developer's Guide to point to the maintaining integrity doc, because when Derby is embedded in an application, it is really the embedding product and its users that have to take on this role of maintaining database integrity. The embedding product needs to incorporate backup/restore procedures and recommendations into its own product documentation, as well as these warnings.

        Kim Haase added a comment -

        Thank you, Kathey, for the great comments!

        I'm attaching a second patch, DERBY-5691-2.diff, DERBY-5691-2.stat, and DERBY-5691-2.zip, with the following changes:

        M src/adminguide/cadminpreface23947.dita
        M src/adminguide/derbyadmin.ditamap
        A src/adminguide/cadmindbintegrity.dita
        M src/devguide/cdevdgpref11181.dita
        M src/devguide/cdevdvlp14839.dita

        I added the information you suggested to two topics in the Developer's Guide: one in the "Purpose of this guide" topic in the preface and, since people don't always read the preface, in the "Application development overview" topic where the book starts to get into substantive areas. The language may need some changes, though.

        I hope the changes to the new Admin Guide topic are satisfactory. I am most uncertain about the fix to the runtime compression item.

        Please let me know if further tweaks are needed.

        Knut Anders Hatlen added a comment -

        The patch looks fine to me (though I'm not familiar with all the causes of corruption listed there).

        Nit: Third bullet in the Maintaining database integrity topic should have a verb in the imperative form to match the other bullets. For example:

        To avoid database corruption, you can do the following:
        (...)

        • Do not enable runtime compression of files on disk, as synchronization may fail on some platforms if the disk runs out of space.

        I'm not sure exactly how that problem is specific to runtime compression, as I would also expect writes to uncompressed files to fail if the disk is full. Is the problem that these failures are not reported by the operating system and go undetected?

        Kim Haase added a comment -

        Thanks for catching the parallel structure problem, Knut! I'm attaching yet another patch, DERBY-5691-3.diff and DERBY-5691-3.zip, which makes just that one change to the bullet item.

        Kathey, I would be very grateful for your feedback on the overall patch. Does the fourth, catch-all bullet encompass Knut's additional concern, or would further changes be useful?

        Mike Matrigali added a comment -

        With respect to compressed file systems, I don't think we ever verified a problem. At one point we saw some corruptions on a compressed file system that was out of disk space, but we never could reproduce them. I don't think we ever test this configuration. The concern is that Derby only expects an out-of-disk-space error when growing the file. The theory was that we would write a mostly zero-filled page at allocation time and succeed, then the disk would fill up, and then we would do something to the page that required the compressed version of the page to take more space, and the write would fail at that point. The system is not designed to handle this. Again, this is all a theory, as there has been no repro or testing, and it would be particular to the kind of compression being done.

        One particularly bad case would be if only half of an updated page gets written before the disk runs out of space; in that case we are likely to get unrecoverable checksum errors.
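
        The half-written page case can be illustrated with a short sketch. It assumes a hypothetical 4096-byte page and uses CRC-32 from the Java standard library as the checksum; Derby's actual page layout and checksum handling are not shown in this issue, so the class TornPageSketch and its methods are purely illustrative.

```java
import java.util.Arrays;
import java.util.zip.CRC32;

public class TornPageSketch {

    // Hypothetical page size, chosen only for the illustration.
    static final int PAGE_SIZE = 4096;

    /** Computes a CRC-32 checksum over the whole page. */
    public static long checksum(byte[] page) {
        CRC32 crc = new CRC32();
        crc.update(page, 0, page.length);
        return crc.getValue();
    }

    /** Simulates an interrupted write: the first half is new data, the second half is old. */
    public static byte[] tornWrite(byte[] oldPage, byte[] newPage) {
        byte[] onDisk = Arrays.copyOf(oldPage, oldPage.length);
        System.arraycopy(newPage, 0, onDisk, 0, newPage.length / 2);
        return onDisk;
    }

    public static void main(String[] args) {
        byte[] oldPage = new byte[PAGE_SIZE];   // mostly zero-filled, as at allocation time
        byte[] newPage = new byte[PAGE_SIZE];
        Arrays.fill(newPage, (byte) 0x5A);      // the updated page contents

        long expected = checksum(newPage);      // checksum recorded for the full update
        byte[] onDisk = tornWrite(oldPage, newPage);

        // The page on disk matches neither the old nor the new version,
        // so verification against the recorded checksum fails.
        System.out.println("checksum matches: " + (checksum(onDisk) == expected));
    }
}
```

        A checksum can detect that the page is damaged, but it cannot by itself restore either the old or the new contents, which is why such errors are described above as unrecoverable.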

        Kim Haase added a comment -

        I wonder, then, if it would make sense to remove that third bullet about runtime compression, if we don't know that it causes a problem.

        Is it too much of a belaboring-the-obvious statement to tell people to make sure there is always lots of extra space on the disk that holds their database, because bad things can happen if they run out of disk space in the middle of a database operation? It sounds as if that is the problem, not file compression?

        Rick Hillegas added a comment -

        Hi Kim,

        I'm a big fan of belaboring the obvious. Derby is used by a lot of people who aren't database experts. What seems obvious to us may not seem obvious to many of our users. Thanks.

        Kim Haase added a comment -

        Thanks for the advice, Rick. I'm attaching DERBY-5691-4.diff and DERBY-5691-4.zip. This patch replaces the bullet item in cadmindbintegrity.dita about runtime compression with a warning about keeping enough space on the disk. Hope this is okay.
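
        An embedding application could check for free space along these lines before starting heavy database work. This is only a sketch: the class DiskSpaceSketch, the method hasHeadroom, and the 512 MB threshold are assumptions made for the example, not anything Derby provides or recommends.

```java
import java.io.File;

public class DiskSpaceSketch {

    /** Returns true if the partition holding dir has at least minFreeBytes usable. */
    public static boolean hasHeadroom(File dir, long minFreeBytes) {
        // getUsableSpace() returns 0 if the path does not name a partition,
        // so a nonexistent directory is reported as having no headroom.
        return dir.getUsableSpace() >= minFreeBytes;
    }

    public static void main(String[] args) {
        // Hypothetical database directory; pass the real one as the first argument.
        File dbDir = new File(args.length > 0 ? args[0] : ".");
        long threshold = 512L * 1024 * 1024; // assumed threshold of 512 MB

        if (hasHeadroom(dbDir, threshold)) {
            System.out.println("Enough free space on " + dbDir.getAbsolutePath());
        } else {
            System.out.println("Low disk space on " + dbDir.getAbsolutePath());
        }
    }
}
```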

        Rick Hillegas added a comment -

        Hi Kim,

        These changes look clear to me. +1

        Dag H. Wanvik added a comment -

        For platform neutrality it may be good to change this line:

        "Run Derby on a local drive rather than on an NFS-mounted disk. "

        to include "NFS mounted, SMB mounted or other network mounted disk"

        Kim Haase added a comment -

        Thanks for catching that, Dag. I'm attaching another patch, DERBY-5691-5.diff and DERBY-5691-5.zip, that makes the platform neutrality change you suggest.

        Hope this is okay now.

        Dag H. Wanvik added a comment -

        Thanks, Kim! +1

        Kim Haase added a comment -

        Thanks again, Dag. I'm committing this to both 10.9 and 10.8, since the advice is not specific to this release. Hope that's okay.

        Committed patch DERBY-5691-5.diff to documentation trunk at revision 1341468.
        Merged to 10.8 doc branch at revision 1341476.

        Kathey Marsden added a comment -

        Hi Kim,
        I am sorry I missed the change regarding disk full. Disk full in and of itself should not cause corruption, but it will prevent boot. Normally, once more disk is added or made available, Derby should be able to recover. Of course it is always a good idea to leave plenty of disk space so you can still run, but not doing so should not corrupt your database.

        As Mike said, we had one scenario where we theorized that runtime file system compression violated Derby's assumption that it will only get an out-of-disk-space error when growing the file. That was the motivation for the entry and the case where we think corruption could occur.

        Also, I am going to open up a new issue on the causes of and recovering from a checksum error which I think will fit well into this new section.

        Thanks for your work on this important issue.

        Kim Haase added a comment -

        Thanks, Kathey. I'm reopening the issue to make a change to that topic (Maintaining database integrity).

        Kim Haase added a comment -

        I'm not certain what to do with that particular bullet item other than just removing it, since Mike emphasized that it has not been possible to confirm that runtime compression actually leads to problems – it is an unconfirmed hypothesis.

        Kim Haase added a comment -

        Attaching DERBY-5691-6.diff, DERBY-5691-6.stat, and cadmindbintegrity.html, with a change to that topic:

        M src/adminguide/cadmindbintegrity.dita

        I removed the bullet item about runtime compression but am open to other options.

        I also made a change related to Rick's advice about DERBY-5508 (when to check database consistency).

        Kathey Marsden added a comment -

        Thanks Kim,

        I filed DERBY-5778 to test this scenario explicitly to see if it is a problem. I suppose it is OK to remove it until we have verified the problem. Certainly what you have added is a vast improvement, and the bullet "Disable any other settings or options that might prevent a proper sync to disk" will get people thinking about other options on the underlying disk that might cause problems.

        Kim Haase added a comment -

        Thanks very much, Kathey. I'll commit this patch, then. We can reopen this issue or file another one based on the DERBY-5778 findings. I also look forward to the checksum issue you will be filing. Looks as if this new topic will be in flux for future releases (including bug-fix ones) as we learn more.

        Kim Haase added a comment -

        Committed patch DERBY-5691-6.diff to documentation trunk at revision 1341858.
        Merged to 10.8 doc branch at revision 1341860.

        Kim Haase added a comment -

        Changes have appeared in Latest Alpha Manuals.


          People

          • Assignee:
            Kim Haase
            Reporter:
            Stan Bradbury
          • Votes:
            0
            Watchers:
            4
