DERBY-4196

Document initiation of replication from cleanly shut down database

    Details

    • Urgency:
      Urgent

      Description

      The admin guide describes how to start replication.
      http://db.apache.org/derby/docs/dev/adminguide/cadminreplicstartrun.html

      It describes two steps that must be performed before the database is copied from the master to the slave:

      1. Boot the database on the master system
      2. Freeze the database (CALL SYSCS_UTIL.SYSCS_FREEZE_DATABASE())

      Those two steps could be replaced with a single step:

      1-2) Make sure the database on the master system is shut down cleanly

      This works because a cleanly shut down database needs no recovery when it is later booted in master mode. Neither the log nor the database is modified during that boot, so the master database stays completely in sync with the copy on the slave.
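
      The alternative procedure can be sketched as the sequence of connection URLs involved. This is only an illustration, not text proposed for the manuals: the class and method names are hypothetical, and the database name, host and port are placeholders. The shutdown, startSlave, startMaster, slaveHost and slavePort attributes are the ones Derby documents; SQLState 08006 is what Derby reports when a single database has been shut down successfully.

```java
import java.sql.SQLException;

// Hypothetical helper: builds the connection URLs for starting
// replication from a cleanly shut down master database.
public class ReplicationUrls {

    // Step 1 (master): shut the database down cleanly.
    public static String shutdownUrl(String db) {
        return "jdbc:derby:" + db + ";shutdown=true";
    }

    // Step 2 is a file system copy of the database directory to the slave.

    // Step 3 (slave): boot the copied database in slave mode.
    public static String startSlaveUrl(String db, String slaveHost, int slavePort) {
        return "jdbc:derby:" + db + ";startSlave=true;slaveHost=" + slaveHost
                + ";slavePort=" + slavePort;
    }

    // Step 4 (master): boot the database in master mode.
    public static String startMasterUrl(String db, String slaveHost, int slavePort) {
        return "jdbc:derby:" + db + ";startMaster=true;slaveHost=" + slaveHost
                + ";slavePort=" + slavePort;
    }

    // A successful single-database shutdown is signalled by an
    // SQLException with SQLState 08006, not by a normal return.
    public static boolean isCleanShutdown(SQLException e) {
        return "08006".equals(e.getSQLState());
    }

    public static void main(String[] args) {
        // Placeholder values, chosen only for illustration.
        System.out.println(shutdownUrl("wombat"));
        System.out.println(startSlaveUrl("wombat", "slavehost", 4851));
        System.out.println(startMasterUrl("wombat", "slavehost", 4851));
    }
}
```

      With a Derby driver on the classpath, each URL would be passed to DriverManager.getConnection; the sketch deliberately stops at building the strings so that it runs without Derby.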

      The advantages of the alternative procedure are:

      • no need to keep a process running with the database booted and frozen while copying the database from the master system to the slave system
      • uncommitted transactions that are active at the time of the copying won't cause any problems (DERBY-3896)

      Attachments

      1. rrefattribstartslave.html (6 kB, Kim Haase)
      2. DERBY-4196-2.diff (0.6 kB, Kim Haase)
      3. DERBY-4196.zip (5 kB, Kim Haase)
      4. DERBY-4196.diff (4 kB, Kim Haase)

          Activity

          Dag H. Wanvik added a comment -

          In view of the newly discovered DERBY-4299, I am bumping this to Urgent.

          Dag H. Wanvik added a comment -

          I wonder whether DERBY-4299 calls into question the soundness of the "freeze database" approach.
          In DERBY-4299, the ASSERT fires because LogToFile.appendLogRecord does not write a log record that is later
          accessed in the boot phase in "SLAVE_PRE_MODE" (log records are not written in this "pre" boot, which is used for authentication, to avoid
          getting out of sync with the master).

          Kim Haase added a comment -

          I'm working on this, but a little more is needed than just condensing the steps. Step 5 says,

          A successful use of the startMaster=true attribute will also unfreeze the database.

          Will it also start a database that's been shut down? We're now telling users to shut down the database, not just freeze it.

          I think the startMaster=true attribute description in the Reference Manual needs changing too.

          Thanks in advance.

          Knut Anders Hatlen added a comment -

          Thanks Kim!

          startMaster=true will boot the database if it's not already booted. Since we don't want to mention the freeze step anymore, we could probably skip the sentence about startMaster=true unfreezing the database. The paragraph starting with "If any unlogged operations are running" will not make much sense either if we make these changes, and can be removed (no unlogged operations can be running in a database that's not booted).

          In the reference manual, I think this change must be made:

          Before you specify this attribute, you must boot the database on the master system, freeze it, perform a file system copy (...)
          -> (...) this attribute, you must cleanly shut down the database on the master system, perform a file system copy (...)

          I don't know if we should remove the paragraph about unlogged operations from the reference manual, or just make it less prominent. It still describes the behaviour of startMaster=true when the database is already booted at the time it is invoked, so it might still make sense to mention it. Perhaps just change the start of the first sentence in that paragraph to "If the master database is already booted and any unlogged operations are running"?

          Kim Haase added a comment -

          Thanks, Knut, I was wondering about those sentences. Those are great suggestions.

          I am wondering about the later part of the paragraph on unlogged operations – it describes exactly what the error message says:

          "The message instructs the user to unfreeze the database to allow the operations to complete, and then to specify startMaster=true again."

          I think I should just remove that sentence and have the paragraph end with "an error message appears."

          That leaves the issue of the actual error message text. XRE23 says, "Replication master cannot be started since unlogged operations are in progress, unfreeze to allow unlogged operations to complete and restart replication." Should it be left as it is, since the only way it's likely to come up is if someone did freeze the database instead of shutting it down?
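
          For readers who hit this error in application code, here is a minimal sketch of detecting it. It assumes that the message ID XRE23 quoted above is also the SQLState carried by the SQLException thrown when startMaster=true fails; the class and method names are hypothetical.

```java
import java.sql.SQLException;

// Hypothetical helper for recognising the "unlogged operations in
// progress" failure when starting the replication master.
public class StartMasterErrors {

    // Assumption: the message ID quoted in this issue is the SQLState.
    public static final String UNLOGGED_OPS_IN_PROGRESS = "XRE23";

    // Walks the chain of SQLExceptions and reports whether any link
    // carries the XRE23 state.
    public static boolean failedOnUnloggedOperations(SQLException e) {
        for (SQLException cur = e; cur != null; cur = cur.getNextException()) {
            if (UNLOGGED_OPS_IN_PROGRESS.equals(cur.getSQLState())) {
                return true;
            }
        }
        return false;
    }
}
```

          With the shutdown-based procedure this check should rarely be true, since no unlogged operations can be running in a database that is not booted.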

          I'll post a patch that includes your suggestions.

          Kim Haase added a comment -

          Attaching DERBY-4196.diff and DERBY-4196.zip, with changes to two files:

          M src/adminguide/cadminreplicstartrun.dita
          M src/ref/rrefattribstartmaster.dita

          Please let me know if further changes are needed.

          Knut Anders Hatlen added a comment -

          Thanks for the patch, Kim. The changes look very good to me. I agree that it's best to remove the description of the exact error message from the paragraph. It's probably OK to leave the actual message as it is, since the freeze/unfreeze approach is very likely to have been used if the situation occurs.

          +1 to commit.

          Kim Haase added a comment -

          Thanks very much, Knut!

          While I was in the middle of merging the patch to the branch, I did a final check and realized that the documentation of the startSlave attribute has another reference to freezing the database, so I'll need to do another patch after this one. Sorry, I should have checked more thoroughly before.

          Committed patch DERBY-4196.diff to documentation trunk at revision 793902.
          Merged to 10.5 doc branch at revision 793905.

          A second patch will follow.

          Kim Haase added a comment -

          Attaching DERBY-4196-2.diff and rrefattribstartslave.html, a one-line change that corrects this additional reference topic to refer to a shutdown rather than a freeze of the master database.

          Knut Anders Hatlen added a comment -

          Good catch, Kim. The patch looks fine.

          I took a quick look at the other replication-related attributes (failover, stopSlave, stopMaster, slavePort, slaveHost) and it looks like we're covered now. Thanks.

          Kim Haase added a comment -

          Thanks again, Knut.

          Committed patch DERBY-4196-2.diff to documentation trunk at revision 794028.
          Merged to 10.5 doc branch at revision 794044.

          Knut Anders Hatlen added a comment -

          Verified in the latest alpha manuals. Closing.


            People

            • Assignee:
              Kim Haase
              Reporter:
              Knut Anders Hatlen