DERBY-2872: Add Replication functionality to Derby

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.4.1.3
    • Fix Version/s: 10.4.1.3
    • Component/s: Replication
    • Labels: None

      Description

      It would be nice to add replication functionality to Derby; many potential Derby users seem to want this. The attached functional specification lists some initial thoughts on how this feature may work.

      Dag Wanvik had a look at this functionality some months ago. He wrote a proof of concept patch that enables replication by copying the existing Derby transaction log to the slave (using file system copy) and redoing it there (unfortunately, I cannot find the mail thread now).

      DERBY-2852 contains a patch that enables replication by sending dedicated logical log records to the slave through a network connection and redoing these.

      Replication has been requested and discussed previously in multiple threads, including these:

      http://mail-archives.apache.org/mod_mbox/db-derby-user/200504.mbox/%3c426E04C1.1070904@yahoo.de%3e
      http://www.nabble.com/Does-Derby-support-Transaction-Logging---t2626667.html

      1. master_classes_1.pdf
        5 kB
        Jørgen Løland
      2. poc_master_v2.diff
        35 kB
        Jørgen Løland
      3. poc_master_v2.stat
        0.7 kB
        Jørgen Løland
      4. poc_master_v2b.diff
        44 kB
        Jørgen Løland
      5. poc_slave_v2.diff
        40 kB
        Jørgen Løland
      6. poc_slave_v2.stat
        0.7 kB
        Jørgen Løland
      7. poc_slave_v2b.diff
        54 kB
        Jørgen Løland
      8. poc_slave_v2c.diff
        56 kB
        V.Narayanan
      9. proof_of_concept_master.diff
        65 kB
        Jørgen Løland
      10. proof_of_concept_master.stat
        0.4 kB
        Jørgen Løland
      11. proof_of_concept_slave.diff
        18 kB
        Jørgen Løland
      12. proof_of_concept_slave.stat
        0.6 kB
        Jørgen Løland
      13. proof-of-concept_v2b-howto.txt
        4 kB
        Jørgen Løland
      14. replication_funcspec_v10.html
        30 kB
        Jørgen Løland
      15. replication_funcspec_v10.html
        30 kB
        Jørgen Løland
      16. replication_funcspec_v2.html
        8 kB
        Jørgen Løland
      17. replication_funcspec_v3.html
        11 kB
        Jørgen Løland
      18. replication_funcspec_v4.html
        13 kB
        V.Narayanan
      19. replication_funcspec_v5.html
        12 kB
        Jørgen Løland
      20. replication_funcspec_v6.html
        17 kB
        Jørgen Løland
      21. replication_funcspec_v7.html
        23 kB
        Jørgen Løland
      22. replication_funcspec_v8.html
        26 kB
        Jørgen Løland
      23. replication_funcspec_v9.html
        29 kB
        Jørgen Løland
      24. replication_funcspec.html
        6 kB
        Jørgen Løland
      25. replication_script.txt
        7 kB
        Jørgen Løland
      26. ReplicationDesign.pdf
        5 kB
        V.Narayanan
      27. ReplicationWriteup.txt
        11 kB
        V.Narayanan
      28. slave_classes_1.pdf
        4 kB
        Jørgen Løland

        Issue Links

          Activity

          Jørgen Løland added a comment - edited

          A proof of concept patch is attached. It demonstrates that shipping derby log records from the master to the slave and redoing them at the slave, as described in the functional specification, works.

          The patch modifies classes in the LogFactory service in rawstore so that the derby log records are sent from the master to the slave and written to the log file on the slave. When the slave is rebooted, the operations that were performed on the master are reflected at the slave as well.

           I have tried a few combinations of insert/update/delete/create table operations with only one transaction at a time on the master. The test cases ran successfully, meaning that the slave state ended up equal to the master's.

          The patch has many shortcomings that need to be addressed. Some important ones are

          • the slave has to be rebooted before the shipped log is applied
          • initially, the master database has to be copied to the slave location using file system copy
          • replication is started for all databases controlled by the derby instances in question
          • log shipping is done in a blocking way, one log record at a time
          • RMI is used to ship the log records
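
           The proof of concept ships log records one at a time over RMI. As a rough, purely illustrative sketch of that mechanism (the interface and class names below are made up for this sketch and are not the ones in the attached patch), it could look something like this:

           import java.rmi.Naming;
           import java.rmi.Remote;
           import java.rmi.RemoteException;

           // Hypothetical remote interface for shipping Derby log records to the slave.
           // Names are illustrative only and do not correspond to classes in the patch.
           interface LogReceiver extends Remote {
               // Append one transaction log record (raw bytes at the given log instant)
               // to the slave's local log file.
               void appendLogRecord(long logInstant, byte[] logRecord) throws RemoteException;
           }

           class LogShipperSketch {
               // Blocking, one-record-at-a-time shipping, as in the proof of concept.
               static void ship(long instant, byte[] record) throws Exception {
                   LogReceiver slave = (LogReceiver) Naming.lookup("//slavehost/logReceiver");
                   slave.appendLogRecord(instant, record);
               }
           }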

           Attachments:
           proof_of_concept_master* - patch for the Derby instance that will have the master role. 'ant all' fails (toursdb); workaround: use 'ant buildsource' instead
           proof_of_concept_slave* - patch for the Derby instance that will have the slave role
           replication_script.txt - a few insert/update/delete/create scenarios that have successfully demonstrated the concept.

          Rick Hillegas added a comment -

          Hi Jørgen,

          Thanks for tackling this important feature. I have some comments on the functional spec. It would be great if you could specify the customer experience in greater detail:

          1) How does the customer turn replication on and off? For instance, is this done by calling a system procedure?

          2) Who has permission to turn replication on and off? E.g., the database owner?

          3) After failover, when the slave is promoted to be the new master, presumably the customer will want replication to resume. What is the customer experience here? Does replication automatically begin to some default database? Alternatively, is a notification sent to some registered listener who can then restart replication?

          4) Similarly, what's the customer experience when the slave fails? Does replication begin from the master to a new default database? Is a notification sent to a listener who can then restart replication?

          Thanks!

          Jørgen Løland added a comment -

          Thanks for your comments, Rick. I will attach a new func spec in a few days.

          > 1) How does the customer turn replication on and off? For instance, is this done by calling a system procedure?

          The slave is kept in recovery mode until it becomes a master (see answer to 3 and 4). This means that calling connect on the slave will never return a connection, and stored procedures can therefore not be used on the slave.

          The user interaction required to start a master and a slave should be as similar as possible. Adding a few commands to NetworkServerControl seems like a viable solution:

          • The command used to start the slave role for 'x' should make Derby listen on a specified port for a connection with the master of 'x'.
          • The command used to start the master role for 'x' should make Derby connect to a URL, and start replication of DB 'x' once a connection is established.

           Similar commands to stop replication, make the slave become master, etc., are also needed.

          > 2) Who has permission to turn replication on and off? E.g., the database owner?

          I think DB owner sounds reasonable.

          > 3) After failover, when the slave is promoted to be the new master, presumably the customer will want replication to resume. What is the customer experience here? Does replication automatically begin to some default database? Alternatively, is a notification sent to some registered listener who can then restart replication?
          > 4) Similarly, what's the customer experience when the slave fails? Does replication begin from the master to a new default database? Is a notification sent to a listener who can then restart replication?

          Basically, the two derby instances forming a replication pair can be in the following states:

          Master:
          m1: connect (including create DB)
          m2: run in normal Derby mode
          m3: become master (set up nw communication, ship DB to slave)
          m4: run in master mode, which is similar to m2 for normal users
          m5a: slave fails? go to m2
          m5b: master (this Derby instance) fails? done

          Slave:
          s1: start (listen on a port for connections, receive a DB over nw)
          s2: slave is kept in recovery mode, which lets us forward recover received log records for "free". Receives log records from master and performs forward recovery on these
          s3a: master fails? complete the booting and go to m2
           s3b: slave (this Derby instance) fails? done

           Hence, the behavior is the same regardless of whether the "old slave" or "old master" is the instance that is still alive after one of the instances has failed. In both cases, new replication must be started manually on the surviving Derby instance. So far, I have not considered restarting replication automatically after an instance has failed.

          (Note that the presented states are not accurate enough to describe everything that happens. For example, if the connection between the master and slave fails temporarily when the pair is in state m4/s2, the pair will try to reconnect for some time.)
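
           For readers who prefer code, the state lists above can be read as a small state machine. The sketch below is purely illustrative; the names follow this comment and are not taken from any Derby code:

           // Sketch only: state and event names follow the comment above, not Derby code.
           enum MasterState { CONNECT, NORMAL, BECOME_MASTER, MASTER_MODE, DONE }   // m1..m5
           enum SlaveState  { LISTEN, RECOVERY_MODE, PROMOTED, DONE }               // s1..s3

           class ReplicationPairSketch {
               MasterState master = MasterState.CONNECT;   // m1
               SlaveState  slave  = SlaveState.LISTEN;     // s1

               // m5a / s3b: slave fails, master returns to normal mode (m2)
               void slaveFailed()  { master = MasterState.NORMAL; slave = SlaveState.DONE; }

               // s3a / m5b: master fails, slave completes booting and becomes the new master
               void masterFailed() { slave = SlaveState.PROMOTED; master = MasterState.DONE; }
           }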

          Jørgen Løland added a comment -

          Link to the thread where Dag describes his proof-of-concept code: http://www.mailinglistarchive.com/derby-dev@db.apache.org/msg25546.html

           A difference between the two proof-of-concept patches: while Dag's code ships whole log files, the PoC attached to this issue ships individual log records.

          Jørgen Løland added a comment -

          I have had a good look at the code in the store layer now. The design sketches for this feature are currently like this:

          Master role:
          -----------
          When a Derby instance is told to be the master of a db 'repli_db', a replication service is booted in the store layer. This service will set up a replication log buffer and network communication with the slave. The db is then sent to the slave. From this point on, all log records that are added to the log of 'repli_db' are also added to the replication log buffer.

          A replication log shipper (part of the replication service) will read log records from the replication buffer and send these to the slave, e.g. at given time intervals.
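
           A minimal sketch of the buffer/shipper split described here, assuming hypothetical class names (this is just an illustration of the idea, not the intended Derby implementation):

           import java.util.concurrent.BlockingQueue;
           import java.util.concurrent.LinkedBlockingQueue;

           // Hypothetical names; sketch of the master-side replication service only.
           class ReplicationLogBuffer {
               private final BlockingQueue<byte[]> buffer = new LinkedBlockingQueue<>();

               // Called whenever a log record is written to the log of 'repli_db'.
               void append(byte[] logRecord) { buffer.offer(logRecord); }

               byte[] next() throws InterruptedException { return buffer.take(); }
           }

           class ReplicationLogShipper implements Runnable {
               private final ReplicationLogBuffer buffer;
               private final SlaveConnection slave;   // hypothetical network channel to the slave

               ReplicationLogShipper(ReplicationLogBuffer buffer, SlaveConnection slave) {
                   this.buffer = buffer;
                   this.slave = slave;
               }

               public void run() {
                   try {
                       // The design above mentions shipping at given time intervals; this
                       // sketch simply ships records as they become available.
                       while (true) {
                           slave.send(buffer.next());
                       }
                   } catch (InterruptedException e) {
                       Thread.currentThread().interrupt();   // shipper shut down
                   }
               }
           }

           interface SlaveConnection { void send(byte[] logRecord); }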

          Slave role:
          ----------
           As for the master role, a replication service will be booted in the store layer when a Derby instance is told to be the slave of a db. The replication slave service will set up a network connection with the master and receive a db backup. Recovery is started, and the slave will not leave recovery until it is told to transform into a normal Derby instance. While in recovery mode, the slave will redo all log records that are received from the master.
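
           Correspondingly, a rough sketch of the slave side (all names are again hypothetical):

           // Hypothetical names; sketch of the slave-side replication service only.
           class ReplicationSlaveService {
               private final MasterConnection master;   // network channel from the master
               private final RecoveryRedoer redoer;     // hook into the recovery/redo machinery

               ReplicationSlaveService(MasterConnection master, RecoveryRedoer redoer) {
                   this.master = master;
                   this.redoer = redoer;
               }

               // Stay in recovery mode and redo every log record received from the master
               // until failover transforms this instance into a normal Derby database.
               void runUntilFailover() {
                   while (!master.failoverRequested()) {
                       redoer.redo(master.receive());
                   }
               }
           }

           interface MasterConnection { byte[] receive(); boolean failoverRequested(); }
           interface RecoveryRedoer   { void redo(byte[] logRecord); }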

          Comments on this preliminary design are very much welcome!

          Jørgen Løland added a comment -

          New functional specification including the details requested by Rick Hillegas.

          Jørgen Løland added a comment -

          This time with correct modification date

          Rick Hillegas added a comment -

          Thanks for rev 2 of the spec, Jørgen.

          Looks like you have addressed issue (1). I see in your comments above, that you are in agreement about how to address issue (2), but I don't see this reflected in the new spec itself. I'm getting the impression that the answer to (3) and (4) is that the first rev of replication won't handle these issues; instead, they will be addressed in a later rev. Is that right?

          I have some more comments:

          5) A heads-up about the user/password options on the new server commands. There has been some discussion about authenticating server shutdown operations and general agreement that the current situation is confusing. DERBY-2109 intends to add credentials to the server shutdown command. I think that the same api should be used to specify username and password for all of our server commands--whatever that api turns out to be.

          6) I think it would be clearer if the url option were called slaveurl. Do we need a symmetric masterurl option for the startslave command? How does the slave know that it is receiving records from the correct master? What happens if two masters try to replicate to the same slave?

          7) Is the startmaster command restricted to a server running on the same machine as the master database? Similarly, is the startslave command restricted to a server on the slave database machine? What about failover and stop?

          8) I am confused about the startslave command. Does this create a new database? If so, how are the credentials enforced in the case that credentials are stored in the database? If not, what happens if there is already a database by that name? Is the database destroyed and replaced after authentication?

          9) If you have stopped replication, can you resume it later on?

          10) What is the sequence of these commands? Do you first issue a startmaster and then issue a startslave? What happens if the commands occur out of sequence? Similarly for

          11) It would be nice to understand how we insulate replication from man-in-the-middle attacks--even if we don't implement these protections in this first version.

          12) What happens if someone tries to connect to an active slave? What happens if someone tries to shutdown an active slave without first stopping replication at the master's end?

          13) What happens if the slave is shut down and then, later on, someone tries to boot the slave as an embedded database?

          V.Narayanan added a comment -

          >Looks like you have addressed issue (1). I see in your comments above,
          >that you are in agreement about how to address issue (2), but I don't
          >see this reflected in the new spec itself

           Under the section "Interacting with the replication feature", the spec says
           how to Start Master, Start Slave, failover and stop replication.

           Below the table, the following is mentioned:

          "These commands apply only to the database specified, and
          only the database owner will be allowed to execute them."

          >I'm getting the impression that the answer to (3) and (4) is that the
          >first rev of replication won't handle these issues; instead, they will
          >be addressed in a later rev. Is that right?

           I found some answers in Jørgen's comment:

          "Hence, the behavior is the same regardless of whether the "old slave" or "old master"
          is the instance that is still alive when one of the instances failed. In both cases,
          new replication must be started manually on the surviving Derby instance. So far, I
          have not considered restarting replication automatically after an instance has failed."

           I interpret this as meaning that a manual startup is planned for now. Is an auto startup on the cards?

          >A heads-up about the user/password options on the new server commands. There has been
          >some discussion about authenticating server shutdown operations and general agreement
          >that the current situation is confusing. DERBY-2109 intends to add credentials to the
          >server shutdown command. I think that the same api should be used to specify username
          >and password for all of our server commands--whatever that api turns out to be.

          Thank you for this pointer. I guess taking the same lines as 2109 is the thing to do here.

          >Do we need a symmetric masterurl option for the startslave command?
          >How does the slave know that it is receiving records from the correct master?

           The slave basically starts a server to receive records from the master,
           and the master connects to it to send the records. Even if we were to use the
           url to ensure that the slave is receiving records from the correct master,
           there could be two senders at the same machine url. I guess we wouldn't need
           the slaveurl.

          But we need to do

          java org.apache.derby.drda.NetworkServerControl replication -startslave
          -db=<dbname> -port=<port> -user=<name> -pass=<pass>

           for each database that we want replicated. So we could just as well mention the master
           url too, but I am not sure how we would use it.

          >What happens if two masters try to replicate to the same slave?

          I guess you mean what happens if they try to connect using the same
          slaveurl.

           This would be an issue, I guess, because the slave would assume
           both to be legitimate unless we send the database name each time.

           But what would happen if both used the same database name as well?

           Could this be eliminated by having a handshake phase before the actual
           log transfer occurs? Then, if the same url is used for a second handshake,
           we would reject it unless it is a reconnect attempt after the master has
           crashed.

          >Is the startmaster command restricted to a server running on the same
          >machine as the master database? Similarly, is the startslave command restricted to a
          >server on the slave database machine? What about failover and stop?

           I think issuing the startslave on a machine would just mean that
           the server is started on the machine where the NetworkServerControl class runs,
           since we would depend on this to start the agent Jørgen has mentioned in
           his comments. I guess the same applies to the other commands. I concluded this
           also because, in the proof of concept code attached, the RMI code that transfers the
           log records is called from inside FileLoggerPrimary, which writes into the log on
           the disk as well as over the network. Wouldn't this class take care of the
           case when the server is not running on the same machine as the slave database for
           logging?

          >If you have stopped replication, can you resume it later on?

           If stopping replication means that we will not archive logs anymore,
           I guess this will not be possible. If the logs are still archived, we
           can transmit from the log after replication has been stopped, the slave
           can still redo from there, and replication can continue. That is, we should
           not call the SYSCS_UTIL.SYSCS_DISABLE_LOG_ARCHIVE_MODE system procedure after
           stopping replication. I guess the user should be able to decide this.

          Does this mean that the stopping API has to be modified?

          >What is the sequence of these commands? Do you first issue a startmaster and then issue a startslave?

           Since startslave starts a listener, it should be issued before
           startmaster.

          >What happens if the commands occur out of sequence?

           Since we pass the slaveurl to the startmaster command, it will fail,
           reporting that no slave was found at the url mentioned.

          >It would be nice to understand how we insulate replication from man-in-the-middle attacks--
          >even if we don't implement these protections in this first version.

           I guess you want the interfaces to be designed in such a way that security
           can be plugged in at a later time. I think this is a very good suggestion.

          >What happens if someone tries to connect to an active slave? What happens if someone
          >tries to shutdown an active slave without first stopping replication at the master's end?

           A connect attempt from the master would fail, and the master would report that the
           connection has been terminated because the slave could not be reached or that a
           slave could not be found. Would this case be different from trying to connect to a
           Derby NetworkServer that has been shut down?

          >What happens if the slave is shut down and then, later on, someone tries to boot the slave
          >as an embedded database?

           Should this be similar to creating a database using the NetworkServer, shutting it down,
           and later trying to connect to it in embedded mode?

          Jørgen Løland added a comment -

          Thank you for the extensive comments from both Rick and Narayanan. I have a few supplementary comments to those from Narayanan.

          >>Looks like you have addressed issue (1). I see in your comments above, that you are in agreement about how to address issue (2), but I don't see this reflected in the new spec itself. I'm getting the impression that the answer to (3) and (4) is that the first rev of replication won't handle these issues; instead, they will be addressed in a later rev. Is that right?
          >I interpret it that a manual startup is planned for now. Is a auto startup on the cards?

          Re 2: It says so below the table of NetworkServerControl commands, but I will make it clearer in the next version of the spec.
           Re 3 and 4: That's correct; in the first rev, there will be no automatic restart of replication when one of the instances has failed. The DB owner will have to restart replication manually. A later improvement may automate this step; this is a good candidate for extending the functionality later.

          >>5) A heads-up about the user/password options on the new server commands. There has been some discussion about authenticating server shutdown operations and general agreement that the current situation is confusing. DERBY-2109 intends to add credentials to the server shutdown command. I think that the same api should be used to specify username and password for all of our server commands--whatever that api turns out to be.
          >Thank you for this pointer. I guess taking the same lines as 2109 is the thing to do here.

          I agree. There is no reason why authentication for replication should differ from other commands. The NetworkServerControl commands I wrote in the func spec show what information is needed. I will modify the next version of the func spec to state that authentication is needed, and should be performed in the same manner as other NetworkServerControl commands.

          >>6) I think it would be clearer if the url option were called slaveurl. Do we need a symmetric masterurl option for the startslave command? How does the slave know that it is receiving records from the correct master? What happens if two masters try to replicate to the same slave?

          >This would be an issue I guess because the slave would assume both to be legitimate unless we send the database name each time.
          >But what would happen if both use the same database also.
          >Can this be eliminated by having a handshake phase before the actual log transfer occurs. So if the same url is being used for a second handshake we would reject this unless this is a reconnect attempt after the master has
          crashed.

          We should only allow one connection to a slave database. A handshake sounds like a good idea.

          >>7) Is the startmaster command restricted to a server running on the same machine as the master database? Similarly, is the startslave command restricted to a server on the slave database machine? What about failover and stop?

           I think the start and failover commands need to be restricted to the same machine as the one where the database resides, but this depends on the NetworkServerControl security. Again, this should be equal to the policy for other NetworkServerControl commands. See 12) for how to stop replication.

          >>8) I am confused about the startslave command. Does this create a new database? If so, how are the credentials enforced in the case that credentials are stored in the database? If not, what happens if there is already a database by that name? Is the database destroyed and replaced after authentication?

          Since this has not been implemented yet, the solution may have to change later. However, the current intention is that the first thing that happens on the slave is that it receives the database 'x' from the master. When 'x' has been received, the slave starts the boot process of 'x'. So, the slave does not create 'x', even though it did not exist on the slave when the startslave command was issued.

           We will have to check that a database with the same name does not already exist on the slave. Furthermore, we should probably ensure that the owner of 'x' is allowed to create a database on the slave. Did you think of any other permissions we should check for? Maybe an allowedToReplicate credential would be needed?

          >>9) If you have stopped replication, can you resume it later on?
          >If stopping replication means that we will not archive logs anymore I guess this will not be possible. If the logs are still archived we can transmit from the log after replication has been stopped and the slave can still redo from there and replication from continue. That is we should not call SYSCS_UTIL.SYSCS_DISABLE_LOG_ARCHIVE_MODE system procedure after stopping replication. Guess the user should be able to decide this.

           I am not sure about this. If a failover was performed, the answer is definitely 'no', because the replication method assumes that the physical layout of the databases is equal. A failover will not preserve this exactly equal physical layout, since the failover process will undo uncommitted transactions. If replication was simply turned off, Narayanan's suggestion of starting log shipment from some defined log record will probably work.

          However, I think we have to be restrictive in the first version of the functionality. For now, I think the answer will be 'no', i.e., you have to restart replication by first deleting the database (on the slave), and then send the entire database to the slave. Resuming replication makes a good candidate for extending the functionality.

          >>10) What is the sequence of these commands? Do you first issue a startmaster and then issue a startslave? What happens if the commands occur out of sequence? Similarly for
          >Since the startslave starts a listener this should be done first before startmaster.

          It is correct that the slave will be listening for the master and therefore must be started before replication can start. However, I see no reason why the connection attempts should not be retried every now and then until the slave is ready to accept the connection.

          Hence, I don't think we need a defined sequence of commands. When the slave starts, it does nothing until a master connects to it (except write some messages to derby.log). When the master is started, it continues as normal (also writes some messages to derby.log) until it is able to get a connection to the slave.
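
           A sketch of the kind of retry loop implied here (the host/port parameters and the retry interval are arbitrary, not part of any spec):

           import java.io.IOException;
           import java.net.Socket;

           class SlaveConnectorSketch {
               // Sketch only: keep retrying until the slave accepts the connection.
               static Socket connect(String slaveHost, int slavePort) throws InterruptedException {
                   while (true) {
                       try {
                           return new Socket(slaveHost, slavePort);
                       } catch (IOException e) {
                           Thread.sleep(5000);   // slave not listening yet; log and retry later
                       }
                   }
               }
           }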

          >>11) It would be nice to understand how we insulate replication from man-in-the-middle attacks--even if we don't implement these protections in this first version.

          That is a good point. It would, e.g., be possible to use a signature. The slave could send a hashed username to the master, and the master could respond by sending the hashed password. It should not be possible to "unhash" the username/password. But I am no security expert, hence input on this issue is appreciated. And you are right; this will not be handled in the first version.
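
           For illustration only, hashing such a token could be done as below; as noted in the comment, this is not a vetted security design:

           import java.nio.charset.StandardCharsets;
           import java.security.MessageDigest;
           import java.security.NoSuchAlgorithmException;

           class HandshakeTokenSketch {
               // Hash a credential so the clear text never crosses the wire (sketch only).
               static byte[] hash(String credential) throws NoSuchAlgorithmException {
                   MessageDigest md = MessageDigest.getInstance("SHA-256");
                   return md.digest(credential.getBytes(StandardCharsets.UTF_8));
               }
           }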

          >>12) What happens if someone tries to connect to an active slave? What happens if someone tries to shutdown an active slave without first stopping replication at the master's end?

          If someone tries to connect to a db 'x' that has the slave role in derby instance 'i', the connection is refused. Note that the derby instance 'i' may manage other databases at the same time. Making a connection to these other databases is unaffected by replication.

          >A connect attempt from the master would fail and the master would report that the connection has been terminated due to the slave not being able to be reached or that a slave could not be found. Would this case be different from trying to connect to a Derby NetworkServer when it has been shutdown?

           The initial plan was to allow shutdown at both ends. Now that you mention it, however, stopping replication from the master seems cleaner. Hence, I think the revised plan should be as follows: stopping replication will be performed by issuing the stopreplication command at the master. The master then sends a stop replication message over the network connection to the slave.

          >>13) What happens if the slave is shut down and then, later on, someone tries to boot the slave as an embedded database?

           That will be allowed. In this case, the database will boot to a transaction-consistent state that includes all transactions that were committed (and sent, of course) before the shutdown.

          Rick Hillegas added a comment -

          Thanks for the responses, Jørgen and Narayanan. Some more comments follow:

          About (2): Thanks, now that you point out the line, I can see it. I don't think further clarification is needed, just some formatting. The problem for me is that the text flanking the table isn't enclosed in paragraph tags and this causes my browser to crush the text against the bottom of the table, where I failed to see it. If you just bracket the two chunks of text with <p> and </p> then that solves the problem.

          I'm going to be giving a lightning talk at OSCON on the state of Derby and I want to know what if anything I can say about this feature. It sounds as though some increment of functionality will be delivered in the 10.4 timeframe. Will this functionality be complete enough that someone can use it? If so, can you characterize the kind of application that will benefit from the 10.4 increment?

          Jørgen Løland added a comment -

           Attached a new functional specification that incorporates the last few days of comments from Rick and Narayanan.

          Jørgen Løland added a comment -

          >I'm going to be giving a lightning talk at OSCON on the state of Derby and I want to know what if anything I can say about this feature. It sounds as though some increment of functionality will be delivered in the 10.4 timeframe. Will this functionality be complete enough that someone can use it? If so, can you characterize the kind of application that will benefit from the 10.4 increment?

           The intention is definitely to have working replication functionality in 10.4. Exactly what will be implemented, and how, depends heavily on what the community wants, so the guesses below may have to be changed. Also, if more people want to contribute to this task, more functionality will make it into 10.4.

          I think it is very likely that in 10.4, replication will work. Some manual steps will probably be required, e.g. to start the fail-over logic. The replication functionality will be usable by all applications that want higher availability than a single (point of failure) server can provide. However, having to manually start fail-over may be unacceptable (take too much time) for applications with the highest availability requirements. Furthermore, the application should not contain super-secret data since security issues are likely to not make it into this release. But again, the priorities may change as community discussions progress.

          Hide
          Dag H. Wanvik added a comment -

          Great that you guys are running with this one! Some comments to the
          functional specification:

          • Derby doesn't log all operations by default, e.g. bulk
            import, deleting all records from a table, creating an index. These
            issues were addressed in the work on online backup (DERBY-239),
            partially by denying online backup if non-logged operations are not
            yet committed, and partly by making them do logging when online
            backup is in effect (the reason for not logging some operations is
            performance). I guess for replication you would need to make them
            do logging for the duration.
          • Overview of characteristics: you mention the network line as a
            single point of failure, that's fine in a first version. One could
            imagine having the replication service support more network interfaces
            to alleviate this vulnerability.
          • Fail-over: When fail-over is performed (with a command on the
            slave), I assume the master will be told to stop its replication so
            it can tidy up? Since you describe the semantics of the stop
            replication command as shutting down the slave database, I assume
            failover can be performed without requiring a prior stop replication
            command on the master.

          If reaching master is not possible (lost connection), can the stop
          replication command be used against the master to tidy up even when
          connection has been lost?

          Perhaps it would be good to include in your table of commands any
          preconditions for the commands. BTW, it seems a good idea to not
          impose an order on starting server or slave first.

          • Presumably, the master will time out if it is unable to send logs to
            the slave for some (configurable?) period. It could keep trying for
            some time but eventually it would need to stop or overflow the
            buffer mechanism you suggest in DERBY-2926. Will you require that
            the user has LOG_ARCHIVE_MODE enabled? If you do, it would seem a
            nice addition to later be able to resume replication even if the
            buffer had to be abandoned (as you suggest). Before the buffer is
            abandoned, if the network becomes available again, it would be
            trivial to resume shipping of logs I expect?
          • Given that the system privileges work of DERBY-2109 provides us with
            necessary security, I would hope we can lift the restriction that
            administration commands can only be run from the same machine as the
            server is started on, but for the time being the restriction makes
            sense.
          • You describe the replication commands as CLI commands against
            NetworkServerControl; will you be making the commands available in
            API form as well, so replication can be embedded in an application?
          • typos: "enclypted", "it's local log"
          Hide
          V.Narayanan added a comment -

          >Great that you guys are running with this one! Some comments to the
          >functional specification:

          Thank you for the reviews and comments, Dag.

          >* Derby doesn't log all operations by default, e.g. bulk
          > import, deleting all records from a table, creating an index. These
          > issues were addressed in the work on online backup (DERBY-239),
          > partially by denying online backup if non-logged operations are not
          > yet committed, and partly by making them do logging when online
          > backup is in effect (the reason for not logging some operations is
          > performance). I guess for replication you would need to make them
          > do logging for the duration.

          I agree; you are correct.

          I thought I could get some idea as to how this could be done by going through
          the patches in DERBY-239.

          I read through DERBY-239 and found the patches
          onlinebackup(3&7).diff to be of interest to us.

          The primary motivation of patch 3 was the following:

          "To make a consistent online backup in this scenario, this patch:

          1) blocks online backup until all the transactions with unlogged operation are
          committed/aborted.
          2) implicitly converts all unlogged operations to logged mode for the duration
          of the online backup, if they are started when backup is in progress. "

          Patch 7 addressed comments on patch 3.
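
          To make those two rules concrete for replication, here is a minimal, hedged sketch; ReplicationLoggingPolicy and TransactionTable are names invented for this illustration, not the DERBY-239 code or Derby internals.

          // Illustration only (invented names, not Derby internals). The two rules
          // quoted above, restated for replication: (1) wait for in-flight unlogged
          // work before starting, (2) force logged mode while replication is active.
          import java.util.concurrent.atomic.AtomicBoolean;

          class ReplicationLoggingPolicy {
              /** Stand-in for Derby's transaction bookkeeping; invented for this sketch. */
              interface TransactionTable {
                  boolean hasUncommittedUnloggedWork();
              }

              private final AtomicBoolean replicationActive = new AtomicBoolean(false);

              /** Rule 1: block the start of replication until all transactions with
               *  unlogged operations have committed or aborted. */
              void startReplication(TransactionTable transactions) throws InterruptedException {
                  while (transactions.hasUncommittedUnloggedWork()) {
                      Thread.sleep(100);   // polling for brevity; a real version would wait on a latch
                  }
                  replicationActive.set(true);
              }

              /** Rule 2: operations that would normally skip logging (bulk import,
               *  create index, ...) are logged anyway while replication is active. */
              boolean mustLog(boolean operationNormallyLogged) {
                  return operationNormallyLogged || replicationActive.get();
              }
          }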

          >* Overview of characteristics: you mention the network line as a
          > single point of failure, that's fine in a first version. One could
          > imagine having the replication service support more network interfaces
          > to alleviate this vulnerability.

          I agree!

          Some random thoughts on the modifications that would be required.

          When the master or slave has multiple network interfaces, each of them
          can be accessed using a different IP address to which that network
          interface is bound (a rough sketch of the sender side follows after
          the list below).

          • The start replication command on the master and slave
            will have to be modified to accept the multiple IP addresses
            of the peer.
          • The log sender should be capable of detecting failure to send
            to one and switch to sending to the other.
          • The log receiver should be modified to be able to listen at both
            the interfaces.
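
          A rough sketch of the sender side of such a switch, under the assumption that the slave's interfaces are simply tried in order; MultiInterfaceLogSender and its ship method are invented for this illustration and do not exist in Derby.

          // Hypothetical sketch only (not Derby code): a log sender that tries
          // each configured slave interface in turn, as outlined in the list above.
          import java.io.IOException;
          import java.io.OutputStream;
          import java.net.InetSocketAddress;
          import java.net.Socket;
          import java.util.List;

          class MultiInterfaceLogSender {
              private final List<InetSocketAddress> slaveAddresses; // one entry per slave interface

              MultiInterfaceLogSender(List<InetSocketAddress> slaveAddresses) {
                  this.slaveAddresses = slaveAddresses;
              }

              /** Try each interface until one accepts the log chunk. A real sender
               *  would keep the connection open instead of reconnecting per chunk. */
              void ship(byte[] logChunk) throws IOException {
                  IOException lastFailure = new IOException("no slave addresses configured");
                  for (InetSocketAddress address : slaveAddresses) {
                      try (Socket socket = new Socket(address.getAddress(), address.getPort())) {
                          OutputStream out = socket.getOutputStream();
                          out.write(logChunk);
                          out.flush();
                          return;               // shipped successfully
                      } catch (IOException e) {
                          lastFailure = e;      // this interface failed; try the next one
                      }
                  }
                  throw lastFailure;            // every configured interface failed
              }
          }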

          >* Fail-over: When fail-over is performed (with a command on the
          > slave), I assume will the master be told to stop its replication so
          > it can tidy up? Since you describe the semantics of the stop
          > replication command as shutting down the slave database, I assume
          > failover can be performed without requiring a prior stop replication
          > command on the master.

          You are correct. We would not issue a stop on the master to complete a
          fail-over.

          > If reaching master is not possible (lost connection), can the stop
          > replication command be used against the master to tidy up even when
          > connection has been lost?

          Not being able to reach the master would mean that the replication process
          should automatically stop. The sender should quit trying to send logs.
          But this case would arise when you try the stop command between the time the
          master tries to connect to the slave and the master automatically stops
          replication.

          In this case, a stop command should close down all replication behaviour,
          independently of whether a connection is actually established. The only
          difference is that if the connection is OK, the slave is shut down as well.

          > Perhaps it would be good to include in your table of commands any
          > preconditions for the commands. BTW, It seems a good idea to not
          > impose an order on starting server or slave first.

          I will add a preconditions column for each command and fill in the
          information for them. I will submit a v4 of the func spec with this.

          >* Presumably, the master will time out if it is unable to send logs to
          > the slave for some (configurable?) period. It could keep trying for
          > some time but eventually it would need to stop or overflow the
          > buffer mechanism you suggest in DERBY-2926. Will you require that
          > the user has LOG_ARCHIVE_MODE enabled? If you do, it would seem a
          > nice addition to later be able to resume replication even if the
          > buffer had to be abandoned (as you suggest). Before the buffer is
          > abandoned, if the network becomes available again, it would be
          > trivial to resume shipping of logs I expect?

          I agree this could be a great addition, because if a buffer overflow did occur
          we could resume sending logs from the backed-up log files.

          >* Given that the system privileges work of DERBY-2109 provides us with
          > necessary security, I would hope we can lift the restriction that
          > administration commands can only be run from the same machine as the
          > server is started on, but for the time being the restriction makes
          > sense.

          I agree.

          >* You describe the replication commands as CLI commands against
          > NetworkServerControl; will you be making the commands available in
          > API form as well, so replication can be embedded in an application?

          The slave is never booted during the time it is receiving logs. Hence
          we would not be able to actually use a stored procedure here. This was
          the reason we had earlier decided on a CLI command against NetworkServerControl.

          However, if by public API you mean public methods in NetworkServerControl that
          can be reached from outside, so that anyone can code an admin program at a later
          stage, I think this is a great idea and can easily be done.
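
          Purely to illustrate the "public methods" idea, an admin program might end up driving replication through something like the sketch below. The method names are hypothetical; they are not part of NetworkServerControl's current API, and how (or whether) they get exposed there is exactly the open question.

          // Hypothetical sketch: these methods do not exist on NetworkServerControl.
          // They only show what an API form of the CLI commands could look like.
          interface ReplicationAdmin {
              void startReplicationMaster(String dbName, String slaveHost, int slavePort) throws Exception;
              void startReplicationSlave(String dbName, int listenPort) throws Exception;
              void stopReplication(String dbName) throws Exception;
              void failover(String dbName) throws Exception;
          }

          class AdminProgram {
              public static void main(String[] args) throws Exception {
                  ReplicationAdmin admin = connect(args[0], Integer.parseInt(args[1]));
                  admin.failover("wombat");   // e.g. promote the slave copy of 'wombat'
              }

              // Stand-in for obtaining an admin handle; mapping this onto
              // NetworkServerControl is the open design question discussed above.
              static ReplicationAdmin connect(String host, int port) {
                  throw new UnsupportedOperationException("illustration only");
              }
          }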

          >* typos: "enclypted", "it's local log"

          I will fix these in the next version.

          Hide
          Jørgen Løland added a comment -

          >>* Derby doesn't log all operations by default, e.g. bulk
          >> import, deleting all records from a table, creating an index.
          >
          >"To make a consistent online backup in this scenario, this patch:
          >1) blocks online backup until all the transactions with unlogged operation are
          > committed/aborted.
          >2) implicitly converts all unlogged operations to logged mode for the duration
          > of the online backup, if they are started when backup is in progress. "

          Thanks for investigating this, Narayanan. Just to be sure: does
          this mean that all operations are logged if we turn the
          DERBY-239 mechanism on?

          >>* Presumably, the master will time out if it is unable to send logs to
          >> the slave for some (configurable?) period. It could keep trying for
          >> some time but eventually it would need to stop or overflow the
          >> buffer mechanism you suggest in DERBY-2926. Will you require that
          >> the user has LOG_ARCHIVE_MODE enabled? If you do, it would seem a
          >> nice addition to later be able to resume replication even if the
          >> buffer had to be abandoned (as you suggest). Before the buffer is
          >> abandoned, if the network becomes available again, it would be
          >> trivial to resume shipping of logs I expect?

          I think this functionality will have to be added as a later
          extension to replication. When the issue is addressed, I do not
          think we have to run with LOG_ARCHIVE_MODE enabled until a buffer
          overflow occurs, however. When an overflow happens, the database should
          be frozen, the existing replication buffer content written to
          disk, and LOG_ARCHIVE_MODE be started. The database can then be
          unfrozen. This way, we do not need to store the log files that are
          written while replication works as it should.
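
          A hedged sketch of that freeze/dump/archive sequence; every type and method name below is a placeholder for whatever internal hooks are eventually used, not Derby's actual API.

          // Placeholder types and method names, not Derby internals: the overflow
          // handling described above (freeze, dump the buffer, enable archiving, unfreeze).
          class ReplicationBufferOverflowHandler {

              interface Database {
                  void freeze() throws Exception;                 // pause transaction processing
                  void unfreeze() throws Exception;
                  void enableLogArchiveMode() throws Exception;   // keep log files for later resync
              }

              interface ReplicationBuffer {
                  void drainToDisk() throws Exception;            // persist buffered log records
              }

              /** Called when the in-memory replication buffer is about to overflow. */
              void onOverflow(Database db, ReplicationBuffer buffer) throws Exception {
                  db.freeze();                    // stop new work so no log records are lost
                  try {
                      buffer.drainToDisk();       // existing buffer content goes to disk
                      db.enableLogArchiveMode();  // from now on, log files are preserved
                  } finally {
                      db.unfreeze();              // resume processing; replication stays paused
                  }
              }
          }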

          I answered a related question in DERBY-2926; I'll just cut and paste the
          answer here for easy access:
          ---8<---
          >How do you imagine flow control if the network gets slow? Would you
          >block a transaction whose record would overflow the buffer?

          There are at least two simple alternatives for how to handle a
          full replication buffer:

          • Stop replication
          • Block transactions

          I think the first alternative would be the better one since
          blocking transactions would mean no availability. This is the
          exact opposite of what we want to achieve with replication.

          The functional spec of DERBY-2872 states that resuming
          replication after it has been stopped is a good candidate for
          extending the functionality. Once that issue has been addressed,
          we have a third alternative if the buffer gets full:

          • Stop replication for now, but store the log files so that
            replication can be resumed later.
            --->8---
          Hide
          V.Narayanan added a comment -

          >Thanks for investigating this Narayanan. Just to be sure: does
          >this mean that all operations are logged if we turn the
          >DERBY-239 mechanism on?

          I guess you mean: if we do the same thing being done in
          DERBY-239, would it log all operations? It seems so at first
          look. I will, however, dig deeper here and get back to you.

          Hide
          Øystein Grøvlen added a comment -

          I think the functional spec looks very good. A few minor comments:

          1. You say, "the database is only booted, not created, on the slave".
          I think I understand how things will be done, but the booting part
          is a bit confusing.

          2. I guess you will add a connection attribute that is used to get
          a database booted in "slave mode". I think this should be
          specified in the functional spec. Even if people will normally not
          use that directly, it will be part of the public API, and I think
          it should be specified in a funcspec.

          3. You say, "The master can now continue to process transaction", but
          you do not say anything about when such processing is stopped.
          What advantage does pausing the transaction processing give?

          4. What will happen if the failover command is executed while the
          master is alive and doing replication?

          5. I have already commented on the command syntax in DERBY-2954.

          Hide
          Jørgen Løland added a comment -

          >2. I guess you will add a connection attribute that is used to get the
          > a database booted in "slave mode". I think this should be
          > specified in the functional spec. Even if people will normally not
          > use that directly, it will be part of the public API, and I think
          > it should be specified in a funcspec.

          Something like this?

          ij> connect 'jdbc:derby:db;replicationslave;...';

          If so, I do not think we should add this connection attribute. The reason is that the replication idea is to prevent the slave from completing the database booting; it is supposed to stay in recovery and redo log records as they arrive until stop or failover is requested. Since the slave gets stuck in recovery, the statement above would never return a connection - it would just hang. As far as I know, this also means that the connection can not be used to stop replication or perform failover either.

          Hide
          Jørgen Løland added a comment -

          >1. You say, "the database is only booted, not created, on the slave".
          > I think I understand how things will be done, but the booting part
          > is a bit confusing.
          >
          >3. You say, "The master can now continue to process transaction", but
          > you do not say anything about when such processing is stopped.
          > What advatange does pausing the transaction processing give?
          >

          The functional spec will be modified to better describe these.

          Regarding 3) - Processing on the master must be paused while the
          following happens:

          • buffered log records and buffered data pages are forced to
            disk. Actually, forcing the data pages is not strictly
            necessary, but we might as well ship all write operations that
            have been performed to the slave.
          • the entire database directory is sent to the slave (or copied
            to a backup location, from where it can be sent to the slave)
          • the replication log buffer has been started and the logFactory
            has been informed to append log records to the buffer as well
            as to disk.

          The reason for this is that the slave requires a copy of the
          database that is exactly equal to that on the master when log
          shipment starts. When we start sending log records to the slave,
          we need to know that the slave has a database that includes all
          log records up to a LogInstant 'i'. The first log record that is
          sent to the slave must be the one immediately following 'i'.
          Hence the pause.
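
          For concreteness, the blocking part could look roughly like the sketch below; all of the types and calls are placeholders that just restate the steps above, not actual Derby internals.

          // Placeholder types: an outline of the blocking part of 'start replication'
          // on the master, restating the steps listed above.
          class StartMasterSketch {

              interface MasterDatabase {
                  void freeze() throws Exception;                 // pause transaction processing
                  void unfreeze() throws Exception;
                  void flushLogAndDataToDisk() throws Exception;  // force buffered log and pages
                  long currentLogInstant();                       // the instant 'i' discussed above
                  void shipDatabaseDirectoryTo(String slaveHost, int port) throws Exception;
                  void startLogBuffer(long firstInstantToShip);   // logFactory also appends here
              }

              void startReplication(MasterDatabase db, String slaveHost, int port) throws Exception {
                  db.freeze();
                  try {
                      db.flushLogAndDataToDisk();
                      long i = db.currentLogInstant();     // slave copy contains everything up to 'i'
                      db.shipDatabaseDirectoryTo(slaveHost, port);
                      db.startLogBuffer(i);                // first record shipped is the one after 'i'
                  } finally {
                      db.unfreeze();                       // processing resumes; log shipping continues
                  }
              }
          }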

          >4. What will happen if the failover command is executed while the
          > master is alive and doing replication?

          I can think of at least three alternatives: 1) stop the master,
          and make the old slave a normal Derby instance for the database.
          2) not allow failover to be executed on a slave when the master
          is alive. 3) perform a "switch", i.e., make the old slave the new
          master, and the old master the new slave.

          For now, I think 1) is the best alternative to keep the amount of
          work down while alternative 3) would make a good extension
          candidate to the functionality.

          Hide
          Øystein Grøvlen added a comment -

          Jørgen Løland (JIRA) wrote:
          > Regarding 3) - Processing on the master must be paused while the
          > following happens:
          >
          > * buffered log records and buffered data pages are forced to
          > disk. Actually, forcing the data pages is not strictly
          > necessary, but we might as well ship all write operations that
          > have been performed to the slave.
          > * the entire database directory is sent to the slave (or copied
          > to a backup location, from where it can be sent to the slave)
          > * the replication log buffer has been started and the logFactory
          > has been informed to append log records to the buffer as well
          > as to disk.

          I would think the online backup mechanism has already solved some of these issues. Have you considered using an online backup to get a copy of the database and existing log?

          >
          > The reason for this is that the slave requires a copy of the
          > database that is exactly equal to that on the master when log
          > shipment starts. When we start sending log records to the slave,
          > we need to know that the slave has a database that includes all
          > log records up to a LogInstant 'i'. The first log record that is
          > sent to the slave must be the one immediately following 'i'.
          > Hence the pause.

          I do not understand why it needs to be exactly the same database. Recovery already handles redo of log records that are already reflected in the database. What harm would it make if you sent log records with LogInstant less than 'i'?

          >
          >> 4. What will happen if the failover command is executed while the
          >> master is alive and doing replication?
          >
          > I can think of at least three alternatives: 1) stop the master,
          > and make the old slave a normal Derby instance for the database.
          > 2) not allow failover to be executed on a slave when the master
          > is alive. 3) perform a "switch", i.e., make the old slave the new
          > master, and the old master the new slave.
          >
          > For now, I think 1) is the best alternative to keep the amount of
          > work down while alternative 3) would make a good extension
          > candidate to the functionality.

          I assume the failover command is sent to the slave. Both 1) and 3) will then require some mechanism where the slave sends commands to the master. If you want to keep the work down, another alternative could be
          4) take down the connection to the master and perform failover. But maybe that creates too high a risk of inconsistencies, since you may end up with two masters that both will serve clients.

          I think an important use-case is to be able to switch to another master during planned maintenance. That could be done without losing any transactions if the sequence of operations is as follows:
          1. Stop new transactions on master
          2. Flush all existing log to slave
          3. Redo all log on slave
          4. Do failover

          There is already a mechanism for 1. I guess 3. is part of 4. So it is really about being able to make sure the existing log is flushed before sending the failover command. Do we need a separate command for that, or could it be part of the stop replication command? That is, when stop replication is received, the existing log is sent before replication is stopped. Then the three steps of a "planned failover" would be:
          1. freeze database on master
          2. stop replication on master
          3. failover on slave
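
          To make the three steps concrete from an administrator's point of view, a hedged sketch follows. Only the freeze call (SYSCS_UTIL.SYSCS_FREEZE_DATABASE) is an existing Derby procedure; the host and database names are placeholders, and steps 2 and 3 are left as comments since their command syntax is still being discussed in DERBY-2954.

          // Sketch of a 'planned failover'. Only the freeze call is an existing
          // Derby system procedure; steps 2 and 3 are placeholders for the final syntax.
          import java.sql.Connection;
          import java.sql.DriverManager;

          class PlannedFailoverSketch {
              public static void main(String[] args) throws Exception {
                  // 1. Freeze the database on the master so no new log is produced.
                  try (Connection master =
                          DriverManager.getConnection("jdbc:derby://master-host:1527/db")) {
                      master.prepareCall("CALL SYSCS_UTIL.SYSCS_FREEZE_DATABASE()").execute();
                  }
                  // 2. Stop replication on the master (hypothetical command): flushes the
                  //    remaining log to the slave before shutting the service down.
                  // 3. Run failover on the slave (hypothetical command): it redoes the
                  //    remaining log and boots the database as a normal Derby instance.
              }
          }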

          Hide
          Jørgen Løland added a comment -

          >> Regarding 3) - Processing on the master must be paused while the
          >> following happens:
          >> * buffered log records and buffered data pages are forced to
          >> disk. Actually, forcing the data pages is not strictly
          >> necessary, but we might as well ship all write operations that
          >> have been performed to the slave.
          >> * the entire database directory is sent to the slave (or copied
          >> to a backup location, from where it can be sent to the slave)
          >> * the replication log buffer has been started and the logFactory
          >> has been informed to append log records to the buffer as well
          >> as to disk.
          > I would think the online backup mechanism has already solved some these issues. Have you consider using an online backup to get a copy of the database and existing log?

          I had a brief look at backup a few weeks ago. If I remember
          correctly, it does more or less the same as described above: It
          pauses the database while flushing data and log to disk, and
          copies the entire database directory to a backup location.
          Processing is not resumed until this copying has completed.
          The functionality to pause (freeze) the database
          is likely to be reused, and the strategy follows the same steps as
          backup.

          From the adminguide, topic "Online backups":
          "The SYSCS_UTIL.SYSCS_BACKUP_DATABASE() procedure puts the database into a state in which it can be safely copied, then copies the entire original database directory (including data files, online transaction log files, and jar files) to the specified backup directory. Files that are not within the original database directory (for example, derby.properties) are not copied."

          If this is the mechanism you are referring to, we can use backup
          to do the "or copied to a backup location, from where it can be
          sent to the slave" part. That strategy will freeze the database
          for a shorter time iff the disk is faster than the network. On
          the other hand, it will require 2x diskspace and potentially much
          memory because log records will accumulate while the backed-up
          database is being sent to the slave.

          If there is another, nonblocking backup mechanism that I don't know
          of, please refer me to it. If so, we may have to rework our plans.

          >> The reason for this is that the slave requires a copy of the
          >> database that is exactly equal to that on the master when log
          >> shipment starts. When we start sending log records to the slave,
          >> we need to know that the slave has a database that includes all
          >> log records up to a LogInstant 'i'. The first log record that is
          >> sent to the slave must be the one immediately following 'i'.
          >> Hence the pause.

          > I do not understand why it needs to be exactly the same database. Recovery already handles redo of log records that are already reflected in the database. What harm would it make if you sent log records with LogInstant less than 'i'?

          The problem is caused by us writing the log record to the slave
          log file before recovering it.

          Unfortunately (in this case), the LSN in
          Derby (LogInstant) is the byte position where the log record
          starts in the log file. Since undo operations seem to identify
          their respective do operations using the LogInstant (seems to me
          to be "hidden" inside an undo log record's byte[] data), all log
          records must be found at exactly the same place in the master and
          slave log files. Hence, duplicates of log records cannot exist on
          file without invalidating the LSNs.

          We could, of course, start sending log records < i, and let the
          slave ignore these. Even if we decide to send a backup of the
          database, it would still be simple to start log shipping at exactly
          'i', however. I see no reason for not using exactly 'i'...
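
          To illustrate why the byte position matters, an LSN of this style can be thought of as a (log file number, byte offset) pair packed into one long. The sketch below is a self-contained illustration of that idea only, not Derby's actual LogCounter code.

          // Self-contained illustration (not Derby's LogCounter): an LSN packed from
          // a log file number and the byte offset where the record starts.
          class LogInstantSketch {
              static long makeInstant(long fileNumber, long bytePosition) {
                  return (fileNumber << 32) | (bytePosition & 0xFFFFFFFFL);
              }

              static long fileNumber(long instant)   { return instant >>> 32; }
              static long bytePosition(long instant) { return instant & 0xFFFFFFFFL; }

              public static void main(String[] args) {
                  long i = makeInstant(7, 4096);   // record starts at byte 4096 of log file 7
                  System.out.println(fileNumber(i) + ":" + bytePosition(i));   // prints 7:4096
                  // If the slave log contained an extra (duplicate) record first, the same
                  // logical record would start at a different byte offset, so its instant,
                  // and every undo record referring to it, would no longer match the master.
              }
          }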

          >>> 4. What will happen if the failover command is executed while the
          >>> master is alive and doing replication?
          >>
          >> I can think of at least three alternatives: 1) stop the master,
          >> and make the old slave a normal Derby instance for the database.
          >> 2) not allow failover to be executed on a slave when the master
          >> is alive. 3) perform a "switch", i.e., make the old slave the new
          >> master, and the old master the new slave.
          >>
          >> For now, I think 1) is the best alternative to keep the amount of
          >> work down while alternative 3) would make a good extension
          >> candidate to the functionality.

          >I assume the failover command is sent to the slave. Both 1) and 3) will then require some mechanism where the slave sends commands to the master. If you want keep the work down, another alternative could be
          >4) take down the connection to the master and perform failover. But maybe that creates a too high risk for inconsistencies since you may end up with two masters that both will serve clients.

          >I think an important use-case is to be able to switch to another master during planned maintenance.

          <snip>

          Good point. This scenario should be added to the funcspec.

          Hide
          Øystein Grøvlen added a comment -

          > Something like this?
          >
          > ij> connect 'jdbc:derby:db;replicationslave;...';
          >
          > If so, I do not think we should add this connection attribute. The
          > reason is that the replication idea is to prevent the slave from
          > completing the database booting; it is supposed to stay in recovery
          > and redo log records as they arrive until stop or failover is
          > requested. Since the slave gets stuck in recovery, the statement
          > above would never return a connection - it would just hang. As far
          > as I know, this also means that the connection can not be used to
          > stop replication or perform failover either.

          The normal way to initiate recovery is to boot a database by opening a
          connection to it. Will you use another mechanism to boot a slave
          database?

          Doing a connect that boots the database and blocks until failover is
          done could be a useful mechanism for users who want to embed a slave
          in their own solution (instead of using the Derby network server).
          The return of this call will then indicate that connections may be
          opened towards the database (it has become master).

          I am not sure that it is a problem that you will need to use other
          connections to stop replication or perform failover. That will be
          similar to how the network server is managed. The command that starts
          the server blocks, and you will need to use other connections to
          manage (e.g., stop) the server.
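
          A hedged sketch of that embedding pattern, assuming a hypothetical replicationslave connection attribute (the exact attribute name and value are not settled): the blocking connect is made on its own thread, and its return signals that failover has completed.

          // Hypothetical sketch of embedding a slave. The 'replicationslave=true'
          // attribute is the hypothetical attribute discussed above, not a finalized
          // Derby URL attribute; the embedded driver is assumed to be on the classpath.
          import java.sql.Connection;
          import java.sql.DriverManager;
          import java.sql.SQLException;

          class EmbeddedSlaveSketch {
              public static void main(String[] args) throws Exception {
                  Thread slaveBooter = new Thread(() -> {
                      try {
                          // Blocks while the database stays in recovery, redoing shipped log;
                          // it only returns once failover has made this copy a normal database.
                          Connection afterFailover =
                                  DriverManager.getConnection("jdbc:derby:db;replicationslave=true");
                          System.out.println("Failover complete; database is open for connections.");
                          afterFailover.close();
                      } catch (SQLException e) {
                          e.printStackTrace();   // replication stopped without failover, or boot failed
                      }
                  });
                  slaveBooter.start();

                  // The embedding application continues here; stop/failover would be requested
                  // through whatever separate administrative channel the spec ends up defining.
                  slaveBooter.join();
              }
          }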

          Hide
          Øystein Grøvlen added a comment -

          Jørgen Løland (JIRA) wrote:
          > From the adminguide, topic "Online backups":
          > "The SYSCS_UTIL.SYSCS_BACKUP_DATABASE() procedure puts the database
          > into a state in which it can be safely copied, then copies the
          > entire original database directory (including data files, online
          > transaction log files, and jar files) to the specified backup
          > directory. Files that are not within the original database directory
          > (for example, derby.properties) are not copied."

          This citation is from the 10.1 version of the guide. If you look at
          the 10.2 guide, you will see that it has been changed and that the
          backup is now non-blocking.

          >
          > If this is the mechanism you are referring to, we can use backup
          > to do the "or copied to a backup location, from where it can be
          > sent to the slave" part. That strategy will freeze the database
          > for a shorter time iff the disk is faster than the network. On
          > the other hand, it will require 2x diskspace and potentially much
          > memory because log records will accumulate while the backed-up
          > database is being sent to the slave.

          If you use the non-blocking backup mechanism, you will have to copy
          both database and log files and you will probably want to delay
          switching to replication of individual log records until the backup
          is completed.

          Hide
          Øystein Grøvlen added a comment -

          Jørgen Løland (JIRA) wrote:
          >> I do not understand why it needs to be exactly the same database.
          >> Recovery already handles redo of log records that are already
          >> reflected in the database. What harm would it make if you sent log
          >> records with LogInstant less than 'i'?
          >
          > The problem is caused by us writing the log record to the slave
          > log file before recovering it.
          >
          > Unfortunately (in this case), the LSN in
          > Derby (LogInstant) is the byte position where the log record
          > starts in the log file. Since undo operations seem to identify
          > their respective do operations using the LogInstant (seems to me
          > to be "hidden" inside an undo log record's byte[] data), all log
          > records must be found exactly the same place in the master and
          > slave log files. Hence, duplicates of log records cannot exist on
          > file without invalidating the LSNs.

          I do not understand what avoiding duplicate log records has to do with
          requiring a specific state of the database.

          With respect to having log records appear in the same place in the log
          files, you could consider forcing a log file switch on the master
          before sending the first log record. Then, the first log record will
          be at the start of a log file both on the master and the slave. Could
          that simplify things?

          > We could, of course, start sending log records < i, and let the
          > slave ignore these. Even if we decide to send a backup of the
          > database, it would still be simple to start log shipping at exactly
          > 'i', however. I see no reason for not using exactly 'i'...

          I think you said that one reason for making the start of replication
          blocking was that you needed to identify a specific log record 'i'.
          In other words, if you did not need to identify 'i', there would be
          one less reason to make the start of replication blocking.

          Hide
          Øystein Grøvlen added a comment -

          Another issue to think about:

          If a transaction has been open since before replication was started, a
          slave may need log from before log shipping started to be able to undo
          the transaction at failover time. In order to be sure that failover
          will succeed, I think there are two alternatives:
          1) Let the master abort/rollback such transactions before it
          acknowledges that replication has started.
          2) Send old log records to the slave to make sure it has all
          necessary log to do failover.

          I think the approach of non-blocking backup is similar to alternative
          2). That is, it copies all log files that may be necessary in order to
          perform undo at restore.

          Hide
          V.Narayanan added a comment -

          >Doing a connect that boots the database and blocks until failover is
          >done, could be a useful mechanism for users who want to embed a slave
          >in their own solution (instead of using the Derby network server).
          >Then the return of this call will indicate that connections may be
          >opened towards the database (it has become master).

          >I am not sure that it is a problem that you will need to use other
          >connections to stop replication or perform failover. That will be
          >similar to how the network server is managed. The command that starts
          >the server blocks, and you will need to use other connections to
          >manage (e.g., stop) the server.

          The use case pointed out here seems very valid and is reason enough
          for this startup mechanism. At first look this seems very doable.

          I was trying to imagine what changes we would need to make to the
          replication mechanism to accommodate this. I guess there wouldn't
          be anything other than being able to start the basic replication
          functionality upon doing a connect with the replication attribute set.

          >If you use the non-blocking backup mechanism, you will have to copy
          >both database and log files and you will probably want to delay
          >switching to replication of individual log records until the backup
          >is completed.

          Switching between two different message types sent across the network
          seems like an additional complication which we can avoid if we do not
          use online backup. But then there is the additional complexity of
          handling unlogged operations, which online backup has already solved.

          I am undecided about this. I even tried to imagine how much switching
          between sending single log records and sending a log buffer would
          complicate the network code, but the overhead is not much.

          Would the requirement of 2x disk space justify not using online backup?

          >With respect to having log records appear in the same place in the log
          >files, you could consider forcing a log file switch on the master
          >before sending the first log record. Then, the first log record will
          >be at the start of a log file both on the master and the slave. Could
          >that simplify things?

          In the comments above, a case has been pointed out:

          "If an transaction has been open since before replication was started, a
          slave may need log from before log shipping started to be able to undo
          the transaction at failover time."

          Would it be easy in this case too? How would the switching happen in this
          case?

          I will have to dig deeper here, I do not have an answer for this now.

          >If a transaction has been open since before replication was started, a
          >slave may need log from before log shipping started to be able to undo
          >the transaction at failover time. In order to be sure that failover
          >will succeed, I think there are two alternatives:
          >2) Send old log records to the slave to make sure it has all
          > necessary log to do failover.
          I agree with 2) and think it is the right approach.

          Hide
          Øystein Grøvlen added a comment -

          V.Narayanan (JIRA) wrote:
          >> If you use the non-blocking backup mechanism, you will have to copy
          >> both database and log files and you will probably want to delay
          >> switching to replication of individual log records until the backup
          >> is completed.
          >
          > Switching between two different messages sent across the network seems
          > like additional complication which we can avoid if we do not use online
          > backup. But then there is the additional complexity of handling unlogged
          > operation which online backup has already solved.

          My thought was that you could use the same mechanism to send both data files and log files from the backup.

          > Would the requirement of 2x space qualify not using online backup?

          I do not think so.

          > I the comments above a case has been pointed out
          >
          > "If an transaction has been open since before replication was started, a
          > slave may need log from before log shipping started to be able to undo
          > the transaction at failover time."
          >
          > Would it be easy in this case too? How would the switching happen in this
          > case?

          The switch will be the point where you go from sending log files to sending individual log records. The issue about old transactions only affects which old log files you need to send, not when to switch.

          Hide
          Øystein Grøvlen added a comment -

          Derby allows a user to specify that the log file directory is to be placed in a separate directory from the database directory. How will you handle that when you create the slave database? It is probably OK to say that in the first version it is not possible to store the log separately from the database. Whatever you decide, it would be nice if your decision were reflected in the func spec.
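          For reference, the separate log location is requested with the existing
          logDevice connection attribute when the database is created; the path below
          is only an example:

              ij> connect 'jdbc:derby:masterdb;create=true;logDevice=/separatedisk/masterdblog';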

          Hide
          Jørgen Løland added a comment -

          >> From the adminguide, topic "Online backups":
          > <snip>
          >
          >This citation is from the 10.1 version of the guide. If you look at
          >the 10.2 guide, you will see that it has been changed and that the
          >backup is now non-blocking.

          Actually, that was from the 10.2 documentation. I also checked the alpha manuals, and could not find anything about non-blocking. However, that does not change the fact that backup IS performed non-blocking (searching the web revealed a few presentations stating so).

          Since I was not previously aware of non-blocking backup, some adjustments will probably be made to our development plan and the funcspec. More precisely, it would be great if non-blocking online backup provided us with a tool to avoid freezing the database. I also agree that it should not be necessary to freeze the database just to start at the correct log instant 'i'.

          >Another issue to think about:
          >
          >If an transaction has been open since before replication was started, a
          >slave may need log from before log shipping started to be able to undo
          >the transaction at failover time. In order to be sure that failover
          >will succeed, I think there are two alternatives:
          > 1) Let the master abort/rollback such transactions before it
          > acknowledges that replication has started.
          > 2) Send old log records to the slave to make sure it has all
          > necessary log to do failover.
          >
          >I think the approach of non-blocking backup is similar to alternative
          >2). That is, it copies all log files that may be necessary in order to
          >perform undo at restore.

          I have not had the time to figure out how non-blocking backup works, but a shot in the dark would be that it works in a similar way to fuzzy checkpointing. As you say, the non-blocking backup mechanism has to provide a solution to this problem. Using the same solution is probably a good thing.

          With this new information, I think the revised plan will be something like this (a rough sketch follows below):

          1) A Derby instance is informed that it will become master for db 'x'.
          2) The master starts making an online, non-blocking backup. The backup may be written to disk or even sent directly over the network to the slave; this has not been decided.
          3) Online backup completes. From this point on, all operations are logged both to disk and to the replication buffer.
          4) If the backup location was disk, the entire backup is sent to the slave. The disk-version can then be deleted.
          5) When the backup has been sent to the slave, the master starts sending log from the log buffer. Once the slave has received enough log to be "up to date" (which depends on how tightly synchronized the master and slave are), replication is reported to have started. Since we are currently working on asynchronous replication, we can report that replication has started when we start shipping buffered log records.

          Note that this does not require any database freeze as long as backup can be performed without freeze.

          Pros and cons versus the previous plan:
          + Non-blocking
          - Requires 2x diskspace during steps 2-4
          - Requires more memory from step 3 to the point where the slave has caught up with the master.
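          A very rough sketch of how steps 1-5 could hang together is shown below.
          Every class and method name in it is a placeholder invented for
          illustration, not an existing Derby interface:

              // Placeholder sketch of the ordering in the revised plan.
              public class MasterStartupSketch {

                  public void startMaster(String dbName, String slaveHost, int slavePort)
                          throws Exception {
                      // 1) this instance has been told it will be master for dbName
                      // 2) take a non-blocking online backup while transactions continue
                      String backupDir = takeOnlineBackup(dbName);
                      // 3) from this point on, every log record is also appended to an
                      //    in-memory replication buffer
                      ReplicationBuffer buffer = attachReplicationBuffer(dbName);
                      // 4) ship the completed backup to the slave; the local copy can
                      //    then be deleted
                      sendBackupToSlave(backupDir, slaveHost, slavePort);
                      // 5) start asynchronous shipping of buffered log records;
                      //    replication is reported as started once shipping begins
                      new Thread(new LogShipper(buffer, slaveHost, slavePort)).start();
                  }

                  // Stubs so the sketch compiles; the real logic is out of scope here.
                  private String takeOnlineBackup(String dbName) { return "/tmp/backup"; }
                  private ReplicationBuffer attachReplicationBuffer(String dbName) {
                      return new ReplicationBuffer();
                  }
                  private void sendBackupToSlave(String dir, String host, int port) { }

                  static class ReplicationBuffer { }

                  static class LogShipper implements Runnable {
                      LogShipper(ReplicationBuffer buffer, String host, int port) { }
                      public void run() { /* ship buffered log chunks asynchronously */ }
                  }
              }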
          Hide
          V.Narayanan added a comment -

          I am attaching a new version of the functional spec that contains
          the following changes

          • Modified the new commands added to NetworkServerControl to match the
            existing pattern
          • Added pre-conditions to each of the NetworkServerControl commands.

          I have incorporated feedback I received from Dag and Oystein here.

          I have also received feedback on the handling of Upgrade, which I
          shall incorporate into the functional specification and submit in a
          subsequent revision.

          Hide
          Ole Solberg added a comment -

          I have a few comments on the

          Functional Specification for Derby Replication - rev. 4.0 - table in "Interacting with the replication feature":

          1) As I understand it, 'Start Master' is only allowed on the host which will serve as master,
          and 'Start Slave' is only allowed on the host that will serve as slave,
          so I think the "operation" field in the table should state this (as for 'Failover' and 'Stop Replication').

          2) I think the "pre-conditions" field should say something like

          • 'Start Master':
            • A database with the name <dbname> must exist on the (master) host where
              this command is issued.
              (was: 'Restricted to the same machine that the database resides.')
          • 'Start Slave':
            • The (slave) host, where this command is issued,
              must not already be serving a database named <dbname> as slave.
              (was: 'Restricted to the same machine that the database resides.')
          • 'Failover':
            • The (slave) host, where this command is issued,
              must be serving a database named <dbname> as slave.
              (was: 'Restricted to the same machine that the database resides.')
          • 'Stop Replication':
            • The (master) host, where this command is issued,
              must be serving a database named <dbname> as master.
              (was: 'Can be issued only on the master and the master in turn ...
              ... the slave is shut down as well')

          The current text in the 'Stop Replication' "pre-conditions" field is, I think, really part of
          the functional specification of the command and should be put elsewhere, e.g.
          in a subchapter giving more details on 'Stop Replication'.

          Hide
          Jørgen Løland added a comment -

          Ole,

          thank you for reviewing the func spec.

          I'm attaching a new version, v5, of the functional specification that incorporates the feedback.

          Hide
          Jørgen Løland added a comment -

          Attaching a class diagram, master_classes_1.pdf, illustrating the interfaces and classes used in replication master mode. The diagram is not 100% accurate (details left out), so it should only be used to get an overview of the replication design.

          Hide
          Jørgen Løland added a comment -

          Attached slave_classes_1.pdf, a class diagram for the slave instance with similar limitations (lack of detail) as the master diagram.

          Hide
          Jørgen Løland added a comment -

          Attaching new proof of concept patches for the replication functionality. These use the replication patches committed so far (current revision: 578099)

          Hide
          Jørgen Løland added a comment -
          *** Want to take the new Derby Asynchronous Replication Feature for a spin? ***

          The newly attached proof of concept code, poc_{master|slave}_v2b.diff, can now be tested by anyone interested. Read the attached howto for instructions. Note: we are far from done with this, but the current poc seems to work. Good luck

          Hide
          Ole Solberg added a comment - - edited

          I have experimented with the 'proof-of-concept_v2b' code running several derby tests (ij and junit) as "load" against the master server.
          When running the ....tests.derbynet.PrepareStatementTest I get a '.Something wrong with the instants!' message and the slave server dies:

          .
          .
          runUserCommand runUserCommandRemotely runTest testBasicPrepare used 340 ms .
          startServer@atum02:4527 pDoR .................... [long run of test progress dots elided] ....................Something wrong with the instants!
          startServer@atum02:4527 pDoR (RPlR) Log instant: 2, 410117
          startServer@atum02:4527 pDoR (RPlR) ... returned instant: 2, 311861
          startServer@atum02:4527 pDoR (RPlR) ... diff: 98256
          startServer@atum02:4527 pDoR
          startServer@atum02:4527 pDoR - - - - - - next log record - - - - - -
          .
          .

          Hide
          Jørgen Løland added a comment -

          Thank you for testing this, Ole. I guess you'll find many odd errors if you try to test this...

          "Something wrong with the instants" is one of the error messages I added as part of the prototype code. It indicates that the log instant (log file + offset) is not the same on the master and slave, either because the log file sizes are not the same on the slave and master, or because you just found a bug I wasn't aware of. I'll have a look at this next week.

          Was this test by any chance run with a log file size > 1MB on the master?

          Hide
          V.Narayanan added a comment -

          From the functional spec

          >The NetworkServerCommand commands are executed by the main thread,
          >which means that it is not possible to connect to the server acting
          >as slave before failover has taken place. Will, of course, be fixed
          >later.

          The problem occurs because we are booting the slave as part of the
          same thread and it is held up in recovery. Starting it in a separate
          thread should solve this issue.

          Hide
          V.Narayanan added a comment -

          Please find attached a new version of the poc slave.

          Changes between v2b and v2c:

          • poc now handles failover before the first log switch
          • db in slave mode is now booted by a separate slave thread so that
            the network server does not get blocked.
          Hide
          Jørgen Løland added a comment -

          The func spec for replication does not describe the behavior for replication failure on the master side. Since replication is intended to increase availability, I think the best solution is to allow users to continue to access the database when replication fails on the master side. I'll add this to the func spec in a few days unless I hear objections:

          In the case that the master controller finds that replication has failed, and it is not able to correct the problem (e.g. because the slave has crashed), the master replication role will be stopped. The database will still be accessible to users, as if replication had never been started.

          Hide
          Jørgen Løland added a comment -

          Attaching a new functional spec, v6, mainly targeting how failure scenarios will be handled.

          Hide
          Daniel John Debrunner added a comment -

          The startslave command's syntax does not include a -slavehost option, but the comments seem to indicate one is available.

          Do stopslave and startfailover need options to define the slavehost and port? Otherwise, how do they communicate with the slave?

          How do startmaster and stopmaster connect to the master database?

          It's unclear exactly what the startmaster and stopmaster commands do, especially with respect to the state of the database. Can a database be booted and active when startmaster is called, or does startmaster boot the database? Similarly for stopmaster: does it shut down the database?

          Is there any reason to put these replication commands on the class/command used to control the network server? They don't fit naturally there; why not a replication-specific class/command? From the functional spec I can't see any requirement that the master or slave is running the network server, so I assume I can have replication working with embedded-only systems.

          How big is this main-memory log buffer, can it be configured?

          extract - "the response time of transactions may increase for as long as log shipment has trouble keeping up with the amount of generated log records."
          Could you explain this more? I don't see the connection between the log buffer filling up and the response times of other transactions. The spec says the replication is asynchronous, so won't user transactions still be limited only by the speed at which the transaction log is written to disk?

          The spec seems to imply that the slave can connect with the master, but the startmaster command doesn't specify its own hostname or port number, so how is this connection made?

          Why, if the master loses its connection to the slave, will replication stop, while if the slave loses its connection to the master it keeps retrying? It seems that any temporary glitch in the network connectivity has a huge chance of rendering the replication useless. I can't see the logic behind this; what's stopping the master from retrying as well? The log buffer being full shouldn't matter, should it? The log records are still on disk - or is it that this scheme never reads the transaction log from disk, only from memory as log records are created?

          From reading between the lines, I think this scheme requires that the master database stay booted while replicating. If so, I think that's a key piece of information that should be clearly stated in the functional spec. If not, then how to shut down a master database and restart replication (without the initial copy) should be documented.

          Hide
          Jørgen Løland added a comment -

          Dan,

          Thanks for showing interest in replication. I'll answer your questions inline, and will update the func spec with the results of the discussion later.

          > The startslave command's syntax does not include a -slavehost option, but the comments seem to indicate one is available.

          You are right; will fix.

          > How do startmaster and stopmaster connect to the master database?

          In the current prototype implementation, all commands are processed in NetworkServerCommandImpl by calling Monitor.startPersistentService(dbname, ...) and Monitor.findService(dbname, ...). The plan is to change this to connection URL attributes later, e.g. 'jdbc:derby://host/db;startMaster=true'. Note that since startslave is blocking, the connection 'jdbc:...;startslave=true' call will hang.
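          In ij terms, the planned attribute-based interface would look roughly like
          this (the exact attribute names are still open):

              -- on the machine that will act as master:
              ij> connect 'jdbc:derby://masterhost:1527/db;startMaster=true';
              -- on the machine that will act as slave (hangs until failover, as noted above):
              ij> connect 'jdbc:derby://slavehost:1527/db;startslave=true';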

          > Do stopslave and startfailover need options to define the slavehost and port, otherwise how do they communicate with the slave?

          Since "startslave" is blocked during LogToFile.recover, Monitor.startPersistentService does not complete for this command. Calling Monitor.findService on the slave database does therefore not work.

          A way around this is to let the thread that receives log from the master and writes it to the log file check for a flag value every X seconds. A Hashtable could, e.g., be added to Monitor with setFlag(dbname, flagvalue) and getFlag(dbname) methods. The stopslave/failover commands would then call Monitor.setFlag(slaveDBName, "failover"/"stopslave").
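          A small standalone sketch of that flag-table idea (the real Monitor class is
          not shown here; the names and the polling loop are illustrative):

              import java.util.Hashtable;

              public class ReplicationFlags {
                  private static final Hashtable<String, String> flags =
                          new Hashtable<String, String>();

                  public static void setFlag(String dbName, String flagValue) {
                      flags.put(dbName, flagValue);
                  }

                  public static String getFlag(String dbName) {
                      return flags.get(dbName);
                  }
              }

              // The thread that receives log from the master would then poll roughly like:
              //     String flag = ReplicationFlags.getFlag(slaveDbName);
              //     if ("failover".equals(flag))  { /* perform failover */ }
              //     if ("stopslave".equals(flag)) { /* stop the slave role */ }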

          A potential problem with this is authenticating the caller of the command, since the AuthenticationService of the slave database is not reachable. I think the best solution would be to only accept failover/stopslave flags if the connection with the master is down. Otherwise, if the connection is working, stop and failover commands should only be accepted from the master.

          > It's unclear exactly what the startmaster and stopmaster do, especially wrt to the state of the database. Can a database be booted and active when startmaster is called, or does startmaster boot the database? Similar for stopmaster, does it shutdown the database?

          The "startmaster" command can only be run against an existing database 'X'. If 'X' has already been booted by the Derby instance that will have the master role, "startmaster" will connect to it and:

          1) copy the files of 'X' to the slave (other transactions will be blocked during this step in the first version of replication -
          may be improved later by exploiting online backup)
          2) create a replication log buffer and make sure all log records are added to this buffer
          3) start a log shipment thread that sends the log asynchronously.

          If 'X' has not already been booted, "startmaster" will boot it and then do the above.

          The "stopmaster" command will

          1) stop log records from being appended to the replication log buffer
          2) stop the log shipper thread from sending more log to the slave
          3) send a message to the slave that replication for database 'X' has been stopped.
          4) close down all replication related functionality without shutting down 'X'

          > Is there any reason to put these replication commands on the class/command used to control the network server? They don't fit naturally there; why not a replication-specific class/command? From the functional spec I can't see any requirement that the master or slave is running the network server, so I assume I can have replication working with embedded-only systems.

          Implementing this in the network server means that the blocking startslave command will run in a thread on the same VM as the server.

          > How big is this main-memory log buffer, can it be configured?

          In the initial version we use 10 buffers of size 32KB. 32K was chosen because this is the buffer size used for LogAccessFileBuffer.buffer byte[], which is the unit copied to the replication buffer. We need multiple buffers so that it is possible to append log while the log shipper is sleeping and while it is busy shipping an older chunk of log. The number of buffers will probably be modified once we get a chance to actually test the functionality. It will be configurable.
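          A simplified, self-contained sketch of such a multi-buffer scheme is shown
          below. It is not the planned implementation, only an illustration of why a
          full buffer pool makes the writing transaction wait:

              import java.util.LinkedList;

              public class ReplicationLogBufferSketch {
                  private static final int BUFFER_SIZE = 32 * 1024; // matches the 32K log buffer
                  private static final int BUFFER_COUNT = 10;       // to be made configurable

                  private final LinkedList<byte[]> freeBuffers = new LinkedList<byte[]>();
                  private final LinkedList<byte[]> fullBuffers = new LinkedList<byte[]>();
                  private byte[] current;
                  private int pos;

                  public ReplicationLogBufferSketch() {
                      for (int i = 0; i < BUFFER_COUNT; i++) {
                          freeBuffers.add(new byte[BUFFER_SIZE]);
                      }
                      current = freeBuffers.removeFirst();
                  }

                  // Called when log is written to disk; assumes chunk.length <= BUFFER_SIZE.
                  // If every buffer is full, the caller (the transaction) has to wait,
                  // which is the response-time effect described in the spec.
                  public synchronized void append(byte[] chunk) throws InterruptedException {
                      if (pos + chunk.length > BUFFER_SIZE) {
                          fullBuffers.add(copyOfCurrent());
                          notifyAll();                            // wake the log shipper
                          while (freeBuffers.isEmpty()) {
                              wait();                             // wait for shipping to free a buffer
                          }
                          current = freeBuffers.removeFirst();
                          pos = 0;
                      }
                      System.arraycopy(chunk, 0, current, pos, chunk.length);
                      pos += chunk.length;
                  }

                  // Called by the log shipper thread.
                  public synchronized byte[] nextChunkToShip() throws InterruptedException {
                      while (fullBuffers.isEmpty()) {
                          wait();
                      }
                      byte[] shipped = fullBuffers.removeFirst();
                      freeBuffers.add(new byte[BUFFER_SIZE]);     // recycle capacity
                      notifyAll();                                // wake a waiting transaction
                      return shipped;
                  }

                  private byte[] copyOfCurrent() {
                      byte[] copy = new byte[pos];
                      System.arraycopy(current, 0, copy, 0, pos);
                      return copy;
                  }
              }

          (A real implementation would also have to flush a partially filled buffer
          after a timeout; that is left out of the sketch.)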

          > extract - "the response time of transactions may increase for as long as log shipment has trouble keeping up with the amount of generated log records."
          > Could you explain this more, I don't see the connection between the log buffer filling up and response times of other transactions. The spec says the replication is asynchronous, so won't user transactions still be only limited by the speed at which the transaction log is written to disk?

          In the current design, log records that need to be shipped to the slave are appended to the replication log buffer at the same time they are written to disk. If the replication log buffer is full, the transaction requesting this disk write has to wait for a chunk of log to be shipped before the log can be added to it. I realize that it is possible to read the log from disk if the buffer overflows. This is a planned improvement, but is delayed for now due to limited developer resources.

          > The spec seems to imply that the slave can connect with the master, but the startmaster command doesn't specify its own hostname or portnumber so how is this connection made?

          The connection between the master and the slave will be set up as follows (a rough sketch is given below):

          1) the slave sets up a ServerSocket
          2) the master connects to the socket on the specified slave
          location (i.e. host:port)
          3) the socket connection can be used to send messages in both
          directions.

          Thus, the slave does not contact the master - it only sends a message using the existing connection.
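          A minimal sketch of this handshake (host name, port number and the stream
          handling are illustrative; message framing and the log format are left out):

              import java.io.InputStream;
              import java.io.OutputStream;
              import java.net.ServerSocket;
              import java.net.Socket;

              // Slave side: listen on the slave host:port and wait for the master.
              class SlaveEndpointSketch {
                  public static void main(String[] args) throws Exception {
                      ServerSocket server = new ServerSocket(4851);
                      Socket fromMaster = server.accept();
                      InputStream logIn = fromMaster.getInputStream();    // log chunks arrive here
                      OutputStream msgOut = fromMaster.getOutputStream(); // control messages go back
                      // ... receive and apply log until stop or failover ...
                  }
              }

              // Master side: connect to the socket on the specified slave location.
              class MasterEndpointSketch {
                  public static void main(String[] args) throws Exception {
                      Socket toSlave = new Socket("slavehost", 4851);
                      OutputStream logOut = toSlave.getOutputStream();    // ship log chunks
                      InputStream msgIn = toSlave.getInputStream();       // control messages from slave
                      // ... ship buffered log records asynchronously ...
                  }
              }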

          > Why if the master loses its connection to the slave will the replication stop, while if the slave loses its connection to the master it keeps retrying? It seems that any temporary glitch in the network connectivity has a huge chance of rendering the replication useless. I can't see the logic behind this, what's stopping the master from keeping retrying. The log buffer being full shouldn't matter should it, the log records are still on disk, or is it that this scheme never reads the transaction log from disk, only from memory as log records are created?

          See the answer about response time above. Reading log from disk in case of replication buffer overflow is definitely an improvement, but we are delaying it for now. It will be high priority on the improvement todo-list.

          > From reading between the lines, I think this scheme requires that the master database stay booted while replicating, if so I think that's a key piece of information that should be clearly stated in the functional spec. If not, then I think that the how to shutdown a master database and restart replication(without the initial copy) should be documented.

          Again, a correct observation. The master database has to stay booted while being replicated.

          Hide
          Daniel John Debrunner added a comment -

          >> Is there any reason to put these replications commands on the class/command used to control the network server? They don't fit naturally there, why not a replication specific class/command? From the functional spec I can't see any requirement that the master or slave are running the network server, so I assume I can have replication working with embedded only systems.

          > Implementing this in the network server means that the blocking startslave command will run in a thread on the same vm as the server.

          That's an implementation detail; wouldn't it be better to have a separate class for replication commands, even if they use the same implementation code?
          Does that mean that this replication does not support embedded-only use, i.e. that a network server must be booted? Good to state that in the spec, if so.

          > Thus, the slave does not contact the master - it only sends a message using the existing connection.

          The functional spec says "The slave tries to reestablish the connection with the master. " - I guess this really means the slave waits for a master to re-connect to it.

          Jørgen Løland added a comment -

          Replication will be supported in both modes once we get the "connection url" working. At that point, there should be two ways to start replication:

          1) 'jdbc:derby:...;<replicationcommand>'
          2) A command line interface that uses 1) internally. This will probably not be useful for embedded. It could be implemented somewhere other than the network layer.

          Embedded will use 1, client will use 1 or 2. The main reason for supporting alternative 2 is to hide that startslave hangs.
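          For illustration, here is a minimal sketch of how alternative 1 might look from JDBC. The attribute names (startSlave, startMaster, slaveHost, slavePort) and the default port 4851 are taken from this discussion; the database and host names are placeholders, and the final URL syntax is defined by the funcspec, not by this sketch.

          import java.sql.Connection;
          import java.sql.DriverManager;
          import java.sql.SQLException;

          public class ReplicationUrlSketch {

              // Issued on the Derby instance that hosts the slave copy of 'salesdb'.
              // This call blocks until failover or stopSlave, so it would normally
              // run in its own thread (or behind the CLI wrapper of alternative 2).
              static Connection startSlave() throws SQLException {
                  return DriverManager.getConnection(
                      "jdbc:derby:salesdb;startSlave=true;slaveHost=slavehost;slavePort=4851");
              }

              // Issued on the Derby instance that owns the live 'salesdb'.
              static Connection startMaster() throws SQLException {
                  return DriverManager.getConnection(
                      "jdbc:derby:salesdb;startMaster=true;slaveHost=slavehost;slavePort=4851");
              }
          }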

          Jørgen Løland added a comment -

          Attaching v7 of the funcspec, incorporating feedback from Dan.

          Dan, it would be great if you had a quick look at the latest rev to verify that we are on the same page here.

          Jørgen Løland added a comment -

          While working on DERBY-3189, I realized that the functional specification is not 100% clear when it comes to security measures for replication.

          I plan to update the funcspec with the following information in a few days unless there are comments:

          Master side:

          • Authentication is turned on: Normal Derby connection policy - the user and password must be valid.
          • Authorization is turned on: The user must be valid and be the database owner of the database that will be replicated.
          • System privileges (DERBY-2109) are turned on: The user must be valid and have the "replication" system privilege.

          Slave side - start slave:
          As for the master, but with the two-phase strategy used for encryption of databases. This means first booting the slave database for authentication and authorization, then shutting it down and rebooting it in slave mode.

          Slave side - stop slave and failover:
          The authentication service cannot be obtained from the slave database since it is not fully booted yet, so users can only be authenticated at the system level. Authorization cannot be checked. If system privileges are turned on, the user must have the "replication" system privilege.

          Since we are not able to check authorization while in slave mode, stop slave and failover commands will only be accepted from the master while the master-slave connection is working. If the slave-master connection is down, any authenticated or properly system-privileged user can issue the commands on the Derby instance serving the slave database.

          Jørgen Løland added a comment -

          Attaching v8 of the funcspec. It includes details for authentication/authorization and a behavior change: replicated databases are now allowed to be shut down. Shutting down a database in replication mode will notify the other replication peer of what is happening.

          Kim Haase added a comment -

          I have a question about the startMaster attribute. Can it be used when a database is created (that is, in conjunction with the create=true attribute), or must it be used after the database has been created?

          The spec (v8) is not absolutely clear on this, because it says both

          "a master database must be created before it can be replicated."

          and

          "...replication does not need to be specified when the database is created; it can be initiated at any time after the database is created."

          The first part of the second sentence implies that replication could be started when the database is created – that is, in conjunction with create=true. But maybe it means only that you don't have to start it immediately after creating the database – you can replicate a database that's been around for a long time.

          One of the sections in the reference topics on connection attributes is "Combining with other attributes", so it would be helpful to have this information – and any other information you have on which attribute combinations are allowed and which are not. For example, I'm sure you can combine these with the username and password attributes, and I'm sure you can't specify two of the replication commands at the same time. About others, I'm not certain.

          Kim Haase added a comment -

          Sorry, something else in the spec is not quite clear to me.

          Is it correct that to start replication, all you need to do is specify the startMaster/slavehost/[slaveport] attributes on the master database? It appears from the description that this command gets things going on both the master and the slave without your having to explicitly run startSlave on the slave system. It appears that something like this happens when you stop the master, so I'm assuming it happens when you start it too.

          If that is the case, what is the startSlave attribute used for? Is it only used if the slave-to-master connection is lost? But in that case, according to the Failure Scenarios section, the slave automatically tries to reestablish the connection, unless I've misunderstood.

          Thanks for any clarifications.

          Kim Haase added a comment -

          Another question: at several points the spec says that the slave "redoes" the transaction log sent by the master. What does that mean exactly? Does it perform the operations that are recorded in the transaction log? The last paragraph in "Starting and running replication" implies that, but I just wanted to be sure.

          Thanks very much.

          Jørgen Løland added a comment -

          Hi Kim,

          I'll try to sort out the ambiguities in the func spec:

          > I have a question about the startMaster attribute. Can it can be
          > used when a database is created (that is, in conjunction with
          > the create=true attribute), or must it be used after the
          > database has been created?

          In the first implementation of replication, the master database
          must be booted (also implies created) before startMaster is
          specified. This may be improved in Derby 10.5.

          > But maybe it means only that you don't have to start it
          > immediately after creating the database – you can replicate
          > a database that's been around for a long time.

          Just to be clear - you can replicate a database that's been
          around for a long time.

          > I'm sure you can combine these with the username and password
          > attributes, and I'm sure you can't specify two of the replication
          > commands at the same time. About others, I'm not certain.)

          Looking at Connection URL Attributes in the reference manual, the
          replication attributes can be combined with 'user' and 'password'
          only. Username and password attributes are required if
          authentication/authorization is turned on.

          The replication attributes start*, stop* and failover can not be
          used at the same time, whereas slavehost/slaveport can be used in
          combination with these as specified in the funcspec.
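          For illustration, a minimal sketch of these rules; the database, host, user and password values are placeholders, and the URL syntax is the one discussed in this thread, not a final definition:

          public class ReplicationUrlCombinations {
              // Allowed: one replication attribute combined with user/password
              // (placeholder database/host/user names).
              static final String OK =
                  "jdbc:derby:salesdb;startMaster=true;slaveHost=slavehost;"
                  + "slavePort=4851;user=dbowner;password=secret";

              // Not allowed: two replication commands in the same connection URL.
              static final String REJECTED =
                  "jdbc:derby:salesdb;startMaster=true;failover=true";
          }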

          > Is it correct that to start replication, all you need to do is
          > specify the startMaster/slavehost/[slaveport] attributes on the
          > master database? It appears from the description that this
          > command gets things going on both the master and the slave
          > without your having to explicitly run startSlave on the slave
          > system. It appears that something like this happens when you stop
          > the master, so I'm assuming it happens when you start it too.

          I can see why you would assume this, but it is not correct. I
          will update the funcspec soon to clarify this. The answer: No,
          you have to run startSlave on the slave host in addition to
          startMaster on the master host.

          Regarding lost connection: there are no commands to fix a lost
          connection between the master and slave. The replication
          functionality will try to reestablish the connection
          internally (no user interaction), but may eventually have to give
          up. If replication is stopped because of a lost connection (or
          any other reason, like issuing a stopMaster command), replication
          must be started over again from scratch.

          > Another question: at several points the spec says that the
          > slave "redoes" the transaction log sent by the master. What does
          > that mean exactly? Does it perform the operations that are
          > recorded in the transaction log?

          Correct - the slave applies the operations found in the log.

          Once again, thanks for documenting replication!

          Kim Haase added a comment -

          Thank you so much, Jørgen, for the quick and very clear replies. That should keep me going!

          Dibyendu Majumdar added a comment -

          Hi Jørgen,

          I would be grateful if you would let me know the answers to the following:

          a) I notice that there are a couple of replication factories under org.apache.derby.iapi.services.replication.
          I would have expected that replication is a feature of the Store - and not a service. In particular, I am interested in knowing if replication has a dependency on functionality outside the Store.

          b) I also noticed some logic related to startup and shutdown of slave database in EmbedConnection. Does this mean that there is a dependency on the language layer for starting/stopping replication?

          Thanks for your help.

          Jørgen Løland added a comment - - edited

          Hi Dibyendu,

          The replication factories depend on functionality in

          • store (mainly store.raw.RawStore, store.raw.log.LogToFile and LogAccessFile)
          • the database modules (BasicDatabase and SlaveDatabase).

          The replication functionality in EmbedConnection is only related to processing connection attempts with replication commands ({start|stop}{Master|Slave} and failover - see funcspec).

          In retrospect, I think o.a.d.i.store.raw.replication or maybe o.a.d.i.store.replication would be a better place for the replication factories than services.

          Hope this helps

          Jørgen Løland added a comment -

          Code freeze for 10.4 is getting closer by the minute. It would be good to evaluate the funcspec at this point and be explicit about which replication features will make it into this release and which will be delayed for the next. As I see it, the following will have to be delayed until 10.5:

          • CLI for NetworkServerControl (this has already been mentioned in the funcspec)
          • Depending on ETA of DERBY-2109, the system privilege for replication may or may not make it into 10.4
          • Copying the database from the master to the slave inside Derby by using the master-slave network connection. Instead, a file system copy of the database to the slave location will be required before replication can be started. This has the implication that starting replication requires these steps (a sketch of the sequence follows at the end of this comment):

          1) boot database 'x' on master
          2) freeze 'x' (force log and data to disk and block write operations)
          3) copy 'x' to slave location using file system copy
          4) connect to 'x' with startSlave option on Derby serving slave
          5) connect to 'x' with startMaster option on Derby serving master (includes unfreeze of 'x')

          ...as opposed to only doing the originally intended steps 4 and 5. FWIW, startup of replication in MySQL requires a similar file system copy of the database.

          Of course, both these issues can be addressed in 10.4 if someone volunteers to work on them.

          I will update the funcspec with this information in a few days unless I hear objections.
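          A sketch of what the five steps above could look like in practice. The freeze/unfreeze system procedures exist in Derby today; the replication URL attributes, port and all names are placeholders based on this discussion, not final syntax.

          import java.sql.CallableStatement;
          import java.sql.Connection;
          import java.sql.DriverManager;

          public class ReplicationStartupSketch {
              public static void main(String[] args) throws Exception {
                  // Steps 1 and 2: boot 'x' on the master and freeze it so log and
                  // data are stable on disk while the file system copy is taken.
                  Connection master = DriverManager.getConnection("jdbc:derby:x");
                  CallableStatement freeze =
                      master.prepareCall("CALL SYSCS_UTIL.SYSCS_FREEZE_DATABASE()");
                  freeze.execute();
                  freeze.close();

                  // Step 3: copy the database directory of 'x' to the slave location
                  // with ordinary file system tools (cp, scp, ...) while it is frozen.

                  // Step 4, on the Derby instance serving the slave (this call blocks):
                  //   jdbc:derby:x;startSlave=true;slaveHost=slavehost;slavePort=4851

                  // Step 5, back on the master (startMaster also unfreezes 'x'):
                  //   jdbc:derby:x;startMaster=true;slaveHost=slavehost;slavePort=4851
              }
          }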

          Jørgen Løland added a comment -

          Attaching revised funcspec. Added a section on how replication is started, since the master-to-slave database shipping will not make it into 10.4, and made some other minor changes.

          Henri van de Scheur added a comment -

          I have a few comments/questions on the Functional Specification for Derby Replication - rev. 9.0.

          Great to see a chapter 'Handling Failure Scenarios', but as stated here, there are numerous failure scenarios which can be encountered. As a follow-up, I would like to raise the following questions/comments:
          1. Would be nice to also describe a scenario where the slave Derby instance crashes
          2. Please confirm the following statement: 'Since the slave database is in recovery-mode, no data in the slave database can be changed at all by any means'. If this wouldn't be the case, a lot of consistency-problems and scenarios should be taken into account.
          3. Is the relation Master-Slave really an N:M-relation and if so, are N and/or M somehow limited?
          4. The Master still has a connection with the Slave, but for some reason the Slave does not process the received log records: will this be seen as an 'unexpected failure'?
          5. Network issue: what to do in case of missing packets?
          6. Network issue: will a multipath-configuration be supported? This will lead to a Slave receiving records in a wrong sequence
          7. Network issue: multiple-network-interfaces: question as for 6). In addition: special scenario(s) if one such interface is dropped?
          8. If 1 Master replicates to 2 different Slaves and one of these Slaves has a lower 'bandwidth' to receive/process the logs, will this one then determine the total (replication)transactional processes, thus also limit sending to the other Slave?
          9. If Derby instance I1 contains database D1 as Master, database D2 as Slave, instance I2 contains the same but with opposite roles: isn't there a big risk for a 'dead' race in case of problems with main memory log buffer?

          Thanks!

          Jørgen Løland added a comment -

          Thanks for reviewing the funcspec. I'll answer your questions inline.

          > 1. Would be nice to also describe a scenario where the slave Derby instance crashes

          I agree - I'll add the following information in the next rev of the funcspec:

          If the slave crashes, the master will try to reestablish the connection (but will not succeed) until the replication log buffer is full or until the master is told to stop replication (stopMaster).

          > 2. Please confirm the following statement: 'Since the slave database is in recovery-mode, no data in the slave database can be changed at all by any means'. If this wouldn't be the case, a lot of consistency-problems and scenarios should be taken into account.

          Confirmed. We do not allow any operation on the slave instance of the database, not even reads. (Although reads may be allowed in a later (post-10.4) improvement.)

          > 3. Is the relation Master-Slave really an N:M-relation and if so, are N and/or M somehow limited?

          1:1 - There will be exactly one master and one slave for each replicated database 'x'. However, one Derby instance may act as a master or a slave for many different databases.

          In the following example, there are three Derby instances managing a total of five databases. The databases are either booted the normal non-replicated way, in master mode or in slave mode. There is exactly one master and one slave per replicated db:

          Derby Inst 1      Derby Inst 2      Derby Inst 3
          Normal: 'db1'
          Master: 'db2'     Slave:  'db2'
          Master: 'db3'                       Slave:  'db3'
          Slave:  'db4'     Master: 'db4'
                                              Normal: 'db5'

          > 4. The Master still has a connection with the Slave, but for some reason the Slave does not process the received log records: will this be seen as an 'unexpected failure'?

          Yes. That would be a code bug, and the design is considered fail-safe here because the slave code checks that each and every log record it applies to the local log has exactly the same byte position as it had on the master side. If a single log record is lost somewhere, the slave will notice. Of course, there could be other bugs, like the slave acting as a black hole (receiving packets and replying with an ack, implicitly through TCP, but not adding the received log records to the local log). That kind of bug would not be noticed until failover.
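          To make the byte-position check concrete, here is an illustrative sketch; the class, field and method names are invented for this example and are not Derby's actual code:

          final class LogPositionCheck {
              // End position of the slave's local log; the next shipped chunk must
              // start exactly here, otherwise a log record was lost or reordered.
              private long expectedNextByte;

              LogPositionCheck(long currentEndOfLocalLog) {
                  this.expectedNextByte = currentEndOfLocalLog;
              }

              void apply(long masterStartPosition, byte[] logChunk) {
                  if (masterStartPosition != expectedNextByte) {
                      throw new IllegalStateException(
                          "Log gap: expected byte " + expectedNextByte
                          + " but received a chunk starting at " + masterStartPosition);
                  }
                  // ... append logChunk to the local log here ...
                  expectedNextByte += logChunk.length;
              }
          }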

          > 5. Network issue: what to do in case of missing packets?
          > 6. Network issue: will a multipath-configuration be supported? This will lead to a Slave receiving records in a wrong sequence
          > 7. Network issue: multiple-network-interfaces: question as for 6). In addition: special scenario(s) if one such interface is dropped?

          The network communication is based on TCP over a single socket. The connection ensures correct ordering of packets since the TCP protocol confirms that each packet is delivered. A new packet is not sent before the previous one has been acked.
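          A minimal sketch of the single-socket, one-chunk-at-a-time shipping pattern described above. This is illustrative only and not Derby's actual wire protocol or class names; the real ordering guarantee comes from TCP itself, while this sketch makes the acknowledgement explicit for clarity:

          import java.io.DataInputStream;
          import java.io.DataOutputStream;
          import java.io.IOException;
          import java.net.Socket;

          final class SingleSocketLogShipper {
              private final DataOutputStream out;
              private final DataInputStream in;

              SingleSocketLogShipper(Socket socketToSlave) throws IOException {
                  this.out = new DataOutputStream(socketToSlave.getOutputStream());
                  this.in = new DataInputStream(socketToSlave.getInputStream());
              }

              // Ships one chunk and waits for the slave's acknowledgement before
              // returning, so chunks are always delivered and applied in order.
              void ship(byte[] chunk) throws IOException {
                  out.writeInt(chunk.length);
                  out.write(chunk);
                  out.flush();
                  int acked = in.readInt();
                  if (acked != chunk.length) {
                      throw new IOException("Slave did not acknowledge the full chunk");
                  }
              }
          }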

          > 8. If 1 Master replicates to 2 different Slaves and one of these Slaves has a lower 'bandwidth' to receive/process the logs, will this one then determine the total (replication)transactional processes, thus also limit sending to the other Slave?

          The 'one master, one slave' configuration prevents this.

          > 9. If Derby instance I1 contains database D1 as Master, database D2 as Slave, instance I2 contains the same but with opposite roles: isn't there a big risk for a 'dead' race in case of problems with main memory log buffer?

          The replication log buffers are not shared. Each master will have its own main-memory replication log buffer. (Could you please rephrase the question if this answer was gibberish?)

          Jørgen Løland added a comment -

          Retrying example in answer to Q3 - (probably not looking good, but at least readable):

          Derby Inst 1      Derby Inst 2      Derby Inst 3
          Normal: 'db1'
          Master: 'db2'     Slave:  'db2'
          Master: 'db3'                       Slave:  'db3'
          Slave:  'db4'     Master: 'db4'
                                              Normal: 'db5'

          Henri van de Scheur added a comment -

          Thanks a lot Jørgen for your quick and descriptive replies!

          I regard all of my questions as answered, apart from the last one, but that was more due to a badly formulated question. I will try to explain it through an example.
          Especially the fact that there is '1 Master, 1 Slave' reduces a lot of possible failure scenarios and makes things a lot easier. I just hadn't interpreted the spec this way.
          Limiting network communication to a single socket also limits the possible failure scenarios.

          So to question 9.
          Let's say that Instance I1 is not able to send log for database D1 to the slave in Instance I2 at the required pace. I1 will then try to increase the speed of log shipment. The latter might lead to reduced capacity for receiving log from Instance I2 and database D2. This then might lead to increased speed of log shipment from Instance I2, again leading to the same symptom: reduced capacity for receiving log shipments from Instance I1, and so on.

          I hope this example makes my point clearer. But I guess response times will increase, avoiding this situation.

          Jørgen Løland added a comment -

          Re Q9:
          As you indicate, this situation will eventually increase the response time for both databases. I would consider this a special case of the failure scenario described as "The master Derby instance is not able to send log to the slave at the same pace as log is generated...."

          V.Narayanan added a comment -

          I started working on a replication wiki over the weekend. I spent the weekend writing down the
          initial set of details that I thought the wiki should contain and which would be relevant to a
          new user of the replication feature. I wrote these details in the form of a text file.

          I will be converting the text file into a replication wiki over the next week.

          Since this issue deals with the replication design and tries to identify issues that are relevant
          to this feature, I am attaching whatever information I have assimilated to this issue.

          Note that the attached pdf contains a very simplified block diagram of the main modules involved
          in replication. I purposely kept it simple to make the replication design easy to understand.

          More detailed descriptions of the feature can be obtained by following the JIRA pointers mentioned
          in the explanations of the various blocks of this simplified block diagram and going through the
          specific JIRA issues.

          I have tried to organize the information I have collected or re-collected so that it will be easy
          for reviewers to point out places in which I have gone wrong.

          I will be making further updates, improvements and modifications directly to the wiki and will not
          be attaching it to this issue.

          Jørgen Løland added a comment -

          Attached new funcspec (v10). It includes the 'slave crashes' failure scenario, the new default replication port (4851), and camel-cased slaveHost/slavePort.

          Jørgen Løland added a comment -

          Funcspec v10: Replication system privilege not included in 10.4

          John H. Embretsen added a comment -

          Is there anything preventing this issue from being marked as resolved? After all, 10.4.1.3 has been released including the replication feature.

          Jørgen Løland added a comment -

          All subtasks completed.


            People

            • Assignee: Jørgen Løland
            • Reporter: Jørgen Løland
            • Votes: 0
            • Watchers: 4
