Derby
  1. Derby
  2. DERBY-2872 Add Replication functionality to Derby
  3. DERBY-2921

Replication: Add a network service that connects the master and slave Derby instances

    Details

    • Type: Sub-task Sub-task
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 10.4.1.3
    • Fix Version/s: 10.4.1.3
    • Component/s: Services
    • Labels:
      None

      Description

      A network connection is required between the master and slave Derby instances of a replicated database. The connection will be used to send many kinds of messages, including:

      • log records
      • the database (when replication is started)
      • master -> slave commands (like "stop replication")
      1. Replication_Network_v10.stat
        0.6 kB
        V.Narayanan
      2. Replication_Network_v10.diff
        34 kB
        V.Narayanan
      3. Replication_Network_v9.stat
        0.6 kB
        V.Narayanan
      4. Replication_Network_v9.diff
        33 kB
        V.Narayanan
      5. Replication_Network_v8.stat
        0.6 kB
        V.Narayanan
      6. Replication_Network_v8.diff
        33 kB
        V.Narayanan
      7. Replication_Network_v7.stat
        0.6 kB
        V.Narayanan
      8. Replication_Network_v7.diff
        33 kB
        V.Narayanan
      9. Replication_Network_expln_v6.txt
        3 kB
        V.Narayanan
      10. Replication_Network_v6.stat
        0.6 kB
        V.Narayanan
      11. Replication_Network_v6.diff
        33 kB
        V.Narayanan
      12. Replication_Network_v5.stat
        0.7 kB
        V.Narayanan
      13. Replication_Network_v5.diff
        39 kB
        V.Narayanan
      14. Replication_Network_v4.stat
        0.6 kB
        V.Narayanan
      15. Replication_Network_v4.diff
        34 kB
        V.Narayanan
      16. Replication_Network_v3.stat
        0.6 kB
        V.Narayanan
      17. Replication_Network_v3.diff
        32 kB
        V.Narayanan
      18. Replication_Network_v2.stat
        0.6 kB
        V.Narayanan
      19. Replication_Network_v2.diff
        26 kB
        V.Narayanan
      20. Replication_Network_v1.stat
        0.6 kB
        V.Narayanan
      21. Replication_Network_v1.diff
        24 kB
        V.Narayanan

        Issue Links

          Activity

          Hide
          Knut Anders Hatlen added a comment -

          Shouldn't ReplicationMessage.readExternal() throw an exception if the version is incorrect? As it is now, it checks the version and just leaves all fields as null values if it is incorrect. I think the appropriate exception is java.io.InvalidClassException (sub-class of IOException).

          I'm not sure the version check in parseInitiatorMessage() works. If the version is incorrect, ReplicationMessage.getMessage() will return null. That means this line fails with a NullPointerException:

          //Get the UID of the master
          long masterVersion = ((Long)initiatorMessage.getMessage()).longValue();

          So we never get to the point where we actually check the version.

          By the way, do we need 8 bytes (a long) to represent the version, or would a smaller data type suffice?

          Show
          Knut Anders Hatlen added a comment - Shouldn't ReplicationMessage.readExternal() throw an exception if the version is incorrect? As it is now, it checks the version and just leaves all fields as null values if it is incorrect. I think the appropriate exception is java.io.InvalidClassException (sub-class of IOException). I'm not sure the version check in parseInitiatorMessage() works. If the version is incorrect, ReplicationMessage.getMessage() will return null. That means this line fails with a NullPointerException: //Get the UID of the master long masterVersion = ((Long)initiatorMessage.getMessage()).longValue(); So we never get to the point where we actually check the version. By the way, do we need 8 bytes (a long) to represent the version, or would a smaller data type suffice?
          Hide
          Øystein Grøvlen added a comment -

          Committed patch, Replication_Network_v10.diff, with revison 577731

          Show
          Øystein Grøvlen added a comment - Committed patch, Replication_Network_v10.diff, with revison 577731
          Hide
          V.Narayanan added a comment -

          >I think it could be enlightening if it was explained in the SlaveAddress >constructor that InetAddress#getByName will return the default >(localhost) if no host name is given. Hence, no explicit handling of default >is necessary for the host address.

          done!

          Show
          V.Narayanan added a comment - >I think it could be enlightening if it was explained in the SlaveAddress >constructor that InetAddress#getByName will return the default >(localhost) if no host name is given. Hence, no explicit handling of default >is necessary for the host address. done!
          Hide
          Øystein Grøvlen added a comment -

          Thanks for the new patch. It looks very good. I will commit it later today if no problems show up when running tests.

          I have a minor nit that you may consider to include if there is a later update in this area: I think it could be enlightening if it was explained in the SlaveAddress constructor that InetAddress#getByName will return the default (localhost) if no host name is given. Hence, no explicit handling of default is necessary for the host address.

          Show
          Øystein Grøvlen added a comment - Thanks for the new patch. It looks very good. I will commit it later today if no problems show up when running tests. I have a minor nit that you may consider to include if there is a later update in this area: I think it could be enlightening if it was explained in the SlaveAddress constructor that InetAddress#getByName will return the default (localhost) if no host name is given. Hence, no explicit handling of default is necessary for the host address.
          Hide
          V.Narayanan added a comment -

          Thank you for the comments and reviews Oystein. I have addressed all the issues pointed out by you.

          For the following

          >I would like an explanation for why writeMessage does reset()
          >and flush().

          I had earlier added these methods for the following reason

          reset() - to ensure that any previously existing data is deleted before a
          new replication message is sent. Since flush() would send all the
          data from a stream the existing of previous data in a stream would
          mean a wrong data is present. Hence I thought reset() would take
          care of this.

          But I think this would be unnecessary since this situation can
          happen only if there is an error in the previous transmission and
          tcp should throw an error in this situation

          flush() - I had added to ensure that all the data is sent from the stream
          amd have added a comment for the same.

          In addition to this I have also fixed some more comments and javadoc
          that I found could be improved when I did a scan on the patch.

          I am running tests on this patch and shall revert back with results as
          soon as the tests finish.

          Thanks once again for the detailed reviews and comments.

          Show
          V.Narayanan added a comment - Thank you for the comments and reviews Oystein. I have addressed all the issues pointed out by you. For the following >I would like an explanation for why writeMessage does reset() >and flush(). I had earlier added these methods for the following reason reset() - to ensure that any previously existing data is deleted before a new replication message is sent. Since flush() would send all the data from a stream the existing of previous data in a stream would mean a wrong data is present. Hence I thought reset() would take care of this. But I think this would be unnecessary since this situation can happen only if there is an error in the previous transmission and tcp should throw an error in this situation flush() - I had added to ensure that all the data is sent from the stream amd have added a comment for the same. In addition to this I have also fixed some more comments and javadoc that I found could be improved when I did a scan on the patch. I am running tests on this patch and shall revert back with results as soon as the tests finish. Thanks once again for the detailed reviews and comments.
          Hide
          Øystein Grøvlen added a comment -

          Thanks for the new patch Narayanan, the changes look good. I have a
          few minor comments that I did not detect the previous time around:

          9. SlaveAddress: Why since it is the address that will be returned,
          why not store the address instead of the hostname?

          10. SocketConnection:

          a) socket could be local to the constructor since it is not used
          in other methods.

          b) I would like an explanation for why writeMessage does reset()
          and flush().

          c) I suggest to refer all references to replication messages in
          comments and use message objects instead.

          11. ReplicationMessageTransmit#sendInitiatorAndReceiveAck: The cast
          to ReplicationMessage is redundant.

          12. The added files no longer end with a newline.

          Show
          Øystein Grøvlen added a comment - Thanks for the new patch Narayanan, the changes look good. I have a few minor comments that I did not detect the previous time around: 9. SlaveAddress: Why since it is the address that will be returned, why not store the address instead of the hostname? 10. SocketConnection: a) socket could be local to the constructor since it is not used in other methods. b) I would like an explanation for why writeMessage does reset() and flush(). c) I suggest to refer all references to replication messages in comments and use message objects instead. 11. ReplicationMessageTransmit#sendInitiatorAndReceiveAck: The cast to ReplicationMessage is redundant. 12. The added files no longer end with a newline.
          Hide
          V.Narayanan added a comment -

          > I thought messages of type TYPE_LOG, would contain LogRecord
          > objects, not byte arrays.

          LogRecords are sent as a chunk in the form of byte arrays. In my previous patch I had changed the object to be of type LogRecord, this was wrong. I am changing it back to byte array.

          Show
          V.Narayanan added a comment - > I thought messages of type TYPE_LOG, would contain LogRecord > objects, not byte arrays. LogRecords are sent as a chunk in the form of byte arrays. In my previous patch I had changed the object to be of type LogRecord, this was wrong. I am changing it back to byte array.
          Hide
          V.Narayanan added a comment -

          I am running tests on v7 and shall revert back with the results as soon as the tests finish.

          Show
          V.Narayanan added a comment - I am running tests on v7 and shall revert back with the results as soon as the tests finish.
          Hide
          V.Narayanan added a comment -

          3 - I meant that I have added @throws tags to all the javadocs corresponding to
          the exceptions

          Show
          V.Narayanan added a comment - 3 - I meant that I have added @throws tags to all the javadocs corresponding to the exceptions
          Hide
          V.Narayanan added a comment -

          Thank you for the detailed reviews and comments Oystein!

          1 - I did a scan of all the javadoc in the current patch and have modified
          them to comply to the suggestions.

          2 - I have added @throws tag to all the exceptions.

          3 - I have changed the Initator to accept a long do the following now,

          ReplicationMessage initiatorMsg = new ReplicationMessage
          (ReplicationMessage.TYPE_INITIATE, new Long(
          ReplicationMessage.serialVersionUID));

          I have also changed the name of the method to sendInitiatorAndReceiveAck
          since it first sends a initiator messages and receives a ack corresponding
          to that.

          4 - The method names have been changed to getHostAddress() and getPortNumber().

          5 - I have made serverSocket local.

          6 - The methods in SocketConnection do not cast anymore. The casts are instead
          done in the place where these methods are used.

          7 - I have changed the methods to do the reset and the flush.

          8 - TYPE_LOG - changed to LogRecord

          TYPE_INITIATE - uses a Long now.

          VERSION_MISMATCH has been removed.

          Show
          V.Narayanan added a comment - Thank you for the detailed reviews and comments Oystein! 1 - I did a scan of all the javadoc in the current patch and have modified them to comply to the suggestions. 2 - I have added @throws tag to all the exceptions. 3 - I have changed the Initator to accept a long do the following now, ReplicationMessage initiatorMsg = new ReplicationMessage (ReplicationMessage.TYPE_INITIATE, new Long( ReplicationMessage.serialVersionUID)); I have also changed the name of the method to sendInitiatorAndReceiveAck since it first sends a initiator messages and receives a ack corresponding to that. 4 - The method names have been changed to getHostAddress() and getPortNumber(). 5 - I have made serverSocket local. 6 - The methods in SocketConnection do not cast anymore. The casts are instead done in the place where these methods are used. 7 - I have changed the methods to do the reset and the flush. 8 - TYPE_LOG - changed to LogRecord TYPE_INITIATE - uses a Long now. VERSION_MISMATCH has been removed.
          Hide
          Øystein Grøvlen added a comment -

          Patch is starting to look very good, Narayanan! Most of my comments are minor nits:

          1. The javadoc for public classes and methods should focus on how it
          is to be used, not on its implementation. Some of the javadoc for
          fields just says the obvious, instead of what it represents. (E.g.,
          "Stores an instance of the SlaveAddress class". If you know Java,
          you know this. No need to tell people that. Instead, you should
          tell that slaveAddress contains the address to the slave to
          replicate to.)

          2. In javadoc, I think @throws needs to be repeated for each
          exception.

          3. ReplicationMessageTransmit#sendAndReceiveAck: If an error is
          returned, why not use ack.getMessage() to get the SQLState?

          4. SlaveAddress: I suggest to drop 'Slave' from getSlaveHostAddress()
          and getSlavePortNumber() since that should be implicit from the
          name of the class.

          5. ReplicationMessageReceive: It seems unecessary to have serverSocket
          as a member field when it is only used by one method. I think it
          could be local to initConnection().

          6. SocketConnection: I think you should deal with Objects in this
          class and let the replication specific casting be done by the
          ReplicationMessage... classes.

          7. SocketConnection#writeMessage: The previous version did reset
          before writing and flush after. Have you decided that this is not
          necessary?

          8. ReplicationMessage:

          a) I thought messages of type TYPE_LOG, would contain LogRecord
          objects, not byte arrays.

          b) Why not use Integer or Long instead of String for the UID part
          of TYPE_INITIATE?

          c) VERSION_MISMATCH does not seem to be used.

          d) If you are able to cast the UID to int, why not store it as an
          integer instead of as a long?

          Show
          Øystein Grøvlen added a comment - Patch is starting to look very good, Narayanan! Most of my comments are minor nits: 1. The javadoc for public classes and methods should focus on how it is to be used, not on its implementation. Some of the javadoc for fields just says the obvious, instead of what it represents. (E.g., "Stores an instance of the SlaveAddress class". If you know Java, you know this. No need to tell people that. Instead, you should tell that slaveAddress contains the address to the slave to replicate to.) 2. In javadoc, I think @throws needs to be repeated for each exception. 3. ReplicationMessageTransmit#sendAndReceiveAck: If an error is returned, why not use ack.getMessage() to get the SQLState? 4. SlaveAddress: I suggest to drop 'Slave' from getSlaveHostAddress() and getSlavePortNumber() since that should be implicit from the name of the class. 5. ReplicationMessageReceive: It seems unecessary to have serverSocket as a member field when it is only used by one method. I think it could be local to initConnection(). 6. SocketConnection: I think you should deal with Objects in this class and let the replication specific casting be done by the ReplicationMessage... classes. 7. SocketConnection#writeMessage: The previous version did reset before writing and flush after. Have you decided that this is not necessary? 8. ReplicationMessage: a) I thought messages of type TYPE_LOG, would contain LogRecord objects, not byte arrays. b) Why not use Integer or Long instead of String for the UID part of TYPE_INITIATE? c) VERSION_MISMATCH does not seem to be used. d) If you are able to cast the UID to int, why not store it as an integer instead of as a long?
          Hide
          V.Narayanan added a comment -

          Pls find attached the patch contains the Network part of the replication code changed to conform to the class diagrams attached in Derby-2872. Also pls find attached a text file that explains the summary of the changes.

          I tested the network code with standalone programs for now. The following
          code snipet would start the receiver.

          ReplicationMessageReceive rec = new ReplicationMessageReceive(null, -1);
          rec.initConnection();

          and the following would start the transmitter and cause it to send the
          initiator message breaking the accept wait of the receiver.

          ReplicationMessageTransmit send = new
          ReplicationMessageTransmit(null, -1);
          send.initConnection();

          The code helps to check that the transmission and the reception of messages and the
          parsing of the initiator message is happening correctly. Pls note that in this case
          the network classes would use the default values for the host name and the port number.

          This code can be further extended to test the Receiver and the transmitter for other
          type of replication messages, but I am hoping that this exercise can wait for creation
          of further interfaces that actually use the network code.

          I am running tests on this code and shall revert back with results as soon as the tests
          complete.

          I request for this patch to be considered for reviews and comments.

          Show
          V.Narayanan added a comment - Pls find attached the patch contains the Network part of the replication code changed to conform to the class diagrams attached in Derby-2872. Also pls find attached a text file that explains the summary of the changes. I tested the network code with standalone programs for now. The following code snipet would start the receiver. ReplicationMessageReceive rec = new ReplicationMessageReceive(null, -1); rec.initConnection(); and the following would start the transmitter and cause it to send the initiator message breaking the accept wait of the receiver. ReplicationMessageTransmit send = new ReplicationMessageTransmit(null, -1); send.initConnection(); The code helps to check that the transmission and the reception of messages and the parsing of the initiator message is happening correctly. Pls note that in this case the network classes would use the default values for the host name and the port number. This code can be further extended to test the Receiver and the transmitter for other type of replication messages, but I am hoping that this exercise can wait for creation of further interfaces that actually use the network code. I am running tests on this code and shall revert back with results as soon as the tests complete. I request for this patch to be considered for reviews and comments.
          Hide
          V.Narayanan added a comment -

          >As I said, it is OK for me to commit this without tests. However, I do
          >not see that it should be very difficult to write a small test program
          >that opens a connection and sends a message using you new interfaces.

          I will factor in tests for this code. I will reattach a patch containing tests
          for this code.

          >Either you did not understand my question, or I did not understand
          >your answer. I was asking why the class javadoc only talk about
          >receiving messages from the slave, which you confirmed meant that the
          >class is only to be used by the master. So what code will the slave
          >use to listen for messages? Will there be another class for that?

          I am very sorry for this. It is me who is wrong. The slave receives messages
          from the master and it will use the ReplicationMessageReceive to receive messages
          from the master. The comment in the class
          "It receives the message from the slave" is wrong and is mis-leading.
          I will correct this. Thank you for pointing this out.

          Show
          V.Narayanan added a comment - >As I said, it is OK for me to commit this without tests. However, I do >not see that it should be very difficult to write a small test program >that opens a connection and sends a message using you new interfaces. I will factor in tests for this code. I will reattach a patch containing tests for this code. >Either you did not understand my question, or I did not understand >your answer. I was asking why the class javadoc only talk about >receiving messages from the slave, which you confirmed meant that the >class is only to be used by the master. So what code will the slave >use to listen for messages? Will there be another class for that? I am very sorry for this. It is me who is wrong. The slave receives messages from the master and it will use the ReplicationMessageReceive to receive messages from the master. The comment in the class "It receives the message from the slave" is wrong and is mis-leading. I will correct this. Thank you for pointing this out.
          Hide
          Øystein Grøvlen added a comment -

          V.Narayanan (JIRA) wrote:
          >> That is certainly OK. I think that by writing tests you get
          >> experience in using the interface you have created, and you may
          >> realize that parts of interface should be changed. Hence, it is good
          >> to write the tests before too much code has been written that rely on
          >> the current interface.
          >
          > The current code is not used in any part of the codebase for now.
          > I find it difficult to factor in tests which would use the code.
          > I humbly request that this code be allowed to remain for now and
          > promise to write comprehensive tests that shall cover all the
          > capabilities of this code once the features that use this code are
          > put in.

          As I said, it is OK for me to commit this without tests. However, I do
          not see that it should be very difficult to write a small test program
          that opens a connection and sends a message using you new interfaces.

          >> So the slave will not use this code to listen for messages?
          >
          > No the communication will only be through the streams obtained from
          > the socket after the initial connection has been established.

          Either you did not understand my question, or I did not understand
          your answer. I was asking why the class javadoc only talk about
          receiving messages from the slave, which you confirmed meant that the
          class is only to be used by the master. So what code will the slave
          use to listen for messages? Will there be another class for that?

          >> 1. ReplicationMessageTransmit and ReplicationMessageReceive has much
          >> in common, but the organization of the code differs in several
          >> places. Examples:
          >> - The ...Transmit constructor calls a private function which
          >> does the necessary work while ...Receive has much of the code
          >> in the constructor.
          >> - ...Tramsmit has a separate class for the privileged operation,
          >> ...Receive does not.
          >>
          >> I think it would be better if the code was more uniform. Maybe
          >> some of it could even be shared? (E.g., member variables and code
          >> to resolve defaults.)
          >
          > I have factored the common attributes and methods into a class
          > ClientServerSocketCommon.

          Great, I will look at your new patch.

          >
          >> 2. Only part of the code in the ReplicationMessage... classes is
          >> particular to replication. For example,
          >> ReplicationMessageTransmit#sendAndReceiveAck is the only method in
          >> that class that has any replication specific code. I think you
          >> should consider to factor out the replication specific code to
          >> separate classes that use your Transmit and Receive classes.
          >
          > The transmit class is now broken into
          >
          > ReplicationClientSocketUtil - That contains the Network related methods
          > ReplicationMessageTransmit - That handles the actual transmission of the
          > messages and the setting up of the connection.
          >
          > I could not do the same with ReplicationMessageReceive because the
          > receive class actually would be receiving on the client socket and
          > then on the Input Stream obtained from this socket. Hence now we
          > would have once class that listens on a socket upon receiving a
          > message relinquishes control to another which would handle this
          > message and again take back control from this class.
          >
          > We cannot have multi-threading here because we need the log records
          > to be written in the log file in the same order in which they are
          > received. If we use threading we would need to schedule these
          > threads appropriately which would again become cumbersome.

          I did not quite follow this, but note that separate classes does not
          necessarily mean separate threads.

          >
          >> 3. It is not clear to me what the user contract is for the
          >> ReplicationMessage... classes. I think you need to add Javadoc
          >> that describe how they are supposed to be used, and how they behave
          >> in the normal case (e.g., it needs to be mentioned that
          >> ReplicationMessageReceive#receive does not return until an error
          >> occurs) and in a failure situation (e.g., will they try to
          >> reconnect automatically?).
          >
          > I have added method specifics to the javadocs of all the related
          > methods and shall add more information if you think this is not
          > sufficient.

          Great, I will look at it, and report back.

          Show
          Øystein Grøvlen added a comment - V.Narayanan (JIRA) wrote: >> That is certainly OK. I think that by writing tests you get >> experience in using the interface you have created, and you may >> realize that parts of interface should be changed. Hence, it is good >> to write the tests before too much code has been written that rely on >> the current interface. > > The current code is not used in any part of the codebase for now. > I find it difficult to factor in tests which would use the code. > I humbly request that this code be allowed to remain for now and > promise to write comprehensive tests that shall cover all the > capabilities of this code once the features that use this code are > put in. As I said, it is OK for me to commit this without tests. However, I do not see that it should be very difficult to write a small test program that opens a connection and sends a message using you new interfaces. >> So the slave will not use this code to listen for messages? > > No the communication will only be through the streams obtained from > the socket after the initial connection has been established. Either you did not understand my question, or I did not understand your answer. I was asking why the class javadoc only talk about receiving messages from the slave, which you confirmed meant that the class is only to be used by the master. So what code will the slave use to listen for messages? Will there be another class for that? >> 1. ReplicationMessageTransmit and ReplicationMessageReceive has much >> in common, but the organization of the code differs in several >> places. Examples: >> - The ...Transmit constructor calls a private function which >> does the necessary work while ...Receive has much of the code >> in the constructor. >> - ...Tramsmit has a separate class for the privileged operation, >> ...Receive does not. >> >> I think it would be better if the code was more uniform. Maybe >> some of it could even be shared? (E.g., member variables and code >> to resolve defaults.) > > I have factored the common attributes and methods into a class > ClientServerSocketCommon. Great, I will look at your new patch. > >> 2. Only part of the code in the ReplicationMessage... classes is >> particular to replication. For example, >> ReplicationMessageTransmit#sendAndReceiveAck is the only method in >> that class that has any replication specific code. I think you >> should consider to factor out the replication specific code to >> separate classes that use your Transmit and Receive classes. > > The transmit class is now broken into > > ReplicationClientSocketUtil - That contains the Network related methods > ReplicationMessageTransmit - That handles the actual transmission of the > messages and the setting up of the connection. > > I could not do the same with ReplicationMessageReceive because the > receive class actually would be receiving on the client socket and > then on the Input Stream obtained from this socket. Hence now we > would have once class that listens on a socket upon receiving a > message relinquishes control to another which would handle this > message and again take back control from this class. > > We cannot have multi-threading here because we need the log records > to be written in the log file in the same order in which they are > received. If we use threading we would need to schedule these > threads appropriately which would again become cumbersome. I did not quite follow this, but note that separate classes does not necessarily mean separate threads. > >> 3. It is not clear to me what the user contract is for the >> ReplicationMessage... classes. I think you need to add Javadoc >> that describe how they are supposed to be used, and how they behave >> in the normal case (e.g., it needs to be mentioned that >> ReplicationMessageReceive#receive does not return until an error >> occurs) and in a failure situation (e.g., will they try to >> reconnect automatically?). > > I have added method specifics to the javadocs of all the related > methods and shall add more information if you think this is not > sufficient. Great, I will look at it, and report back.
          Hide
          V.Narayanan added a comment -

          >That is certainly OK. I think that by writing tests you get
          >experience in using the interface you have created, and you may
          >realize that parts of interface should be changed. Hence, it is good
          >to write the tests before too much code has been written that rely on
          >the current interface.

          The current code is not used in any part of the codebase for now.
          I find it difficult to factor in tests which would use the code.
          I humbly request that this code be allowed to remain for now and
          promise to write comprehensive tests that shall cover all the
          capabilities of this code once the features that use this code are
          put in.

          >Currently, you always generate a specific SQLState
          >(REPLICATION_MASTER_SLAVE_VERSION_MISMATCH) on the master when you get
          >any error on the initial message. Instead, this SQLState could be
          >sent by the slave when it detects a version mismatch.

          Modified to sending SQLState instead of generic message.

          >I would like to see the object type defined (e.g., LogRecord, String)
          >for each message type.

          done!

          Eg.

          // This flag is used to send an acknowledgment of successful completion
          // of a requested operation. It will however not be used to signify
          // reception for every message transmission. The object this message
          // type contains will be a <code>String</code>.
          public static final int TYPE_ACK = 1;

          >So the slave will not use this code to listen for messages?

          No the communication will only be through the streams obtained from
          the socket after the initial connection has been established.

          >But if there is a mismatch, the check in readExternal will throw an
          >IOException before you are able to inspect the content of the message.
          >Or have I misunderstood something?

          You are correct. I have removed the check in readExternl.

          >Looks good, but note that the getLocalHost in defaultNetworkValues is
          >not necessary anymore since AFAIK this will have already been done
          >when getByName is called by the constructor and the supplied hostName
          >is null.

          Since I now use a new class that contains all the attributes and
          initializations common to the network classes I have allowed this to
          remain in the common initializer class. I will remove this if it is felt
          that this is not OK.

          >Yes, I think the comment is a bit misleading.(related to comment no 9)

          I have removed this.

          >What about the comment in the catch block? Cannot an error also happen
          >in the initialization of ObjectOutputStream?

          fixed this comment.

          >1. ReplicationMessageTransmit and ReplicationMessageReceive has much
          > in common, but the organization of the code differs in several
          > places. Examples:
          > - The ...Transmit constructor calls a private function which
          > does the necessary work while ...Receive has much of the code
          > in the constructor.
          > - ...Tramsmit has a separate class for the privileged operation,
          > ...Receive does not.
          >
          > I think it would be better if the code was more uniform. Maybe
          > some of it could even be shared? (E.g., member variables and code
          > to resolve defaults.)

          I have factored the common attributes and methods into a class
          ClientServerSocketCommon.

          >2. Only part of the code in the ReplicationMessage... classes is
          > particular to replication. For example,
          > ReplicationMessageTransmit#sendAndReceiveAck is the only method in
          > that class that has any replication specific code. I think you
          > should consider to factor out the replication specific code to
          > separate classes that use your Transmit and Receive classes.

          The transmit class is now broken into

          ReplicationClientSocketUtil - That contains the Network related methods
          ReplicationMessageTransmit - That handles the actual transmission of the
          messages and the setting up of the connection.

          I could not do the same with ReplicationMessageReceive because the receive class
          actually would be receiving on the client socket and then on the Input Stream obtained
          from this socket. Hence now we would have once class that listens on a socket upon
          receiving a message relinquishes control to another which would handle this message
          and again take back control from this class.

          We cannot have multi-threading here because we need the log records to be written in the
          log file in the same order in which they are received. If we use threading we would need
          to schedule these threads appropriately which would again become cumbersome.

          >3. It is not clear to me what the user contract is for the
          > ReplicationMessage... classes. I think you need to add Javadoc
          > that describe how they are supposed to be used, and how they behave
          > in the normal case (e.g., it needs to be mentioned that
          > ReplicationMessageReceive#receive does not return until an error
          > occurs) and in a failure situation (e.g., will they try to
          > reconnect automatically?).

          I have added method specifics to the javadocs of all the related methods and shall add
          more information if you think this is not sufficient.

          Show
          V.Narayanan added a comment - >That is certainly OK. I think that by writing tests you get >experience in using the interface you have created, and you may >realize that parts of interface should be changed. Hence, it is good >to write the tests before too much code has been written that rely on >the current interface. The current code is not used in any part of the codebase for now. I find it difficult to factor in tests which would use the code. I humbly request that this code be allowed to remain for now and promise to write comprehensive tests that shall cover all the capabilities of this code once the features that use this code are put in. >Currently, you always generate a specific SQLState >(REPLICATION_MASTER_SLAVE_VERSION_MISMATCH) on the master when you get >any error on the initial message. Instead, this SQLState could be >sent by the slave when it detects a version mismatch. Modified to sending SQLState instead of generic message. >I would like to see the object type defined (e.g., LogRecord, String) >for each message type. done! Eg. // This flag is used to send an acknowledgment of successful completion // of a requested operation. It will however not be used to signify // reception for every message transmission. The object this message // type contains will be a <code>String</code>. public static final int TYPE_ACK = 1; >So the slave will not use this code to listen for messages? No the communication will only be through the streams obtained from the socket after the initial connection has been established. >But if there is a mismatch, the check in readExternal will throw an >IOException before you are able to inspect the content of the message. >Or have I misunderstood something? You are correct. I have removed the check in readExternl. >Looks good, but note that the getLocalHost in defaultNetworkValues is >not necessary anymore since AFAIK this will have already been done >when getByName is called by the constructor and the supplied hostName >is null. Since I now use a new class that contains all the attributes and initializations common to the network classes I have allowed this to remain in the common initializer class. I will remove this if it is felt that this is not OK. >Yes, I think the comment is a bit misleading.(related to comment no 9) I have removed this. >What about the comment in the catch block? Cannot an error also happen >in the initialization of ObjectOutputStream? fixed this comment. >1. ReplicationMessageTransmit and ReplicationMessageReceive has much > in common, but the organization of the code differs in several > places. Examples: > - The ...Transmit constructor calls a private function which > does the necessary work while ...Receive has much of the code > in the constructor. > - ...Tramsmit has a separate class for the privileged operation, > ...Receive does not. > > I think it would be better if the code was more uniform. Maybe > some of it could even be shared? (E.g., member variables and code > to resolve defaults.) I have factored the common attributes and methods into a class ClientServerSocketCommon. >2. Only part of the code in the ReplicationMessage... classes is > particular to replication. For example, > ReplicationMessageTransmit#sendAndReceiveAck is the only method in > that class that has any replication specific code. I think you > should consider to factor out the replication specific code to > separate classes that use your Transmit and Receive classes. The transmit class is now broken into ReplicationClientSocketUtil - That contains the Network related methods ReplicationMessageTransmit - That handles the actual transmission of the messages and the setting up of the connection. I could not do the same with ReplicationMessageReceive because the receive class actually would be receiving on the client socket and then on the Input Stream obtained from this socket. Hence now we would have once class that listens on a socket upon receiving a message relinquishes control to another which would handle this message and again take back control from this class. We cannot have multi-threading here because we need the log records to be written in the log file in the same order in which they are received. If we use threading we would need to schedule these threads appropriately which would again become cumbersome. >3. It is not clear to me what the user contract is for the > ReplicationMessage... classes. I think you need to add Javadoc > that describe how they are supposed to be used, and how they behave > in the normal case (e.g., it needs to be mentioned that > ReplicationMessageReceive#receive does not return until an error > occurs) and in a failure situation (e.g., will they try to > reconnect automatically?). I have added method specifics to the javadocs of all the related methods and shall add more information if you think this is not sufficient.
          Hide
          Øystein Grøvlen added a comment -

          V.Narayanan (JIRA) wrote:
          > I have addressed comments 4-19 and am working
          > on the comments 1, 2 and 3. I will address 1, 2 and 3
          > and post a follow up again.
          >

          Thanks you for your quick response to my comments. Generally, the
          changes look good. I have a few follow-up comments below.

          > -18
          >
          > * Will it be OK if I write unit test in a follow-up patch to this issue. This
          > code is not being used for now and I feel it is harmless for now.

          That is certainly OK. I think that by writing tests you get
          experience in using the interface you have created, and you may
          realize that parts of interface should be changed. Hence, it is good
          to write the tests before too much code has been written that rely on
          the current interface.

          >
          > -17
          >
          > * For now TYPE_ERROR messages are used only during the initial
          > exchange while exchanging UID versions. This did not contain a
          > SQLState to be transmitted. I think this is a good idea and I
          > will use this when a SQLException is actually thrown.

          Currently, you always generate a specific SQLState
          (REPLICATION_MASTER_SLAVE_VERSION_MISMATCH) on the master when you get
          any error on the initial message. Instead, this SQLState could be
          sent by the slave when it detects a version mismatch.

          >
          > -16
          >
          > * I take this comment as to mean that the javadoc should explicitly
          > define what kind of object needs to be transmitted as part of this
          > ReplicationMessage. I will update the javadoc to indicate what
          > type flag corresponds to what type of message. I shall also update
          > the type flag definitions to define what type of message each of
          > them signify
          >
          > For example
          >
          > public static final int TYPE_LOG = 0;
          >
          > will be modified to
          >
          > //This flag will be used for all messages will carry LogRecords.
          > public static final int TYPE_LOG = 0;

          I would like to see the object type defined (e.g., LogRecord, String)
          for each message type.

          > - 14
          >
          > * ReplicationMessageReceive javadoc: "receives the message from
          > the slave ...". Is this class only to be used by the master?
          >
          > Although the class might serve a more generic purpose after the suggested
          > factoring out of the generic network related code and the replication code
          > in the current context the comment is correct since the intention was to use
          > it only as a receiver of messages on the master.

          So the slave will not use this code to listen for messages?

          >
          > - 13
          >
          > * I planned to use the version check in
          > ReplicationMessage#readExternal at a later stage to be able to
          > handle a heterogenous collection of Derby version code.
          >
          > I did not want to do version checking here during initiation
          > because I need to be able to inform the master that it suffers
          > from an incompatibility with the slave. I do not want to abruptly
          > fail the slave without informing the master and if I start to
          > inform the master for every message received it would be too much
          > communication overload.
          >
          > I thought this initiator message would allow me to establish
          > message incompatibilities at startup and also

          But if there is a mismatch, the check in readExternal will throw an
          IOException before you are able to inspect the content of the message.
          Or have I misunderstood something?

          > - 11
          >
          > - 10
          >
          > * I have changed ReplicationMessageReceive to accept a hostName. So
          > this would take care of both 11 and 10.

          Looks good, but note that the getLocalHost in defaultNetworkValues is
          not necessary anymore since AFAIK this will have already been done
          when getByName is called by the constructor and the supplied hostName
          is null.

          >
          > - 9
          >
          > * Since the underlying code will throw this exception do you want me
          > to change this comment?

          Yes, I think the comment is a bit misleading.

          > - 7
          >
          > * Modified the javadoc to mention both ObjectInputStream and ObjectOuputStream

          What about the comment in the catch block? Cannot an error also happen
          in the initialization of ObjectOutputStream?

          Show
          Øystein Grøvlen added a comment - V.Narayanan (JIRA) wrote: > I have addressed comments 4-19 and am working > on the comments 1, 2 and 3. I will address 1, 2 and 3 > and post a follow up again. > Thanks you for your quick response to my comments. Generally, the changes look good. I have a few follow-up comments below. > -18 > > * Will it be OK if I write unit test in a follow-up patch to this issue. This > code is not being used for now and I feel it is harmless for now. That is certainly OK. I think that by writing tests you get experience in using the interface you have created, and you may realize that parts of interface should be changed. Hence, it is good to write the tests before too much code has been written that rely on the current interface. > > -17 > > * For now TYPE_ERROR messages are used only during the initial > exchange while exchanging UID versions. This did not contain a > SQLState to be transmitted. I think this is a good idea and I > will use this when a SQLException is actually thrown. Currently, you always generate a specific SQLState (REPLICATION_MASTER_SLAVE_VERSION_MISMATCH) on the master when you get any error on the initial message. Instead, this SQLState could be sent by the slave when it detects a version mismatch. > > -16 > > * I take this comment as to mean that the javadoc should explicitly > define what kind of object needs to be transmitted as part of this > ReplicationMessage. I will update the javadoc to indicate what > type flag corresponds to what type of message. I shall also update > the type flag definitions to define what type of message each of > them signify > > For example > > public static final int TYPE_LOG = 0; > > will be modified to > > //This flag will be used for all messages will carry LogRecords. > public static final int TYPE_LOG = 0; I would like to see the object type defined (e.g., LogRecord, String) for each message type. > - 14 > > * ReplicationMessageReceive javadoc: "receives the message from > the slave ...". Is this class only to be used by the master? > > Although the class might serve a more generic purpose after the suggested > factoring out of the generic network related code and the replication code > in the current context the comment is correct since the intention was to use > it only as a receiver of messages on the master. So the slave will not use this code to listen for messages? > > - 13 > > * I planned to use the version check in > ReplicationMessage#readExternal at a later stage to be able to > handle a heterogenous collection of Derby version code. > > I did not want to do version checking here during initiation > because I need to be able to inform the master that it suffers > from an incompatibility with the slave. I do not want to abruptly > fail the slave without informing the master and if I start to > inform the master for every message received it would be too much > communication overload. > > I thought this initiator message would allow me to establish > message incompatibilities at startup and also But if there is a mismatch, the check in readExternal will throw an IOException before you are able to inspect the content of the message. Or have I misunderstood something? > - 11 > > - 10 > > * I have changed ReplicationMessageReceive to accept a hostName. So > this would take care of both 11 and 10. Looks good, but note that the getLocalHost in defaultNetworkValues is not necessary anymore since AFAIK this will have already been done when getByName is called by the constructor and the supplied hostName is null. > > - 9 > > * Since the underlying code will throw this exception do you want me > to change this comment? Yes, I think the comment is a bit misleading. > - 7 > > * Modified the javadoc to mention both ObjectInputStream and ObjectOuputStream What about the comment in the catch block? Cannot an error also happen in the initialization of ObjectOutputStream?
          Hide
          V.Narayanan added a comment -

          Thank you for the detailed responses Oystein,

          I have addressed comments 4-19 and am working
          on the comments 1, 2 and 3. I will address 1, 2 and 3
          and post a follow up again.

          Some related responses

          -19

          • Removed the spaces between the names of the functions and the parenthesis.
            The space between the parenthesis and the keywords however remain.

          -18

          • Will it be OK if I write unit test in a follow-up patch to this issue. This
            code is not being used for now and I feel it is harmless for now.

          -17

          • For now TYPE_ERROR messages are used only during the initial exchange while
            exchanging UID versions. This did not contain a SQLState to be transmitted.
            I think this is a good idea and I will use this when a SQLException is actually
            thrown.

          -16

          • I take this comment as to mean that the javadoc should explicitly define what
            kind of object needs to be transmitted as part of this ReplicationMessage. I
            will update the javadoc to indicate what type flag corresponds to what type
            of message. I shall also update the type flag definitions to define what type
            of message each of them signify

          For example

          public static final int TYPE_LOG = 0;

          will be modified to

          //This flag will be used for all messages will carry LogRecords.
          public static final int TYPE_LOG = 0;

          If any other action was expected of me here I sincerely apologize and request
          further elucidation of the action suggested.

          • 15
          • I have modified the class level comment to say the following

          This message is used for the communication between the master and the
          slave during Replication.

          Please do mention if you think this is not good enough, I will change it
          again and post as per your suggestion.

          • 14
          • ReplicationMessageReceive javadoc: "receives the message from
            the slave ...". Is this class only to be used by the master?

          Although the class might serve a more generic purpose after the suggested
          factoring out of the generic network related code and the replication code
          in the current context the comment is correct since the intention was to use
          it only as a receiver of messages on the master.

          • 13
          • I planned to use the version check in ReplicationMessage#readExternal at a later
            stage to be able to handle a heterogenous collection of Derby version code.

          I did not want to do version checking here during initiation because I need to be
          able to inform the master that it suffers from an incompatibility with the slave. I
          do not want to abruptly fail the slave without informing the master and if I start
          to inform the master for every message received it would be too much communication
          overload.

          I thought this initiator message would allow me to establish message incompatibilities at
          startup and also

          • 12
          • I have changed the method to private.
          • 11
          • 10
          • I have changed ReplicationMessageReceive to accept a hostName. So this would take care of both 11 and 10.
          • 9
          • Since the underlying code will throw this exception do you want me to change this comment?
          • 8
          • Removed the local variable send.
          • 7
          • Modified the javadoc to mention both ObjectInputStream and ObjectOuputStream
          • 5
          • changed oos to objOuputStream and ois to objInputStream in both the receiver and the transmitter.
          • 4
          • Moved the declarations as suggested.
          Show
          V.Narayanan added a comment - Thank you for the detailed responses Oystein, I have addressed comments 4-19 and am working on the comments 1, 2 and 3. I will address 1, 2 and 3 and post a follow up again. Some related responses -19 Removed the spaces between the names of the functions and the parenthesis. The space between the parenthesis and the keywords however remain. -18 Will it be OK if I write unit test in a follow-up patch to this issue. This code is not being used for now and I feel it is harmless for now. -17 For now TYPE_ERROR messages are used only during the initial exchange while exchanging UID versions. This did not contain a SQLState to be transmitted. I think this is a good idea and I will use this when a SQLException is actually thrown. -16 I take this comment as to mean that the javadoc should explicitly define what kind of object needs to be transmitted as part of this ReplicationMessage. I will update the javadoc to indicate what type flag corresponds to what type of message. I shall also update the type flag definitions to define what type of message each of them signify For example public static final int TYPE_LOG = 0; will be modified to //This flag will be used for all messages will carry LogRecords. public static final int TYPE_LOG = 0; If any other action was expected of me here I sincerely apologize and request further elucidation of the action suggested. 15 I have modified the class level comment to say the following This message is used for the communication between the master and the slave during Replication. Please do mention if you think this is not good enough, I will change it again and post as per your suggestion. 14 ReplicationMessageReceive javadoc: "receives the message from the slave ...". Is this class only to be used by the master? Although the class might serve a more generic purpose after the suggested factoring out of the generic network related code and the replication code in the current context the comment is correct since the intention was to use it only as a receiver of messages on the master. 13 I planned to use the version check in ReplicationMessage#readExternal at a later stage to be able to handle a heterogenous collection of Derby version code. I did not want to do version checking here during initiation because I need to be able to inform the master that it suffers from an incompatibility with the slave. I do not want to abruptly fail the slave without informing the master and if I start to inform the master for every message received it would be too much communication overload. I thought this initiator message would allow me to establish message incompatibilities at startup and also 12 I have changed the method to private. 11 10 I have changed ReplicationMessageReceive to accept a hostName. So this would take care of both 11 and 10. 9 Since the underlying code will throw this exception do you want me to change this comment? 8 Removed the local variable send. 7 Modified the javadoc to mention both ObjectInputStream and ObjectOuputStream 5 changed oos to objOuputStream and ois to objInputStream in both the receiver and the transmitter. 4 Moved the declarations as suggested.
          Hide
          Øystein Grøvlen added a comment -

          Narayanan, thanks for the patch. Here are my comments:

          1. ReplicationMessageTransmit and ReplicationMessageReceive has much
          in common, but the organization of the code differs in several
          places. Examples:

          • The ...Transmit constructor calls a private function which
            does the necessary work while ...Receive has much of the code
            in the constructor.
          • ...Tramsmit has a separate class for the privileged operation,
            ...Receive does not.

          I think it would be better if the code was more uniform. Maybe
          some of it could even be shared? (E.g., member variables and code
          to resolve defaults.)

          2. Only part of the code in the ReplicationMessage... classes is
          particular to replication. For example,
          ReplicationMessageTransmit#sendAndReceiveAck is the only method in
          that class that has any replication specific code. I think you
          should consider to factor out the replication specific code to
          separate classes that use your Transmit and Receive classes.

          3. It is not clear to me what the user contract is for the
          ReplicationMessage... classes. I think you need to add Javadoc
          that describe how they are supposed to be used, and how they behave
          in the normal case (e.g., it needs to be mentioned that
          ReplicationMessageReceive#receive does not return until an error
          occurs) and in a failure situation (e.g., will they try to
          reconnect automatically?).

          4. I do not think you need ReplicationMessageTransmit#s. The socket
          is only referred directly in getSocketAndInputOutputStreams() and
          might as well be local to that method.

          5. I think it would be good with more meaningful names for oos and
          ois. Such names are ok for local variables, but for class members
          I think you should use something more self-explanatory.

          6. The check for default values does not have to be done on every
          connect so I suggest it can be moved to the constructor.

          7. ReplicationMessageTransmit#getSocketAndInputOutputStreams has two
          occurences of comments where only one of ObjectOutputStream and
          ObjectInputStream is mentioned, but I think both should. (javadoc
          and last comment).

          8. ReplicationMessageTransmit#send: Local variable 'answer' is not
          needed.

          9. ReplicationMessageTransmit#send: Last sentence of comment is a bit
          misleading. This code does not throw an exception if a connection
          is not established. (However, I guess the underlying code will
          throw such an exception)

          10. ReplicationMessageReceive constructor takes a host address, and
          not a host name. This means that the caller need to do the
          mapping between hostname and host address. Would it not be better
          to do this mapping within the class. That would also make the
          interfaces of ...Transmit and ...Receive more similar.

          11. ReplicationMessageReceive#createServerSocket tests whether
          hostname is null. I think it should be hostAddress.

          12. Why is ReplicationMessageReceive#blockingRead public? Can one
          call this directly instead of going through #receive?

          13. ReplicationMessageReceive#parseInitiatorMessage: Why check the
          version here? It seems to already have been done in
          ReplicationMessage#readExternal.

          14. ReplicationMessageReceive javadoc: "receives the message from
          the slave ...". Is this class only to be used by the master?

          15. ReplicationMessage javadoc: Should say that this is for
          replication.

          16. ReplicationMessage: I think it needs to be explicitly defined what
          kind of object the different types of messages will take. (e.g.,
          TYPE_LOG is followed by a log record, TYPE_INITIATE is followed
          by a string)

          17. Have you considered to use SQL states instead of general text for
          TYPE_ERROR messages?

          18. I think it would be good with a unit test that tests the basic
          capabilities of this service.

          19. Nit: The code is a bit inconsistent with respect to spaces before
          left paranthesis. Most Derby code does not use space after
          function names in declarations and calls, while space is used
          after reserved words like if, while, for etc.

          Show
          Øystein Grøvlen added a comment - Narayanan, thanks for the patch. Here are my comments: 1. ReplicationMessageTransmit and ReplicationMessageReceive has much in common, but the organization of the code differs in several places. Examples: The ...Transmit constructor calls a private function which does the necessary work while ...Receive has much of the code in the constructor. ...Tramsmit has a separate class for the privileged operation, ...Receive does not. I think it would be better if the code was more uniform. Maybe some of it could even be shared? (E.g., member variables and code to resolve defaults.) 2. Only part of the code in the ReplicationMessage... classes is particular to replication. For example, ReplicationMessageTransmit#sendAndReceiveAck is the only method in that class that has any replication specific code. I think you should consider to factor out the replication specific code to separate classes that use your Transmit and Receive classes. 3. It is not clear to me what the user contract is for the ReplicationMessage... classes. I think you need to add Javadoc that describe how they are supposed to be used, and how they behave in the normal case (e.g., it needs to be mentioned that ReplicationMessageReceive#receive does not return until an error occurs) and in a failure situation (e.g., will they try to reconnect automatically?). 4. I do not think you need ReplicationMessageTransmit#s. The socket is only referred directly in getSocketAndInputOutputStreams() and might as well be local to that method. 5. I think it would be good with more meaningful names for oos and ois. Such names are ok for local variables, but for class members I think you should use something more self-explanatory. 6. The check for default values does not have to be done on every connect so I suggest it can be moved to the constructor. 7. ReplicationMessageTransmit#getSocketAndInputOutputStreams has two occurences of comments where only one of ObjectOutputStream and ObjectInputStream is mentioned, but I think both should. (javadoc and last comment). 8. ReplicationMessageTransmit#send: Local variable 'answer' is not needed. 9. ReplicationMessageTransmit#send: Last sentence of comment is a bit misleading. This code does not throw an exception if a connection is not established. (However, I guess the underlying code will throw such an exception) 10. ReplicationMessageReceive constructor takes a host address, and not a host name. This means that the caller need to do the mapping between hostname and host address. Would it not be better to do this mapping within the class. That would also make the interfaces of ...Transmit and ...Receive more similar. 11. ReplicationMessageReceive#createServerSocket tests whether hostname is null. I think it should be hostAddress. 12. Why is ReplicationMessageReceive#blockingRead public? Can one call this directly instead of going through #receive? 13. ReplicationMessageReceive#parseInitiatorMessage: Why check the version here? It seems to already have been done in ReplicationMessage#readExternal. 14. ReplicationMessageReceive javadoc: "receives the message from the slave ...". Is this class only to be used by the master? 15. ReplicationMessage javadoc: Should say that this is for replication. 16. ReplicationMessage: I think it needs to be explicitly defined what kind of object the different types of messages will take. (e.g., TYPE_LOG is followed by a log record, TYPE_INITIATE is followed by a string) 17. Have you considered to use SQL states instead of general text for TYPE_ERROR messages? 18. I think it would be good with a unit test that tests the basic capabilities of this service. 19. Nit: The code is a bit inconsistent with respect to spaces before left paranthesis. Most Derby code does not use space after function names in declarations and calls, while space is used after reserved words like if, while, for etc.
          Hide
          V.Narayanan added a comment -

          Thank you for the comments Jorgen!

          > 2) I wonder if defaulting to localhost is a good idea -
          > replicating a db back to the same host seems to be
          > little worth. Maybe we should require the host to be
          > specified?

          I was thinking along the lines of the NetworkServer. It was defaulting to
          localhost. I agree that replicating to the same host seems little worth.
          (except while testing I guess! localhost replication would be very useful
          while writing tests that integrate into the junit tests.)

          But it seems harmless to me to allow it to default to localhost. Currently
          no message is printed though. Maybe the message addition should be done
          as part of the NetworkServer command that starts replication.

          > Further, the defaultNetworkValues method
          > in both transmitter and receiver sets both host and port to default
          >even if only one of these are not specified.

          I agree! I shall modify this.

          I shall fix 1), 3) and 4) also in the follow-up patch I will be submitting.

          Thanks once again for the comments

          Show
          V.Narayanan added a comment - Thank you for the comments Jorgen! > 2) I wonder if defaulting to localhost is a good idea - > replicating a db back to the same host seems to be > little worth. Maybe we should require the host to be > specified? I was thinking along the lines of the NetworkServer. It was defaulting to localhost. I agree that replicating to the same host seems little worth. (except while testing I guess! localhost replication would be very useful while writing tests that integrate into the junit tests.) But it seems harmless to me to allow it to default to localhost. Currently no message is printed though. Maybe the message addition should be done as part of the NetworkServer command that starts replication. > Further, the defaultNetworkValues method > in both transmitter and receiver sets both host and port to default >even if only one of these are not specified. I agree! I shall modify this. I shall fix 1), 3) and 4) also in the follow-up patch I will be submitting. Thanks once again for the comments
          Hide
          Jørgen Løland added a comment -

          Hi Narayanan

          Thanks for submitting the patch. It looks good for the most part, but I have a few nits:

          1) The AccessController.doPrivileged lines have more than 80 chars, but I notice that you have already tried to break this up by e.g. having "(Socket)" on a separate line (see line 483 in the v3 diff). Feel free to ignore this...

          2) I wonder if defaulting to localhost is a good idea - replicating a db back to the same host seems to be little worth. Maybe we should require the host to be specified? Further, the defaultNetworkValues method in both transmitter and receiver sets both host and port to default even if only one of these are not specified.

          3) I would add the following info to the javadoc somewhere, possibly in the class descriptions: If the network connection is lost for some reason, both readExternal and writeExternal will get an exception (IOException?). This means that neither of these methods will loop infinitely if something is wrong with the connection.

          4) The variable ReplicationMessage answer = null; in ReplicationMessageTransmit#send is not used

          Show
          Jørgen Løland added a comment - Hi Narayanan Thanks for submitting the patch. It looks good for the most part, but I have a few nits: 1) The AccessController.doPrivileged lines have more than 80 chars, but I notice that you have already tried to break this up by e.g. having "(Socket)" on a separate line (see line 483 in the v3 diff). Feel free to ignore this... 2) I wonder if defaulting to localhost is a good idea - replicating a db back to the same host seems to be little worth. Maybe we should require the host to be specified? Further, the defaultNetworkValues method in both transmitter and receiver sets both host and port to default even if only one of these are not specified. 3) I would add the following info to the javadoc somewhere, possibly in the class descriptions: If the network connection is lost for some reason, both readExternal and writeExternal will get an exception (IOException?). This means that neither of these methods will loop infinitely if something is wrong with the connection. 4) The variable ReplicationMessage answer = null; in ReplicationMessageTransmit#send is not used
          Hide
          V.Narayanan added a comment -

          Changes from v2 of the patch
          ----------------------------

          Introduction of a initiator message sent from the master to the
          slave.
          ---------------------------------------------------------------

          The initator contains the UID of the message that is sent
          from the master to the slave. The slave upon reception of
          the initiator message checks the UID to ensure that the
          value is equal to the UID of its message class.

          If it is not equal it throws sends a message tagged as a
          error to the slave and throws an IOException and fails to
          startup.

          If it is equal it sends a message tagged as an acknowledgment
          to the master.

          readExternal() and writeExternal() now recognize a version
          number that is marshalled between the slave and the master
          ----------------------------------------------------------

          This would be useful at a later stage when we decide that
          a master and a slave that is at different UID of messages
          can still be allowed to exchange messages with a switch on
          the version number that is passed.

          Slave now defaults to localhost and 8001
          ----------------------------------------

          If the slave is started without mentioning the hostname
          and the port number then it defaults to using localhost
          and 8001 port number. This was missed out in v2.

          Show
          V.Narayanan added a comment - Changes from v2 of the patch ---------------------------- Introduction of a initiator message sent from the master to the slave. --------------------------------------------------------------- The initator contains the UID of the message that is sent from the master to the slave. The slave upon reception of the initiator message checks the UID to ensure that the value is equal to the UID of its message class. If it is not equal it throws sends a message tagged as a error to the slave and throws an IOException and fails to startup. If it is equal it sends a message tagged as an acknowledgment to the master. readExternal() and writeExternal() now recognize a version number that is marshalled between the slave and the master ---------------------------------------------------------- This would be useful at a later stage when we decide that a master and a slave that is at different UID of messages can still be allowed to exchange messages with a switch on the version number that is passed. Slave now defaults to localhost and 8001 ---------------------------------------- If the slave is started without mentioning the hostname and the port number then it defaults to using localhost and 8001 port number. This was missed out in v2.
          Hide
          V.Narayanan added a comment -

          I have updated the patch to use Externalizable instead of Serializable. I have also hard-coded the serialVersionUID. I shall run tests on this patch and revert back. Please consider this patch for reviews and comments.

          Show
          V.Narayanan added a comment - I have updated the patch to use Externalizable instead of Serializable. I have also hard-coded the serialVersionUID. I shall run tests on this patch and revert back. Please consider this patch for reviews and comments.
          Hide
          Øystein Grøvlen added a comment -

          Rick Hillegas (JIRA) wrote:
          > Øystein> Would not that imply that as long as the format of the
          > message does not change, running different versions of the master
          > and the slave will work?
          >
          > I'm not following you. How would you handle the problem described in
          > my comment on August 8: The master hard-uprades and the
          > transactional work done as part of hard upgrade is replayed into the
          > down-rev slave database. Now the slave database thinks that it is at
          > a higher rev than the actual Derby code on the slave machine. What
          > will happen if you have to failover to this inconsistent slave?

          I agree that hard upgrade is a challenge. However, we need to
          distinguish between software version and database version here. The
          requirement will be that the database is not upgraded to a version
          that is higher than the software versions of any of the two Derby
          instances hosting the master and the slave. The master and the slave
          does not have to have the exact same version number. This also means
          that it will be possible to do an on-line upgrade of the software by
          restarting the nodes one at a time.

          It does not seem to me that it will be possible to upgrade the version
          of the database on-line. In order to upgrade, one will have to shut
          down and restart the master. During this process one can not do
          fail-over to the slave since the old master will have to remain master
          when it is booted in order to do the hard upgrade operation. This
          will create a (small) window of unavailability while the hard upgrade
          is performed.

          What I suggest is that the version of the replication protocol is
          decided by the database version. That is, before hard upgrade is
          done, new software will still use the old protocol. This means that
          if a Derby instance is master for two databases, it may use different
          different protocol versions to talk to the slaves, depending on the
          version of the database. This way, I do not think we need to be
          concerned about ReplicationClosure.

          Show
          Øystein Grøvlen added a comment - Rick Hillegas (JIRA) wrote: > Øystein> Would not that imply that as long as the format of the > message does not change, running different versions of the master > and the slave will work? > > I'm not following you. How would you handle the problem described in > my comment on August 8: The master hard-uprades and the > transactional work done as part of hard upgrade is replayed into the > down-rev slave database. Now the slave database thinks that it is at > a higher rev than the actual Derby code on the slave machine. What > will happen if you have to failover to this inconsistent slave? I agree that hard upgrade is a challenge. However, we need to distinguish between software version and database version here. The requirement will be that the database is not upgraded to a version that is higher than the software versions of any of the two Derby instances hosting the master and the slave. The master and the slave does not have to have the exact same version number. This also means that it will be possible to do an on-line upgrade of the software by restarting the nodes one at a time. It does not seem to me that it will be possible to upgrade the version of the database on-line. In order to upgrade, one will have to shut down and restart the master. During this process one can not do fail-over to the slave since the old master will have to remain master when it is booted in order to do the hard upgrade operation. This will create a (small) window of unavailability while the hard upgrade is performed. What I suggest is that the version of the replication protocol is decided by the database version. That is, before hard upgrade is done, new software will still use the old protocol. This means that if a Derby instance is master for two databases, it may use different different protocol versions to talk to the slaves, depending on the version of the database. This way, I do not think we need to be concerned about ReplicationClosure.
          Hide
          V.Narayanan added a comment -

          Thank you for the comments Rick and Oystein. I am working on converting the
          patch to use Externalizable instead of Serializable and also to hard-code the SerialVersionUID.
          I shall submit a patch with the same. I shall also update the functional specification.

          Show
          V.Narayanan added a comment - Thank you for the comments Rick and Oystein. I am working on converting the patch to use Externalizable instead of Serializable and also to hard-code the SerialVersionUID. I shall submit a patch with the same. I shall also update the functional specification.
          Hide
          Rick Hillegas added a comment -

          Øystein> If I understand correctly, hard-coding the SerialVersionUUID, will mean that when one later change the serialization form, one will have to manually update the SerialVersionUUID.

          If I were coding this myself, my instinct would be to use Externalizable rather than Serializable. As its very first task, writeExternal() would stamp a version id on the serialization stream. Symmetrically, as its very first task, readExternal() would read the stamped version id-and switch based on the actual version id. I find this approach is very flexible. And you are right, the version id has to change when the serialized form is altered-regardless of whether the solution is programmed using Serializable or Externalizable.

          Øystein> Would not that imply that as long as the format of the message does not change, running different versions of the master and the slave will work?

          I'm not following you. How would you handle the problem described in my comment on August 8: The master hard-uprades and the transactional work done as part of hard upgrade is replayed into the down-rev slave database. Now the slave database thinks that it is at a higher rev than the actual Derby code on the slave machine. What will happen if you have to failover to this inconsistent slave?

          Show
          Rick Hillegas added a comment - Øystein> If I understand correctly, hard-coding the SerialVersionUUID, will mean that when one later change the serialization form, one will have to manually update the SerialVersionUUID. If I were coding this myself, my instinct would be to use Externalizable rather than Serializable. As its very first task, writeExternal() would stamp a version id on the serialization stream. Symmetrically, as its very first task, readExternal() would read the stamped version id- and switch based on the actual version id. I find this approach is very flexible. And you are right, the version id has to change when the serialized form is altered -regardless of whether the solution is programmed using Serializable or Externalizable. Øystein> Would not that imply that as long as the format of the message does not change, running different versions of the master and the slave will work? I'm not following you. How would you handle the problem described in my comment on August 8: The master hard-uprades and the transactional work done as part of hard upgrade is replayed into the down-rev slave database. Now the slave database thinks that it is at a higher rev than the actual Derby code on the slave machine. What will happen if you have to failover to this inconsistent slave?
          Hide
          Øystein Grøvlen added a comment -

          If I understand correctly, hard-coding the SerialVersionUUID, will mean that when one later change the serialization form, one will have to manually update the SerialVersionUUID. Would not that imply that as long as the format of the message does not change, running different versions of the master and the slave will work?

          I think the suggested approach is good. It is not necessary to support protocol upgrades before an on-line upgrade is actually required. At that time, the upgrade will need to happen in two steps. First, the software is upgraded and then when all nodes are running the new version, the protocol can be upgraded. (Maybe the first step could be part of soft upgrade, and the second hard upgrade.)

          Since it is log records that are replicated, I would think that one will also have to deal with the version level of log records and data.

          Show
          Øystein Grøvlen added a comment - If I understand correctly, hard-coding the SerialVersionUUID, will mean that when one later change the serialization form, one will have to manually update the SerialVersionUUID. Would not that imply that as long as the format of the message does not change, running different versions of the master and the slave will work? I think the suggested approach is good. It is not necessary to support protocol upgrades before an on-line upgrade is actually required. At that time, the upgrade will need to happen in two steps. First, the software is upgraded and then when all nodes are running the new version, the protocol can be upgraded. (Maybe the first step could be part of soft upgrade, and the second hard upgrade.) Since it is log records that are replicated, I would think that one will also have to deal with the version level of log records and data.
          Hide
          Rick Hillegas added a comment -

          Thanks, Narayanan. I think that this solution is adequate for a first rev. The version handshake will let replication fail gracefully as necessary at upgrade time. Just to be clear, I think that the version id in question is the version of the code (the Derby jar files), not the version level of the data. If you have to get your hands on the version level of the data, then you may have to tackle some tricky soft-upgrade issues.

          Show
          Rick Hillegas added a comment - Thanks, Narayanan. I think that this solution is adequate for a first rev. The version handshake will let replication fail gracefully as necessary at upgrade time. Just to be clear, I think that the version id in question is the version of the code (the Derby jar files), not the version level of the data. If you have to get your hands on the version level of the data, then you may have to tackle some tricky soft-upgrade issues.
          Hide
          V.Narayanan added a comment -

          Thank you for looking into this issue and giving your comments Rick.
          Your comments have helped highlight critical issues that would have
          risen in the development process.

          My proposal to handle the issues pointed out are as follows.

          1) Hard-code the SerialVersionUUID. This will help overcome the problem of
          incompatibilities that arise due to difference in the machine that compiles
          the derby release.

          2) Place a restriction that all Derby instances in the Relication Closure are at the
          same rev level

          For this

          a) Exchange Derby revision levels during the handshake. If the master and the
          slave are at different rev levels replication will not be allowed.

          b) As you suggest update the functional spec to say that Replication fails if
          the master and slave are at different Derby rev levels and it is the System
          Administrator's responsibility to upgrade all Derby installations in
          the ReplicationClosure at the same time.

          This will handle the issue of incompatible changes made to ReplicationMessage and
          also the case where an upgrade is done to the master.

          These restrictions will be removed in subsequent revisions.

          I will produce an updated functional spec for this.

          Show
          V.Narayanan added a comment - Thank you for looking into this issue and giving your comments Rick. Your comments have helped highlight critical issues that would have risen in the development process. My proposal to handle the issues pointed out are as follows. 1) Hard-code the SerialVersionUUID. This will help overcome the problem of incompatibilities that arise due to difference in the machine that compiles the derby release. 2) Place a restriction that all Derby instances in the Relication Closure are at the same rev level For this a) Exchange Derby revision levels during the handshake. If the master and the slave are at different rev levels replication will not be allowed. b) As you suggest update the functional spec to say that Replication fails if the master and slave are at different Derby rev levels and it is the System Administrator's responsibility to upgrade all Derby installations in the ReplicationClosure at the same time. This will handle the issue of incompatible changes made to ReplicationMessage and also the case where an upgrade is done to the master. These restrictions will be removed in subsequent revisions. I will produce an updated functional spec for this.
          Hide
          Rick Hillegas added a comment -

          Thanks for looking into this issue, Narayanan. Some more discussion follows.

          The machine issue has to do with the machine that compiles the Derby release, not the machine which runs Derby in production. The serialVersionUUID can change from release to release even if there are no changes to replication. This depends on who steps up to be release manager and what environment they use to build the release. Of course, hard-coding the serialVersionUUID patches around this problem.

          However, hard-coding the serialVersionUUID will not solve the larger issue of how replication will behave in a mixed-release environment. Between releases, we may need to make changes to the structure of the ReplicationMessage or the log record. In my experience, version upgrade needs to be designed into the first release of a feature. It is generally a disaster when you try to retrofit upgrade into version 2 of a product.

          But there are much larger issues here than serialization. For the sake of framing this issue, I introduce the following definition:

          ReplicationClosure – This is the set of all Derby instances which are linked by replication. You compute the ReplicationClosure by starting with one of the systems, then add all systems that it replicates to and from. Then you repeat for each of those added systems, etc..

          1) Are we requiring that all Derby instances in a ReplicationClosure be at the same rev level?

          2) If not, what happens when one system in the ReplicationClosure is upgraded to a new rev of Derby independently of the others?

          3) On the other hand, if the answer to (1) is "yes", then what is our plan for propagating Derby (and application) upgrades across the ReplicationClosure?

          Consider the following problem: A 10.5 master replicates to a 10.5 slave. Later on, the 10.5 master is hard-upgraded to 10.6. Quite likely, hard-upgrade will do transactional work which must be replayed at the slave. But now, what is the state of the slave? It has a 10.5 version of Derby on its classpath, but the actual tables think they are at rev level 10.6. Fail-over is not going to work well.

          I suspect that where you want to end up is that all systems in the ReplicationClosure must be at the same Derby rev level. You need to have some plan for propagating new Derby releases across the ReplicationClosure. Maybe the answer is this: Replication fails if the master and slave are at different Derby rev levels and it is the System Administrator's responsibility to upgrade all Derby installations in the ReplicationClosure at the same time. The functional spec needs to spell out what the behavior is. Then I can reason about whether this patch fits into the spec'd model.

          Thanks.

          Show
          Rick Hillegas added a comment - Thanks for looking into this issue, Narayanan. Some more discussion follows. The machine issue has to do with the machine that compiles the Derby release, not the machine which runs Derby in production. The serialVersionUUID can change from release to release even if there are no changes to replication. This depends on who steps up to be release manager and what environment they use to build the release. Of course, hard-coding the serialVersionUUID patches around this problem. However, hard-coding the serialVersionUUID will not solve the larger issue of how replication will behave in a mixed-release environment. Between releases, we may need to make changes to the structure of the ReplicationMessage or the log record. In my experience, version upgrade needs to be designed into the first release of a feature. It is generally a disaster when you try to retrofit upgrade into version 2 of a product. But there are much larger issues here than serialization. For the sake of framing this issue, I introduce the following definition: ReplicationClosure – This is the set of all Derby instances which are linked by replication. You compute the ReplicationClosure by starting with one of the systems, then add all systems that it replicates to and from. Then you repeat for each of those added systems, etc.. 1) Are we requiring that all Derby instances in a ReplicationClosure be at the same rev level? 2) If not, what happens when one system in the ReplicationClosure is upgraded to a new rev of Derby independently of the others? 3) On the other hand, if the answer to (1) is "yes", then what is our plan for propagating Derby (and application) upgrades across the ReplicationClosure? Consider the following problem: A 10.5 master replicates to a 10.5 slave. Later on, the 10.5 master is hard-upgraded to 10.6. Quite likely, hard-upgrade will do transactional work which must be replayed at the slave. But now, what is the state of the slave? It has a 10.5 version of Derby on its classpath, but the actual tables think they are at rev level 10.6. Fail-over is not going to work well. I suspect that where you want to end up is that all systems in the ReplicationClosure must be at the same Derby rev level. You need to have some plan for propagating new Derby releases across the ReplicationClosure. Maybe the answer is this: Replication fails if the master and slave are at different Derby rev levels and it is the System Administrator's responsibility to upgrade all Derby installations in the ReplicationClosure at the same time. The functional spec needs to spell out what the behavior is. Then I can reason about whether this patch fits into the spec'd model. Thanks.
          Hide
          V.Narayanan added a comment -

          >Thanks for the patch, Naryanan.

          Thank you for taking a look at the patch Rick and giving me your comments.

          >I have an initial question about ReplicationMessage.
          >It appears to implement Serializable.

          >That means that the exchange protocol relies on both ends of the
          >connection being at the same rev level of Derby and perhaps even
          >compiled by the same machine.

          I learnt some of the nuances of ensuring proper versions for the
          serializable objects here.

          http://www.javaworld.com/javaworld/jw-02-2006/jw-0227-control.html?page=1

          Pls find below the way I think serialization would impact this issue and a
          possible resolution.

          when replication is attempted between two different versions of Derby.
          The incompatibilities caused between releases due to serialization can
          be classified into the following

          1) Store and retrieve
          2) Mixed-release

          The store and retrieve happens when a serialized object is stored into a
          database and retrieved. This would not be relevant to us. The mixed-release
          is what is relevant to us here.

          The mixed-release occurs when the two classes on the master and the slave are
          at different value of the suid.

          "Before a serialized object is read, the ObjectInputStream class first computes the
          suid of the local class—the serializable class. It then matches this suid value
          against the one stored in the serialized object stream. When these two values match,
          ObjectInputStream reads off the fields in the stream and maps the values into the
          instantiated object. If these two values do not match, ObjectInputStream raises the
          InvalidClassException."

          An suid is calculated based on the following factors

          1) serializable class's name
          2) sorted field names
          3) method names of methods and interfaces
          4) modifiers.

          If I have got it right the the derby rev or the machine will not impact the suid.
          By machine I mean the JVM. I do not think the machine architecture would impact
          the compatibility here since the JVM would take care of that.

          The incompatibility will rather be due to class incompatibility (i.e.) the message class
          changing between versions.

          There are two things that must be done here if any changes are planned in the subsequent
          releases

          1) Override the suid.
          2) Ensure that changes in subsequent versions are compatible.

          As part of addressing the issue pointed out I intend to

          1) Override the serialVersionUID

          Doing this should should be enough for compatible changes in subsequent versions. For the
          incompatible changes a handshake should suffice but I think this can be delayed until an
          incompatible change is introduced.

          >I'm not sure this is what you want, particularly
          >since the architecture allows for large webs of Derby nodes, each of which may host
          >both slaves and masters. I think that the functional spec should address the topic
          >of version agreement between nodes. For instance, do you envision that newly upgraded
          >masters will propagate the new Derby rev to their slaves?

          Show
          V.Narayanan added a comment - >Thanks for the patch, Naryanan. Thank you for taking a look at the patch Rick and giving me your comments. >I have an initial question about ReplicationMessage. >It appears to implement Serializable. >That means that the exchange protocol relies on both ends of the >connection being at the same rev level of Derby and perhaps even >compiled by the same machine. I learnt some of the nuances of ensuring proper versions for the serializable objects here. http://www.javaworld.com/javaworld/jw-02-2006/jw-0227-control.html?page=1 Pls find below the way I think serialization would impact this issue and a possible resolution. when replication is attempted between two different versions of Derby. The incompatibilities caused between releases due to serialization can be classified into the following 1) Store and retrieve 2) Mixed-release The store and retrieve happens when a serialized object is stored into a database and retrieved. This would not be relevant to us. The mixed-release is what is relevant to us here. The mixed-release occurs when the two classes on the master and the slave are at different value of the suid . "Before a serialized object is read, the ObjectInputStream class first computes the suid of the local class—the serializable class. It then matches this suid value against the one stored in the serialized object stream. When these two values match, ObjectInputStream reads off the fields in the stream and maps the values into the instantiated object. If these two values do not match, ObjectInputStream raises the InvalidClassException." An suid is calculated based on the following factors 1) serializable class's name 2) sorted field names 3) method names of methods and interfaces 4) modifiers. If I have got it right the the derby rev or the machine will not impact the suid. By machine I mean the JVM . I do not think the machine architecture would impact the compatibility here since the JVM would take care of that. The incompatibility will rather be due to class incompatibility (i.e.) the message class changing between versions. There are two things that must be done here if any changes are planned in the subsequent releases 1) Override the suid. 2) Ensure that changes in subsequent versions are compatible. As part of addressing the issue pointed out I intend to 1) Override the serialVersionUID Doing this should should be enough for compatible changes in subsequent versions. For the incompatible changes a handshake should suffice but I think this can be delayed until an incompatible change is introduced. >I'm not sure this is what you want, particularly >since the architecture allows for large webs of Derby nodes, each of which may host >both slaves and masters. I think that the functional spec should address the topic >of version agreement between nodes. For instance, do you envision that newly upgraded >masters will propagate the new Derby rev to their slaves?
          Hide
          Rick Hillegas added a comment -

          Thanks for the patch, Naryanan. I have an initial question about ReplicationMessage. It appears to implement Serializable. That means that the exchange protocol relies on both ends of the connection being at the same rev level of Derby and perhaps even compiled by the same machine. I'm not sure this is what you want, particularly since the architecture allows for large webs of Derby nodes, each of which may host both slaves and masters. I think that the functional spec should address the topic of version agreement between nodes. For instance, do you envision that newly upgraded masters will propagate the new Derby rev to their slaves?

          Show
          Rick Hillegas added a comment - Thanks for the patch, Naryanan. I have an initial question about ReplicationMessage. It appears to implement Serializable. That means that the exchange protocol relies on both ends of the connection being at the same rev level of Derby and perhaps even compiled by the same machine. I'm not sure this is what you want, particularly since the architecture allows for large webs of Derby nodes, each of which may host both slaves and masters. I think that the functional spec should address the topic of version agreement between nodes. For instance, do you envision that newly upgraded masters will propagate the new Derby rev to their slaves?
          Hide
          V.Narayanan added a comment -

          The previous comment should have started as "Some questions for which...".
          Sorry about the typo.

          Show
          V.Narayanan added a comment - The previous comment should have started as "Some questions for which...". Sorry about the typo.
          Hide
          V.Narayanan added a comment -

          ome Questions for which I do not have an answer in this implementation

          1) Replication would by default use the port number 8001 if the user
          does not specify the port number. The number 8001 was chosen
          randomly. Is it OK to do it this way? If no how should the
          port be chosen?

          2) When a error occurs in the Transmitter or the Receiver it is
          bundled in a StandardException and thrown with a SQLState of
          XR001, XR002 etc.

          The logic behind choosing this was

          X - Means derby specific. I understood this from a scan of
          SQLState.java
          R - Replication specific.

          The last three digits would be sequential.

          Is this the correct way of assigning the SQLState?

          Is it OK to bundle the exceptions in a StandardException and
          throw them.

          3) In the receiver the accept() method on the socket is called
          from the run method.

          We cannot throw checked exceptions from a run method. This has been
          handled this by bundling the exception thrown in a RuntimeException
          and throwing it.

          The way it is done in impl/drda/ClientThread.java does not offer a
          clue as to how this is to be done as there we print console messages
          which would not be possible in this case.

          Is RuntimeException the way out here? Is there a better way of doing
          this?

          Show
          V.Narayanan added a comment - ome Questions for which I do not have an answer in this implementation 1) Replication would by default use the port number 8001 if the user does not specify the port number. The number 8001 was chosen randomly. Is it OK to do it this way? If no how should the port be chosen? 2) When a error occurs in the Transmitter or the Receiver it is bundled in a StandardException and thrown with a SQLState of XR001, XR002 etc. The logic behind choosing this was X - Means derby specific. I understood this from a scan of SQLState.java R - Replication specific. The last three digits would be sequential. Is this the correct way of assigning the SQLState? Is it OK to bundle the exceptions in a StandardException and throw them. 3) In the receiver the accept() method on the socket is called from the run method. We cannot throw checked exceptions from a run method. This has been handled this by bundling the exception thrown in a RuntimeException and throwing it. The way it is done in impl/drda/ClientThread.java does not offer a clue as to how this is to be done as there we print console messages which would not be possible in this case. Is RuntimeException the way out here? Is there a better way of doing this?
          Hide
          V.Narayanan added a comment -

          This patch contains the implementation of the Socket
          server and the socket client that will be used to
          communicate between the Master and the slave, during
          replication.

          The code will not be in derby.jar for now, but will
          be compiled in the classes directory. The class will
          be used by the replication master service which will
          contain a reference to these network classes. This will
          automatically be picked up and bundled into derby.jar once
          the master service is in place. The dependency detection
          is at present being done by a class called classlister.java.
          This class takes care of building the file derby.list that
          contains the list of all the classes that will be in derby.jar.

          Pls find below a file by file explanation of the classes that
          have been added or modified.

          A java/engine/org/apache/derby/impl/services/replication
          A java/engine/org/apache/derby/impl/services/replication/net

          The directory that contains the Network classes. As pointed
          out in Derby-2977 a Replication Master controller will be
          started as a service. This replication service and all the
          related classes will be placed in the services package under
          a sub-package called replication.

          Since this issue basically handles the network utility classes
          that will be used by the replication framework, this is placed
          under a sub-package of replication called net.

          A java/engine/org/apache/derby/impl/services/replication/net/ReplicationMessageTransmit.java

          The Transmitter is basically a client socket that takes care of sending
          the messages to the receiver. This will be used by the master to send
          log records and other messages to the slave.

          A java/engine/org/apache/derby/impl/services/replication/net/ReplicationMessageReceive.java

          The Receiver is a server socket that takes care of receiving the messages
          that are sent by the transmitter. This will be used in the slave for
          receiving the log records and other messages. This will receive the messages
          and pass them on to the other classes that will perform appropriate
          action with them.

          A java/engine/org/apache/derby/impl/services/replication/net/OpenSocketAction.java

          This class has been designed along the same lines as
          org.apache.derby.client.net.OpenSocketAction the only difference being
          that it does not have ssl enabled socket creation ability for now.
          If this is required this can be added in the same lines of the original
          class at a later stage.

          A java/engine/org/apache/derby/impl/services/replication/net/ReplicationMessage.java

          The generic message unit that is sent between the master and the slave.

          M java/engine/org/apache/derby/loc/messages.xml
          M java/shared/org/apache/derby/shared/common/reference/SQLState.java

          The SQLState for the exceptions that are thrown when a exception
          occurs during Replication and their corresponding messages.

          Show
          V.Narayanan added a comment - This patch contains the implementation of the Socket server and the socket client that will be used to communicate between the Master and the slave, during replication. The code will not be in derby.jar for now, but will be compiled in the classes directory. The class will be used by the replication master service which will contain a reference to these network classes. This will automatically be picked up and bundled into derby.jar once the master service is in place. The dependency detection is at present being done by a class called classlister.java. This class takes care of building the file derby.list that contains the list of all the classes that will be in derby.jar. Pls find below a file by file explanation of the classes that have been added or modified. A java/engine/org/apache/derby/impl/services/replication A java/engine/org/apache/derby/impl/services/replication/net The directory that contains the Network classes. As pointed out in Derby-2977 a Replication Master controller will be started as a service. This replication service and all the related classes will be placed in the services package under a sub-package called replication. Since this issue basically handles the network utility classes that will be used by the replication framework, this is placed under a sub-package of replication called net. A java/engine/org/apache/derby/impl/services/replication/net/ReplicationMessageTransmit.java The Transmitter is basically a client socket that takes care of sending the messages to the receiver. This will be used by the master to send log records and other messages to the slave. A java/engine/org/apache/derby/impl/services/replication/net/ReplicationMessageReceive.java The Receiver is a server socket that takes care of receiving the messages that are sent by the transmitter. This will be used in the slave for receiving the log records and other messages. This will receive the messages and pass them on to the other classes that will perform appropriate action with them. A java/engine/org/apache/derby/impl/services/replication/net/OpenSocketAction.java This class has been designed along the same lines as org.apache.derby.client.net.OpenSocketAction the only difference being that it does not have ssl enabled socket creation ability for now. If this is required this can be added in the same lines of the original class at a later stage. A java/engine/org/apache/derby/impl/services/replication/net/ReplicationMessage.java The generic message unit that is sent between the master and the slave. M java/engine/org/apache/derby/loc/messages.xml M java/shared/org/apache/derby/shared/common/reference/SQLState.java The SQLState for the exceptions that are thrown when a exception occurs during Replication and their corresponding messages.
          Hide
          V.Narayanan added a comment -

          Thank you for looking at this Dag. I shall keep this in mind while working on
          the coding for the Network Interfaces.

          Show
          V.Narayanan added a comment - Thank you for looking at this Dag. I shall keep this in mind while working on the coding for the Network Interfaces.
          Hide
          Dag H. Wanvik added a comment -

          Jørgen > I noticed that the code attached to DERBY-2852 contains a

          You may want to be careful about reusing this code until we have the
          licencing issues settled. Is there a software grant in place for this
          contribution?

          Show
          Dag H. Wanvik added a comment - Jørgen > I noticed that the code attached to DERBY-2852 contains a You may want to be careful about reusing this code until we have the licencing issues settled. Is there a software grant in place for this contribution?
          Hide
          Jørgen Løland added a comment -

          Hi Narayanan

          I noticed that the code attached to DERBY-2852 contains a network communication service. If I remember correctly, it can be used to send any kind of object by wrapping it in a NetworkPayload object. I think you should have a look at these files and see if they can be used here with some modification:

          A java/engine/org/apache/derby/impl/services/net
          A java/engine/org/apache/derby/impl/services/net/NetworkPayload.java
          A java/engine/org/apache/derby/impl/services/net/receiver/LogReceiver.java
          A java/engine/org/apache/derby/impl/services/net/shipper/LogShipper.java

          I agree that you should focus on the basic functionality for now. But SSL will very likely be required at some point, so you should definately have that in mind when working on this. DERBY-2852 does not use SSL, and I am not sure if it is easily pluggable either.

          A lot of issues regarding the network communication were discussed in DERBY-2872; make sure you are up to date on these. Especially the need for a handshake phase before the slave accepts incoming log records. It may also be the case that a master/slave connection is lost at some point in time. The network code should probably try to reconnect when this happens. Of course, the slave will have to understand that this is the same master that tries to reconnect.

          Show
          Jørgen Løland added a comment - Hi Narayanan I noticed that the code attached to DERBY-2852 contains a network communication service. If I remember correctly, it can be used to send any kind of object by wrapping it in a NetworkPayload object. I think you should have a look at these files and see if they can be used here with some modification: A java/engine/org/apache/derby/impl/services/net A java/engine/org/apache/derby/impl/services/net/NetworkPayload.java A java/engine/org/apache/derby/impl/services/net/receiver/LogReceiver.java A java/engine/org/apache/derby/impl/services/net/shipper/LogShipper.java I agree that you should focus on the basic functionality for now. But SSL will very likely be required at some point, so you should definately have that in mind when working on this. DERBY-2852 does not use SSL, and I am not sure if it is easily pluggable either. A lot of issues regarding the network communication were discussed in DERBY-2872 ; make sure you are up to date on these. Especially the need for a handshake phase before the slave accepts incoming log records. It may also be the case that a master/slave connection is lost at some point in time. The network code should probably try to reconnect when this happens. Of course, the slave will have to understand that this is the same master that tries to reconnect.
          Hide
          V.Narayanan added a comment -

          The Proof of concept attached to 2872 achieves the most basic level of tranfer that might
          be required, namely the transfer of individual log records using RMI.

          This needs to be built into a framework for generic message transfer between the master and
          the slave

          I believe lot of considerations would go into such a communication channel

          A few already pointed out by Rick in the previous discussion in Derby-2872 like handling
          Man in the middle attacks and the associated security precautions that we will take for the
          same etc.

          But as a first step I believe that we should look into building the basic communication
          system, and plan in such a way that we are able to plug in security seamlessly later, (i.e.)
          concentrate on building the functionality first ,documenting and appropriately coding for the
          enhancements that may be required later.

          Here are a few random thoughts towards this

          1) we need a socket server on the slave to listen to incoming log records and other
          information from the master
          2) The master connects to this socket server that will be running on the slave.
          3) Decide on the data structure that will be sent to this socket server and ensure that
          it is able to accomodate not only logRecords but also other messages that may be required.
          4) would SSL need to be enabled in the communication between the master and slave?
          if yes,
          Should the sockets be created the same way it is being done in
          org.apache.derby.client.net.OpenSocketAction and NetworkServerControlImpl
          supporting only SSL_OFF in the first version and the other SSL modes later?
          5) How do we write the communication data structures such that encryption
          IF NECESSARY can be plugged in at a later stage?
          6) Would we need to support any other authentication mechanishms?

          Any additional cases that need to be handled and views in the direction of this issue
          will be highly appreciated.

          Show
          V.Narayanan added a comment - The Proof of concept attached to 2872 achieves the most basic level of tranfer that might be required, namely the transfer of individual log records using RMI. This needs to be built into a framework for generic message transfer between the master and the slave I believe lot of considerations would go into such a communication channel A few already pointed out by Rick in the previous discussion in Derby-2872 like handling Man in the middle attacks and the associated security precautions that we will take for the same etc. But as a first step I believe that we should look into building the basic communication system, and plan in such a way that we are able to plug in security seamlessly later, (i.e.) concentrate on building the functionality first ,documenting and appropriately coding for the enhancements that may be required later. Here are a few random thoughts towards this 1) we need a socket server on the slave to listen to incoming log records and other information from the master 2) The master connects to this socket server that will be running on the slave. 3) Decide on the data structure that will be sent to this socket server and ensure that it is able to accomodate not only logRecords but also other messages that may be required. 4) would SSL need to be enabled in the communication between the master and slave? if yes, Should the sockets be created the same way it is being done in org.apache.derby.client.net.OpenSocketAction and NetworkServerControlImpl supporting only SSL_OFF in the first version and the other SSL modes later? 5) How do we write the communication data structures such that encryption IF NECESSARY can be plugged in at a later stage? 6) Would we need to support any other authentication mechanishms? Any additional cases that need to be handled and views in the direction of this issue will be highly appreciated.

            People

            • Assignee:
              V.Narayanan
              Reporter:
              Jørgen Løland
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development