Derby
  1. Derby
  2. DERBY-2922 Replication: Add master replication mode
  3. DERBY-3064

Implement the LogShipper that will enable the shipping of Log records from the master to the slave

    Details

    • Type: Sub-task Sub-task
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 10.4.1.3
    • Component/s: Miscellaneous
    • Labels:
      None
    1. LogShipperIntegration_v6.stat
      0.6 kB
      V.Narayanan
    2. LogShipperIntegration_v6.diff
      20 kB
      V.Narayanan
    3. LogShipperIntegration_v5.stat
      0.6 kB
      V.Narayanan
    4. LogShipperIntegration_v5.diff
      20 kB
      V.Narayanan
    5. LogShipperIntegration_v4.stat
      0.5 kB
      V.Narayanan
    6. LogShipperIntegration_v4.diff
      18 kB
      V.Narayanan
    7. LogShipperIntegration_v3.stat
      0.3 kB
      V.Narayanan
    8. LogShipperIntegration_v3.diff
      15 kB
      V.Narayanan
    9. LogShipperIntegration_v2.stat
      0.2 kB
      V.Narayanan
    10. LogShipperIntegration_v2.diff
      15 kB
      V.Narayanan
    11. LogShipperIntegration_v1.stat
      0.2 kB
      V.Narayanan
    12. LogShipperIntegration_v1.diff
      15 kB
      V.Narayanan
    13. LogShipperImpl_v5.stat
      0.3 kB
      V.Narayanan
    14. LogShipperImpl_v5.diff
      13 kB
      V.Narayanan
    15. LogShipperImpl_v4.stat
      0.4 kB
      V.Narayanan
    16. LogShipperImpl_v4.diff
      14 kB
      V.Narayanan
    17. LogShipperImpl_v3.stat
      0.4 kB
      V.Narayanan
    18. LogShipperImpl_v3.diff
      14 kB
      V.Narayanan
    19. LogShipperImpl_v2.stat
      0.2 kB
      V.Narayanan
    20. LogShipperImpl_v2.diff
      11 kB
      V.Narayanan
    21. LogShipperImpl_v1.stat
      0.3 kB
      V.Narayanan
    22. LogShipperImpl_v1.diff
      13 kB
      V.Narayanan

      Issue Links

        Activity

        Transition Time In Source Status Execution Times Last Executer Last Execution Date
        Open Open Resolved Resolved
        22d 22h 32m 1 Øystein Grøvlen 03/Oct/07 10:11
        Resolved Resolved Reopened Reopened
        14d 4h 9m 1 V.Narayanan 17/Oct/07 14:21
        Reopened Reopened Resolved Resolved
        77d 20h 1 V.Narayanan 03/Jan/08 09:21
        Resolved Resolved Closed Closed
        25s 1 V.Narayanan 03/Jan/08 09:21
        Gavin made changes -
        Workflow jira [ 12412516 ] Default workflow, editable Closed status [ 12797982 ]
        Kathey Marsden made changes -
        Fix Version/s 10.4.1.3 [ 12313111 ]
        V.Narayanan made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        V.Narayanan made changes -
        Resolution Fixed [ 1 ]
        Status Reopened [ 4 ] Resolved [ 5 ]
        Hide
        V.Narayanan added a comment -

        All the patches for this issue have been committed

        Show
        V.Narayanan added a comment - All the patches for this issue have been committed
        Øystein Grøvlen made changes -
        Derby Info [Patch Available]
        Hide
        Øystein Grøvlen added a comment -

        Thanks for your newest patch, Narayanan.
        Patch, LogShipperIntegration_v6.diff, addresses all my comments.
        Committed as revision 693260.

        Show
        Øystein Grøvlen added a comment - Thanks for your newest patch, Narayanan. Patch, LogShipperIntegration_v6.diff, addresses all my comments. Committed as revision 693260.
        V.Narayanan made changes -
        Attachment LogShipperIntegration_v6.diff [ 12371416 ]
        Attachment LogShipperIntegration_v6.stat [ 12371417 ]
        Hide
        V.Narayanan added a comment -

        I have modified the patch to conform to the earlier discussion.
        Thanks once again for the reviews and comments Oystein and
        Jorgen.

        Show
        V.Narayanan added a comment - I have modified the patch to conform to the earlier discussion. Thanks once again for the reviews and comments Oystein and Jorgen.
        Hide
        Øystein Grøvlen added a comment -

        Narayanan, I agree with your analysis with respect to connection failures.

        Show
        Øystein Grøvlen added a comment - Narayanan, I agree with your analysis with respect to connection failures.
        Hide
        V.Narayanan added a comment -

        should have started the comment with "we have two cases in consideration here". Sorry about that.

        Show
        V.Narayanan added a comment - should have started the comment with "we have two cases in consideration here". Sorry about that.
        Hide
        V.Narayanan added a comment -

        1) when you attempt the first connect
        2) when a connection failure occurs

        1) I believe that if an IOException occurs here we should log the exception and exit.
        User can handle the cause of the exception here and can restart replication after
        correcting the problem. We do not know the severity of the cause of the IOException.
        So it is very difficult to distinguish between when we should stop and re-try based
        on the cause. So I believe here we should exit when a timeout occurs.

        2) Here we should re-try if a timeout or a IOException occurs. This would take care of
        the case for example when the network connection goes down and comes back again. Also
        if the log buffer is full and the network is down we fail aborting reconnect attempts.

        Does this sound OK to you?

        Show
        V.Narayanan added a comment - 1) when you attempt the first connect 2) when a connection failure occurs 1) I believe that if an IOException occurs here we should log the exception and exit. User can handle the cause of the exception here and can restart replication after correcting the problem. We do not know the severity of the cause of the IOException. So it is very difficult to distinguish between when we should stop and re-try based on the cause. So I believe here we should exit when a timeout occurs. 2) Here we should re-try if a timeout or a IOException occurs. This would take care of the case for example when the network connection goes down and comes back again. Also if the log buffer is full and the network is down we fail aborting reconnect attempts. Does this sound OK to you?
        Hide
        Øystein Grøvlen added a comment -

        I must admit, I do not have deep knowledge about Socket communication
        in Java. If specifying timeout on connect, what type of errors will
        cause connect to throw an exception before it times out? Will all
        such errors mean that a retry will probably also fail? With the new
        patch the master seems to give up on the first attempt. Depending on
        the answer to my question above, this may be OK.

        I thought the idea with respect to network failures during replication
        was to keep on trying until the log buffer is full. However, with the
        new patch it seems to be timeout-based with the same timeout as for
        start of replication.

        Minor: Some white-space has been added to previously empty lines.

        Show
        Øystein Grøvlen added a comment - I must admit, I do not have deep knowledge about Socket communication in Java. If specifying timeout on connect, what type of errors will cause connect to throw an exception before it times out? Will all such errors mean that a retry will probably also fail? With the new patch the master seems to give up on the first attempt. Depending on the answer to my question above, this may be OK. I thought the idea with respect to network failures during replication was to keep on trying until the log buffer is full. However, with the new patch it seems to be timeout-based with the same timeout as for start of replication. Minor: Some white-space has been added to previously empty lines.
        V.Narayanan made changes -
        Attachment LogShipperIntegration_v5.stat [ 12371225 ]
        Attachment LogShipperIntegration_v5.diff [ 12371224 ]
        Hide
        V.Narayanan added a comment -

        >9. a) Will a timeout at this level work? As far as I can tell you
        > have not set any timeout on the socket. Is there a default
        > timeout, or do you risk that getting the connection may hang for
        > much longer than the desired timeout?

        > b) Don't you risk that the same error message may be written to
        > derby.log very many times before the timeout occurs? An
        > alternative could be to link the last received IOExcpetion to
        > the exception thrown when it times out.

        I have changed the patch to use socket timeout. I have removed the earlier timeouts used at
        this level. IOExceptions are now being thrown, since overshooting the time interval would now
        throw a SocketTimeoutException rather than a IOException.

        >MasterFactory: Is SLAVE_DB a good name here? It seems to me that we
        >are talking about the database at the master here.

        Changed it to MASTER_DB

        Show
        V.Narayanan added a comment - >9. a) Will a timeout at this level work? As far as I can tell you > have not set any timeout on the socket. Is there a default > timeout, or do you risk that getting the connection may hang for > much longer than the desired timeout? > b) Don't you risk that the same error message may be written to > derby.log very many times before the timeout occurs? An > alternative could be to link the last received IOExcpetion to > the exception thrown when it times out. I have changed the patch to use socket timeout. I have removed the earlier timeouts used at this level. IOExceptions are now being thrown, since overshooting the time interval would now throw a SocketTimeoutException rather than a IOException. >MasterFactory: Is SLAVE_DB a good name here? It seems to me that we >are talking about the database at the master here. Changed it to MASTER_DB
        Hide
        Øystein Grøvlen added a comment -

        Thanks for the new patch Narayanan and for fixing all the minor nits I
        pointed out. I have some follow-up with respect to the initial
        connection:

        9. a) Will a timeout at this level work? As far as I can tell you
        have not set any timeout on the socket. Is there a default
        timeout, or do you risk that getting the connection may hang for
        much longer than the desired timeout?

        b) Don't you risk that the same error message may be written to
        derby.log very many times before the timeout occurs? An
        alternative could be to link the last received IOExcpetion to
        the exception thrown when it times out.

        MasterFactory: Is SLAVE_DB a good name here? It seems to me that we
        are talking about the database at the master here.

        Show
        Øystein Grøvlen added a comment - Thanks for the new patch Narayanan and for fixing all the minor nits I pointed out. I have some follow-up with respect to the initial connection: 9. a) Will a timeout at this level work? As far as I can tell you have not set any timeout on the socket. Is there a default timeout, or do you risk that getting the connection may hang for much longer than the desired timeout? b) Don't you risk that the same error message may be written to derby.log very many times before the timeout occurs? An alternative could be to link the last received IOExcpetion to the exception thrown when it times out. MasterFactory: Is SLAVE_DB a good name here? It seems to me that we are talking about the database at the master here.
        V.Narayanan made changes -
        Attachment LogShipperIntegration_v4.diff [ 12370984 ]
        Attachment LogShipperIntegration_v4.stat [ 12370985 ]
        Hide
        V.Narayanan added a comment -

        Thank you for the detailed comments Oystein.

        >9. MasterController:

        > a) I think startMaster() needs to have a timeout for how long it
        > should wait before it tells the caller that it was not able to
        > connect to the slave.

        I have addressed this issue by using a timeout of 5 seconds for now (This
        will be changed to be configurable in a later patch). I have left the
        sleep period unchanged for now. One issue that remains is what happens if
        the timeout is lesser than the sleep period.

        Would this mean that we should derive the sleep period from the timeout?

        For example we could assume that we will always have 5 re-connect attempts and
        divide the timeout set by the user by 5 and use that as the sleep intervals.

        > b) setupConnection(): I am not sure, but will it always be OK to
        > not to propagate the content of IOExcpetion? Will there never
        > be any interesting info that could help a user diagnose why a
        > connection is not established?

        since we attempt a re-connect when this method returns false is doing this

        logError(MessageId.REPLICATION_MASTER_RECONN, ioe, dbname);

        before returning false OK ?

        > c) Unused imports: PrivilegedActionException, ErrorStringBuilder

        fixed

        > d) Class declaration: Non-intuitive breakage of lines for
        > implements/extends.

        moved the implements keyword to the next line along with the the interfaces
        being implemented.

        > e) Unnecessary white-space diff after declaration of
        > DEFAULT_LOG_BUFFER_SIZE.

        fixed

        > f) I agree that setUpConnection should be moved, but it is still in
        > the wrong place (under methods from ModuleSupportable interface)

        moved it to the end of the class along with handleException method.

        > g) I think the JavaDoc for handleExceptions should say that it only
        > handles IOException and StandardException.

        changed the javadoc to

        " Used to handle the exceptions (IOException and StandardException) from
        the log shipper."

        >10. AsynchronousLogShipper

        > a) I do not understand why you let the LogShipper continue after it
        > is has been interrupted. It seems like interrupt is only used
        > when the thread is to be terminated. Maybe you should just
        > terminate the thread here.

        The method returns upon an interrupted exception now.

        > b) What is the purpose of stopLogShipment()? It does not seem to
        > stop the LogShipper from sending log anymore. I do not
        > understand why you have changed run() to 'while(true)' instead
        > of 'while(!stopShipping)'.

        changed while(true) to while(!stopShipping). stopShipping was being used
        by forceFlush earlier.

        >11. messages.xml

        > a) I think messages needs to have a parameter for database name
        > (slave messages does). If the master is replication several
        > databases, the users needs to be told which one has terminated.

        fixed

        > b) Last message name should be R010, not R008.

        fixed

        Thanks once again for the comments. I really appreciate your detailed review.

        Show
        V.Narayanan added a comment - Thank you for the detailed comments Oystein. >9. MasterController: > a) I think startMaster() needs to have a timeout for how long it > should wait before it tells the caller that it was not able to > connect to the slave. I have addressed this issue by using a timeout of 5 seconds for now (This will be changed to be configurable in a later patch). I have left the sleep period unchanged for now. One issue that remains is what happens if the timeout is lesser than the sleep period. Would this mean that we should derive the sleep period from the timeout? For example we could assume that we will always have 5 re-connect attempts and divide the timeout set by the user by 5 and use that as the sleep intervals. > b) setupConnection(): I am not sure, but will it always be OK to > not to propagate the content of IOExcpetion? Will there never > be any interesting info that could help a user diagnose why a > connection is not established? since we attempt a re-connect when this method returns false is doing this logError(MessageId.REPLICATION_MASTER_RECONN, ioe, dbname); before returning false OK ? > c) Unused imports: PrivilegedActionException, ErrorStringBuilder fixed > d) Class declaration: Non-intuitive breakage of lines for > implements/extends. moved the implements keyword to the next line along with the the interfaces being implemented. > e) Unnecessary white-space diff after declaration of > DEFAULT_LOG_BUFFER_SIZE. fixed > f) I agree that setUpConnection should be moved, but it is still in > the wrong place (under methods from ModuleSupportable interface) moved it to the end of the class along with handleException method. > g) I think the JavaDoc for handleExceptions should say that it only > handles IOException and StandardException. changed the javadoc to " Used to handle the exceptions (IOException and StandardException) from the log shipper." >10. AsynchronousLogShipper > a) I do not understand why you let the LogShipper continue after it > is has been interrupted. It seems like interrupt is only used > when the thread is to be terminated. Maybe you should just > terminate the thread here. The method returns upon an interrupted exception now. > b) What is the purpose of stopLogShipment()? It does not seem to > stop the LogShipper from sending log anymore. I do not > understand why you have changed run() to 'while(true)' instead > of 'while(!stopShipping)'. changed while(true) to while(!stopShipping). stopShipping was being used by forceFlush earlier. >11. messages.xml > a) I think messages needs to have a parameter for database name > (slave messages does). If the master is replication several > databases, the users needs to be told which one has terminated. fixed > b) Last message name should be R010, not R008. fixed Thanks once again for the comments. I really appreciate your detailed review.
        Hide
        Øystein Grøvlen added a comment -

        Thanks, for the patch Narayanan. Most of my comments are just
        cosmetical issues:

        9. MasterController:

        a) I think startMaster() needs to have a timeout for how long it
        should wait before it tells the caller that it was not able to
        connect to the slave.

        b) setupConnection(): I am not sure, but will it always be OK to
        not to propagate the content of IOExcpetion? Will there never
        be any interesting info that could help a user diagnose why a
        connection is not established?

        c) Unused imports: PrivilegedActionException, ErrorStringBuilder

        d) Class declaration: Non-intuitive breakage of lines for
        implements/extends.

        e) Unnecessary white-space diff after declaration of
        DEFAULT_LOG_BUFFER_SIZE.

        f) I agree that setUpConnection should be moved, but it is still in
        the wrong place (under methods from ModuleSupportable interface)

        g) I think the JavaDoc for handleExceptions should say that it only
        handles IOException and StandardException.

        10. AsynchronousLogShipper

        a) I do not understand why you let the LogShipper continue after it
        is has been interrupted. It seems like interrupt is only used
        when the thread is to be terminated. Maybe you should just
        terminate the thread here.

        b) What is the purpose of stopLogShipment()? It does not seem to
        stop the LogShipper from sending log anymore. I do not
        understand why you have changed run() to 'while(true)' instead
        of 'while(!stopShipping)'.

        11. messages.xml

        a) I think messages needs to have a parameter for database name
        (slave messages does). If the master is replication several
        databases, the users needs to be told which one has terminated.

        b) Last message name should be R010, not R008.

        Show
        Øystein Grøvlen added a comment - Thanks, for the patch Narayanan. Most of my comments are just cosmetical issues: 9. MasterController: a) I think startMaster() needs to have a timeout for how long it should wait before it tells the caller that it was not able to connect to the slave. b) setupConnection(): I am not sure, but will it always be OK to not to propagate the content of IOExcpetion? Will there never be any interesting info that could help a user diagnose why a connection is not established? c) Unused imports: PrivilegedActionException, ErrorStringBuilder d) Class declaration: Non-intuitive breakage of lines for implements/extends. e) Unnecessary white-space diff after declaration of DEFAULT_LOG_BUFFER_SIZE. f) I agree that setUpConnection should be moved, but it is still in the wrong place (under methods from ModuleSupportable interface) g) I think the JavaDoc for handleExceptions should say that it only handles IOException and StandardException. 10. AsynchronousLogShipper a) I do not understand why you let the LogShipper continue after it is has been interrupted. It seems like interrupt is only used when the thread is to be terminated. Maybe you should just terminate the thread here. b) What is the purpose of stopLogShipment()? It does not seem to stop the LogShipper from sending log anymore. I do not understand why you have changed run() to 'while(true)' instead of 'while(!stopShipping)'. 11. messages.xml a) I think messages needs to have a parameter for database name (slave messages does). If the master is replication several databases, the users needs to be told which one has terminated. b) Last message name should be R010, not R008.
        V.Narayanan made changes -
        Derby Info [Patch Available]
        V.Narayanan made changes -
        Link This issue blocks DERBY-3235 [ DERBY-3235 ]
        V.Narayanan made changes -
        Attachment LogShipperIntegration_v3.diff [ 12370081 ]
        Attachment LogShipperIntegration_v3.stat [ 12370082 ]
        Hide
        V.Narayanan added a comment -

        In this patch

        • I have addressed the issues pointed out with respect to logging
          in the MasterController.
        • I have followed the same pattern as in SlaveController in extending
          the ReplicationLogger and using the logError method. I however thought
          it would have been better to have had logError as a static method in the
          ReplicationLogger class rather than extending the class into the controllers.
          I decided not to disturb the structure of the classes for now.

        I have not run tests on the patch. I shall run the tests and shall revert back.

        Show
        V.Narayanan added a comment - In this patch I have addressed the issues pointed out with respect to logging in the MasterController. I have followed the same pattern as in SlaveController in extending the ReplicationLogger and using the logError method. I however thought it would have been better to have had logError as a static method in the ReplicationLogger class rather than extending the class into the controllers. I decided not to disturb the structure of the classes for now. I have not run tests on the patch. I shall run the tests and shall revert back.
        Hide
        Knut Anders Hatlen added a comment -

        Sounds like a good plan, Jørgen. I guess it would be best if the default is to log all exceptions that are not observable any other way (derby.replication.verbose=true).

        Show
        Knut Anders Hatlen added a comment - Sounds like a good plan, Jørgen. I guess it would be best if the default is to log all exceptions that are not observable any other way (derby.replication.verbose=true).
        Hide
        Jørgen Løland added a comment -

        > (snip) And exceptions are automatically logged in accordance with derby.stream.error.logSeverityLevel anyway, aren't they? If the user has requested no error logging, I'm not sure if we should ignore that request. (If the user is never going to see the exception, I think it's a good idea to log it, but in startMaster() the exception will propagate out to the user, if I understand correctly.)

        Knut,
        You are right about startMaster - exceptions thrown from this method would propagate back to the user. Apart from that single method, however, the methods in MasterController are called by either

        1. a user transaction thread executing normal transactions, or
        2. the checkpoint daemon, or
        3. the log shipper thread

        Thus, we need a plan for what to do with exceptions from these methods. Even though it would be possible to propagate a (wrapped) ReplicationLogBufferFull exception back to a random user who is executing a transaction (scenario 1.), I'm not sure that's the right address for these messages. That's why exceptions are currently not thrown to the callers of the MasterFactory (interface of MC) methods.

        Would it suffice to use MC#printErrorStack for exceptions in all methods except startMaster, and let MC#printErrorStack print error messages based on the value of a property, e.g. "derby.replication.verbose"? The property could, e.g., correspond to the same logSeverityLevel used for other exceptions. That way, the amount of information written to derby.log can be controlled.

        If this sounds like a good idea, I will add the aforementioned property in a SlaveController patch that will make use of the functionality added in Derby-3071.

        Show
        Jørgen Løland added a comment - > (snip) And exceptions are automatically logged in accordance with derby.stream.error.logSeverityLevel anyway, aren't they? If the user has requested no error logging, I'm not sure if we should ignore that request. (If the user is never going to see the exception, I think it's a good idea to log it, but in startMaster() the exception will propagate out to the user, if I understand correctly.) Knut, You are right about startMaster - exceptions thrown from this method would propagate back to the user. Apart from that single method, however, the methods in MasterController are called by either 1. a user transaction thread executing normal transactions, or 2. the checkpoint daemon, or 3. the log shipper thread Thus, we need a plan for what to do with exceptions from these methods. Even though it would be possible to propagate a (wrapped) ReplicationLogBufferFull exception back to a random user who is executing a transaction (scenario 1.), I'm not sure that's the right address for these messages. That's why exceptions are currently not thrown to the callers of the MasterFactory (interface of MC) methods. Would it suffice to use MC#printErrorStack for exceptions in all methods except startMaster, and let MC#printErrorStack print error messages based on the value of a property, e.g. "derby.replication.verbose"? The property could, e.g., correspond to the same logSeverityLevel used for other exceptions. That way, the amount of information written to derby.log can be controlled. If this sounds like a good idea, I will add the aforementioned property in a SlaveController patch that will make use of the functionality added in Derby-3071.
        Mike Matrigali made changes -
        Component/s Miscellaneous [ 11400 ]
        Hide
        Knut Anders Hatlen added a comment -

        > I have overcome this by doing
        >
        > failedChunk = (mesg==null) ? failedChunk : mesg;

        I think this would be clearer if written as

        if (mesg != null}

        { failedChunk = mesg; }

        Then it's clearer that nothing happens if mesg==null.

        > I have interrupted the log shipper thread and consequently removed
        > the stopShipping check from the run() method. I instead have
        > introduced a while(true) loop. I have however retained the
        > stopShipping check inorder to ensure that a forceFlush call after
        > stopreplication does not succeed.

        I don't think you can remove the stopShipping check just because the
        thread is interrupted. An interrupt does not stop the thread, but
        blocking I/O operations will fail, and calls to sleep() or wait()
        throw InterruptedException. As far as I can see, run() still silently
        ignores InterruptedException, so the loop will just continue.

        Anyway, I'm not sure we should rely on Thread.interrupt() to stop a
        thread since it is hard to predict the state of the thread after it
        has been interrupted. In general, I think wait()/notify() is a better
        mechanism than sleep()/interrupt().

        > b) InterruptedExceptions are normally trapped since they occur when
        > some code intentionally interrupts the thread.

        I think the normal way to handle InterruptedException in Derby is

        catch (InterruptedException ie)

        { throw StandardException.interrupt(ie); }

        > I initially thought of reporting IOExceptions here, but IOExceptions
        > cause setupConnection to return false and a next reconnect is
        > attempted in half a second. Reporting this failure in log would
        > cause a IOException to be printed in the log every half a
        > second. Would'nt this amount to flooding the log? All these
        > IOEXceptions being registered would contain the same information
        > stating a socket connection exception. So I was afraid that it would
        > contain the same information being repeated too many times.

        If this is always a socket connection exception, could the catch
        clause be narrowed down to java.net.ConnectException instead of
        java.io.Exception?

        Also, should we give up and fail gracefully after a certain number of
        retries?

        Show
        Knut Anders Hatlen added a comment - > I have overcome this by doing > > failedChunk = (mesg==null) ? failedChunk : mesg; I think this would be clearer if written as if (mesg != null} { failedChunk = mesg; } Then it's clearer that nothing happens if mesg==null. > I have interrupted the log shipper thread and consequently removed > the stopShipping check from the run() method. I instead have > introduced a while(true) loop. I have however retained the > stopShipping check inorder to ensure that a forceFlush call after > stopreplication does not succeed. I don't think you can remove the stopShipping check just because the thread is interrupted. An interrupt does not stop the thread, but blocking I/O operations will fail, and calls to sleep() or wait() throw InterruptedException. As far as I can see, run() still silently ignores InterruptedException, so the loop will just continue. Anyway, I'm not sure we should rely on Thread.interrupt() to stop a thread since it is hard to predict the state of the thread after it has been interrupted. In general, I think wait()/notify() is a better mechanism than sleep()/interrupt(). > b) InterruptedExceptions are normally trapped since they occur when > some code intentionally interrupts the thread. I think the normal way to handle InterruptedException in Derby is catch (InterruptedException ie) { throw StandardException.interrupt(ie); } > I initially thought of reporting IOExceptions here, but IOExceptions > cause setupConnection to return false and a next reconnect is > attempted in half a second. Reporting this failure in log would > cause a IOException to be printed in the log every half a > second. Would'nt this amount to flooding the log? All these > IOEXceptions being registered would contain the same information > stating a socket connection exception. So I was afraid that it would > contain the same information being repeated too many times. If this is always a socket connection exception, could the catch clause be narrowed down to java.net.ConnectException instead of java.io.Exception? Also, should we give up and fail gracefully after a certain number of retries?
        Hide
        Knut Anders Hatlen added a comment -

        Do we need try/catch here? (in MasterController.startMaster)

        + try

        { + logFactory.startReplicationMasterRole(this); + }

        catch (StandardException se)

        { + Monitor.logMessage(REPLICATION_MESSAGE_HEADER + + "Exception occurred while trying to start replication" + + " Master role."); + printErrorStack(se); + throw StandardException.newException + (SQLState.REPLICATION_UNEXPECTED_EXCEPTION, se); + }

        The method is declared as "throws StandardException", so there's no need to wrap se, I think. And exceptions are automatically logged in accordance with derby.stream.error.logSeverityLevel anyway, aren't they? If the user has requested no error logging, I'm not sure if we should ignore that request. (If the user is never going to see the exception, I think it's a good idea to log it, but in startMaster() the exception will propagate out to the user, if I understand correctly.)

        Show
        Knut Anders Hatlen added a comment - Do we need try/catch here? (in MasterController.startMaster) + try { + logFactory.startReplicationMasterRole(this); + } catch (StandardException se) { + Monitor.logMessage(REPLICATION_MESSAGE_HEADER + + "Exception occurred while trying to start replication" + + " Master role."); + printErrorStack(se); + throw StandardException.newException + (SQLState.REPLICATION_UNEXPECTED_EXCEPTION, se); + } The method is declared as "throws StandardException", so there's no need to wrap se, I think. And exceptions are automatically logged in accordance with derby.stream.error.logSeverityLevel anyway, aren't they? If the user has requested no error logging, I'm not sure if we should ignore that request. (If the user is never going to see the exception, I think it's a good idea to log it, but in startMaster() the exception will propagate out to the user, if I understand correctly.)
        V.Narayanan made changes -
        Attachment LogShipperIntegration_v2.diff [ 12368597 ]
        Attachment LogShipperIntegration_v2.stat [ 12368598 ]
        Hide
        V.Narayanan added a comment -

        Thank you for the comments and reviews Oystein.

        >1. AsynchronousLogShipper#shipALogChunk: If resending of
        > failedChunk fails, the current code will set failedChunk to null.
        > Is that intentional?

        I guess you are referring to failedChunk being assigned mesg. When
        failedChunk transmission fails then mesg would also be null.

        It was my mistake and was not intentional.

        I have overcome this by doing

        failedChunk = (mesg==null) ? failedChunk : mesg;

        mesg is null would imply that a transmission of the failedChunk was being
        attempted and failedChunk would not be lost now.

        >2. MasterController#setupConnection: Why are IOEXcecptions not
        > reported? Maybe the IOException could contain useful information
        > to figure out a network problem.

        I initially thought of reporting IOExceptions here, but IOExceptions
        cause setupConnection to return false and a next reconnect is attempted
        in half a second. Reporting this failure in log would cause a
        IOException to be printed in the log every half a second. Would'nt this
        amount to flooding the log? All these IOEXceptions being registered would
        contain the same information stating a socket connection exception. So I was
        afraid that it would contain the same information being repeated too many times.

        >3. MasterController#startMaster:

        > a) Do you depend on the thread to be interrupted before you give up
        > attempting to start replication? (Same issue with
        > handleException)

        Yes, the attempt to connect to the slave will persist until it is interrupted.
        One thing I can do is to ensure that the attempts are stopped when the user calls
        stopreplication. I have attempted to link stopMaster() with this. I did this because
        upon retrospection it did not seen right to me that a network connection is attempted
        while the user has called stopreplication.

        > b) InterruptedExceptions are normally trapped since they occur when
        > some code intentionally interrupts the thread.

        I was throwing the interrupted exception obtained from Thread.currentThread().sleep(500);.
        I have changed it to simply log the exception and throw it no more.

        >4. MasterController#stopMaster:

        > a) The setting of the stop state is not synchronized with the log
        > shipper thread. Hence, you have no guarantee for when the log
        > shipper thread will see that the stop field has changed. If you
        > make AsynchronousLogShipper#shipALogChunk volatile, it will be
        > guaranteed that updates to the field are visible to other
        > threads. (An alternative is to make stopLogShipment
        > synchronized, but that is probably more expensive).

        > b) Maybe you should interrupt the log shipper thread to make sure
        > it stops immediately?

        I have interrupted the log shipper thread and consequently removed the stopShipping
        check from the run() method. I instead have introduced a while(true) loop. I have however
        retained the stopShipping check inorder to ensure that a forceFlush call after stopreplication
        does not succeed.

        >5. Do you plan to internationalize the derby.log messages in a later
        > patch?

        Yes, I was hoping that this could be done in a follow up patch if this would be OK
        with you.

        >6. I suggest you make a method that log the error message, print the
        > stack and stop the master, so you do not have to repeat that many
        > times.

        Introduced printStackAndStopMaster that does the sequence. I thought I could
        move the printErrorStack code also in but doing that seemed to clutter a three
        step process being modularized. So decided to let it remain this way.

        >7. There is an unused import of SanityManager in MasterController.java

        I have removed this unused import.

        >8. MasterController#startMaster(handleExceptions: Thread.sleep is
        > static, so there is no need to call currentThread().

        changed it to Thread.sleep in handleExceptions also.

        Show
        V.Narayanan added a comment - Thank you for the comments and reviews Oystein. >1. AsynchronousLogShipper#shipALogChunk: If resending of > failedChunk fails, the current code will set failedChunk to null. > Is that intentional? I guess you are referring to failedChunk being assigned mesg. When failedChunk transmission fails then mesg would also be null. It was my mistake and was not intentional. I have overcome this by doing failedChunk = (mesg==null) ? failedChunk : mesg; mesg is null would imply that a transmission of the failedChunk was being attempted and failedChunk would not be lost now. >2. MasterController#setupConnection: Why are IOEXcecptions not > reported? Maybe the IOException could contain useful information > to figure out a network problem. I initially thought of reporting IOExceptions here, but IOExceptions cause setupConnection to return false and a next reconnect is attempted in half a second. Reporting this failure in log would cause a IOException to be printed in the log every half a second. Would'nt this amount to flooding the log? All these IOEXceptions being registered would contain the same information stating a socket connection exception. So I was afraid that it would contain the same information being repeated too many times. >3. MasterController#startMaster: > a) Do you depend on the thread to be interrupted before you give up > attempting to start replication? (Same issue with > handleException) Yes, the attempt to connect to the slave will persist until it is interrupted. One thing I can do is to ensure that the attempts are stopped when the user calls stopreplication. I have attempted to link stopMaster() with this. I did this because upon retrospection it did not seen right to me that a network connection is attempted while the user has called stopreplication. > b) InterruptedExceptions are normally trapped since they occur when > some code intentionally interrupts the thread. I was throwing the interrupted exception obtained from Thread.currentThread().sleep(500);. I have changed it to simply log the exception and throw it no more. >4. MasterController#stopMaster: > a) The setting of the stop state is not synchronized with the log > shipper thread. Hence, you have no guarantee for when the log > shipper thread will see that the stop field has changed. If you > make AsynchronousLogShipper#shipALogChunk volatile, it will be > guaranteed that updates to the field are visible to other > threads. (An alternative is to make stopLogShipment > synchronized, but that is probably more expensive). > b) Maybe you should interrupt the log shipper thread to make sure > it stops immediately? I have interrupted the log shipper thread and consequently removed the stopShipping check from the run() method. I instead have introduced a while(true) loop. I have however retained the stopShipping check inorder to ensure that a forceFlush call after stopreplication does not succeed. >5. Do you plan to internationalize the derby.log messages in a later > patch? Yes, I was hoping that this could be done in a follow up patch if this would be OK with you. >6. I suggest you make a method that log the error message, print the > stack and stop the master, so you do not have to repeat that many > times. Introduced printStackAndStopMaster that does the sequence. I thought I could move the printErrorStack code also in but doing that seemed to clutter a three step process being modularized. So decided to let it remain this way. >7. There is an unused import of SanityManager in MasterController.java I have removed this unused import. >8. MasterController#startMaster(handleExceptions: Thread.sleep is > static, so there is no need to call currentThread(). changed it to Thread.sleep in handleExceptions also.
        Hide
        Øystein Grøvlen added a comment -

        Some additional minor comments:

        7. There is an unused import of SanityManager in MasterController.java

        8. MasterController#startMaster(handleExceptions: Thread.sleep is
        static, so there is no need to call currentThread().

        Show
        Øystein Grøvlen added a comment - Some additional minor comments: 7. There is an unused import of SanityManager in MasterController.java 8. MasterController#startMaster(handleExceptions: Thread.sleep is static, so there is no need to call currentThread().
        Hide
        Øystein Grøvlen added a comment -

        Narayanan, thanks for the patch. Here are my comments and questions:

        1. AsynchronousLogShipper#shipALogChunk: If resending of
        failedChunk fails, the current code will set failedChunk to null.
        Is that intentional?

        2. MasterController#setupConnection: Why are IOEXcecptions not
        reported? Maybe the IOException could contain useful information
        to figure out a network problem.

        3. MasterController#startMaster:

        a) Do you depend on the thread to be interrupted before you give up
        attempting to start replication? (Same issue with
        handleException)

        b) InterruptedExceptions are normally trapped since they occur when
        some code intentionally interrupts the thread.

        4. MasterController#stopMaster:

        a) The setting of the stop state is not synchronized with the log
        shipper thread. Hence, you have no guarantee for when the log
        shipper thread will see that the stop field has changed. If you
        make AsynchronousLogShipper#shipALogChunk volatile, it will be
        guaranteed that updates to the field are visible to other
        threads. (An alternative is to make stopLogShipment
        synchronized, but that is probably more expensive).

        b) Maybe you should interrupt the log shipper thread to make sure
        it stops immediately?

        5. Do you plan to internationalize the derby.log messages in a later
        patch?

        6. I suggest you make a method that log the error message, print the
        stack and stop the master, so you do not have to repeat that many
        times.

        Show
        Øystein Grøvlen added a comment - Narayanan, thanks for the patch. Here are my comments and questions: 1. AsynchronousLogShipper#shipALogChunk: If resending of failedChunk fails, the current code will set failedChunk to null. Is that intentional? 2. MasterController#setupConnection: Why are IOEXcecptions not reported? Maybe the IOException could contain useful information to figure out a network problem. 3. MasterController#startMaster: a) Do you depend on the thread to be interrupted before you give up attempting to start replication? (Same issue with handleException) b) InterruptedExceptions are normally trapped since they occur when some code intentionally interrupts the thread. 4. MasterController#stopMaster: a) The setting of the stop state is not synchronized with the log shipper thread. Hence, you have no guarantee for when the log shipper thread will see that the stop field has changed. If you make AsynchronousLogShipper#shipALogChunk volatile, it will be guaranteed that updates to the field are visible to other threads. (An alternative is to make stopLogShipment synchronized, but that is probably more expensive). b) Maybe you should interrupt the log shipper thread to make sure it stops immediately? 5. Do you plan to internationalize the derby.log messages in a later patch? 6. I suggest you make a method that log the error message, print the stack and stop the master, so you do not have to repeat that many times.
        V.Narayanan made changes -
        Attachment LogShipperIntegration_v1.diff [ 12368013 ]
        Attachment LogShipperIntegration_v1.stat [ 12368014 ]
        Hide
        V.Narayanan added a comment -

        Pls find attached the patch that handles the integration of the log shipper and the master
        controller. I am running tests on the patch and shall report the results as soon as the tests
        complete.

        I request the patch to please be considered for reviews and comments.

        Show
        V.Narayanan added a comment - Pls find attached the patch that handles the integration of the log shipper and the master controller. I am running tests on the patch and shall report the results as soon as the tests complete. I request the patch to please be considered for reviews and comments.
        Hide
        V.Narayanan added a comment -

        While working on the integration of the log shipper into the MasterController(MC) I will be introducing code that will log MC startup,
        any failures that occur and MC shutdown in derby.log.

        while printing messages in derby.log a distinction ought to be made with respect to the messages that will be printed
        always and the messages that will be printed only during debug(sane) mode.

        With respect to replication and this patch the following messages will be printed irrespective of whether the debug mode is on or not

        1) Replication(MC) startup
        2) Error conditions that occur while MC (log shipping) is running
        3) Replication(MC) shutdown

        Show
        V.Narayanan added a comment - While working on the integration of the log shipper into the MasterController(MC) I will be introducing code that will log MC startup, any failures that occur and MC shutdown in derby.log. while printing messages in derby.log a distinction ought to be made with respect to the messages that will be printed always and the messages that will be printed only during debug(sane) mode. With respect to replication and this patch the following messages will be printed irrespective of whether the debug mode is on or not 1) Replication(MC) startup 2) Error conditions that occur while MC (log shipping) is running 3) Replication(MC) shutdown
        V.Narayanan made changes -
        Status Resolved [ 5 ] Reopened [ 4 ]
        Resolution Fixed [ 1 ]
        Hide
        V.Narayanan added a comment -

        I intend to use the same issue for log shipper integration with the master controller. hence I am re-opening the issue.

        Show
        V.Narayanan added a comment - I intend to use the same issue for log shipper integration with the master controller. hence I am re-opening the issue.
        Hide
        Knut Anders Hatlen added a comment -

        I don't quite understand this part of the patch:

        + } catch (NoSuchElementException nse)

        { + //Although next() returns true a request for data on the buffer + //fails implying that there has been a fatal exception in the + //buffer. + masterController.handleExceptions(StandardException.newException + (SQLState.REPLICATION_UNEXPECTED_EXCEPTION, nse)); + }

        Is there something special about NoSuchElementException compared to other runtime exceptions like NullPointerException?

        If the problem is that run() swallows runtime exceptions, wouldn't it be better to add a generic "catch (RuntimeException re)" clause in the run() method and handle the exception there? That way you would solve the problem both for NoSuchElementException and for other unexpected exceptions.

        Show
        Knut Anders Hatlen added a comment - I don't quite understand this part of the patch: + } catch (NoSuchElementException nse) { + //Although next() returns true a request for data on the buffer + //fails implying that there has been a fatal exception in the + //buffer. + masterController.handleExceptions(StandardException.newException + (SQLState.REPLICATION_UNEXPECTED_EXCEPTION, nse)); + } Is there something special about NoSuchElementException compared to other runtime exceptions like NullPointerException? If the problem is that run() swallows runtime exceptions, wouldn't it be better to add a generic "catch (RuntimeException re)" clause in the run() method and handle the exception there? That way you would solve the problem both for NoSuchElementException and for other unexpected exceptions.
        Øystein Grøvlen made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Hide
        Øystein Grøvlen added a comment -

        Patch LogShipperImpl_v5.diff committed at revision 581534

        Show
        Øystein Grøvlen added a comment - Patch LogShipperImpl_v5.diff committed at revision 581534
        V.Narayanan made changes -
        Attachment LogShipperImpl_v5.diff [ 12366912 ]
        Attachment LogShipperImpl_v5.stat [ 12366913 ]
        Hide
        V.Narayanan added a comment -

        Thanks once again for the comments and reviews Oystein.

        Pls find patch attached that fixes all the issues pointed out.

        Show
        V.Narayanan added a comment - Thanks once again for the comments and reviews Oystein. Pls find patch attached that fixes all the issues pointed out.
        Hide
        Øystein Grøvlen added a comment -

        All looks very good, except that it seems I did not explain what I
        meant in comment 14 well enough. I suggest shipALogChunk and
        forceFlush does NOT call handleExceptions, nor catch exceptions they
        are not going to deal with. In other words, I think all exceptions
        except the NoSuchElementExcpetion should not be caught, but left to
        the caller (the log shipper thread or the master controller to deal
        with).

        In addition, I have a few VERY minor comments:

        • Import of PriviligedActionException is no longer needed
        • mesg does no longer need to be declared outside the loop
        • It is probably good to explain why NoSuchElementExcpetion is fatal.
        • Why create a new SQLState? Could not XRE03 be used?
        Show
        Øystein Grøvlen added a comment - All looks very good, except that it seems I did not explain what I meant in comment 14 well enough. I suggest shipALogChunk and forceFlush does NOT call handleExceptions, nor catch exceptions they are not going to deal with. In other words, I think all exceptions except the NoSuchElementExcpetion should not be caught, but left to the caller (the log shipper thread or the master controller to deal with). In addition, I have a few VERY minor comments: Import of PriviligedActionException is no longer needed mesg does no longer need to be declared outside the loop It is probably good to explain why NoSuchElementExcpetion is fatal. Why create a new SQLState? Could not XRE03 be used?
        V.Narayanan made changes -
        Attachment LogShipperImpl_v4.diff [ 12366886 ]
        Attachment LogShipperImpl_v4.stat [ 12366887 ]
        Hide
        V.Narayanan added a comment -

        Thank you for the comments on the patch Oystein. I have addressed all the
        issues pointed out. Some explanation for the changes made follows.

        > AsynchronousLogShipper, unused import: SQLState

        In order to address your comment related to NoSuchElementException I am
        throwing a StandardException with SQLState as SQLState.REPLICATION_FATAL_ERROR.
        I therefore have retained this import.

        >"when the shipper is informed to perform its periodic shipping
        >also." I feel this sentence is a bit unclear. What does is mean
        >to be "informed to perform its periodic shipping"?

        I meant when a forceFlush wakes up the periodic shipping thread. I have however
        rephrased it to say

        " The log shipping happens at the configured shipping intervals unless a
        force flush happens, which triggers periodic shipping also since there
        will still be more log to send after the forceFlush has sent one chunk. "

        >AsynchronousLogShipper#shipALogChunk/forceFlush: I am not sure
        >that it is a good idea to call handleException when you are not in
        >the log shipping thread. In that case, one might as well just let
        >the exception go up to the MasterController directly. I would
        >also think it is the master controller that should decide whether
        >or not to reconnect, not the log shipper on its own. With respect
        >to NoSuchElementException, I think you have better knowledge of
        >the context in which it is happening here than the
        >MasterController will have, and it might be a good idea that the
        >log shipper decides how to handle it. Since it will occur if you
        >do not get any data even if next() return true, something fatal
        >must have happened, and I guess you might as well report back a
        >fatal replication error.

        I have changed the shipALogChunk() method to throw its exceptions so that
        the run() and the forceFlush() may handle them apropos.

        I now catch the NoSuchElementException and wrap it in a StandardException
        with a SQLState as SQLState.REPLICATION_FATAL_ERROR.

        The IOException caused by a connection failure is also thrown, no reconnect
        is attempted.

        forceFlush() throws all its exceptions to master controller. I am wrapping the
        IOException obtained while calling shipALogChunk in a StandardException since
        its signature in LogShipper mandates that a StandardException is what is throw.

        run() calls handleExceptions.

        Show
        V.Narayanan added a comment - Thank you for the comments on the patch Oystein. I have addressed all the issues pointed out. Some explanation for the changes made follows. > AsynchronousLogShipper, unused import: SQLState In order to address your comment related to NoSuchElementException I am throwing a StandardException with SQLState as SQLState.REPLICATION_FATAL_ERROR. I therefore have retained this import. >"when the shipper is informed to perform its periodic shipping >also." I feel this sentence is a bit unclear. What does is mean >to be "informed to perform its periodic shipping"? I meant when a forceFlush wakes up the periodic shipping thread. I have however rephrased it to say " The log shipping happens at the configured shipping intervals unless a force flush happens, which triggers periodic shipping also since there will still be more log to send after the forceFlush has sent one chunk. " >AsynchronousLogShipper#shipALogChunk/forceFlush: I am not sure >that it is a good idea to call handleException when you are not in >the log shipping thread. In that case, one might as well just let >the exception go up to the MasterController directly. I would >also think it is the master controller that should decide whether >or not to reconnect, not the log shipper on its own. With respect >to NoSuchElementException, I think you have better knowledge of >the context in which it is happening here than the >MasterController will have, and it might be a good idea that the >log shipper decides how to handle it. Since it will occur if you >do not get any data even if next() return true, something fatal >must have happened, and I guess you might as well report back a >fatal replication error. I have changed the shipALogChunk() method to throw its exceptions so that the run() and the forceFlush() may handle them apropos. I now catch the NoSuchElementException and wrap it in a StandardException with a SQLState as SQLState.REPLICATION_FATAL_ERROR. The IOException caused by a connection failure is also thrown, no reconnect is attempted. forceFlush() throws all its exceptions to master controller. I am wrapping the IOException obtained while calling shipALogChunk in a StandardException since its signature in LogShipper mandates that a StandardException is what is throw. run() calls handleExceptions.
        Hide
        Øystein Grøvlen added a comment -

        Thanks for addressing my comments. I have a few follw-up issues and
        some issues that I did not think of the previous time:

        9. I am not sure it is right to put the LogShipper in iapi. As far
        as I can tell the MasterFactory will be the mediator for the rest
        of the system with respect to replication. Hence, I do not think
        it is necessary for LogShipper to be visible outside the master
        package.

        10. Along the same lines, maybe the AsynchronousLogShipper could
        relate directly to the MasterController, instead of MasterFactory?
        I am not quite sure about this, but it seems to me that
        handleException might as well be a package private method in
        MasterController.

        11. AsynchronousLogShipper, unused import: SQLState

        12. AsynchronousLogShipper, constructor: You have removed the
        exception from the javadoc, but not the throws list.

        13. AsynchronousLogShipper#run:
        a) "when the shipper is informed to perform its periodic shipping
        also." I feel this sentence is a bit unclear. What does is mean
        to be "informed to perform its periodic shipping"?

        b) I think the interrupted exception should be caught and ignored.
        This will make it possible for someone to immediately stop the
        shipping thread in a controlled way by first setting the stop
        flag and then interrupting the thread.

        14. AsynchronousLogShipper#shipALogChunk/forceFlush: I am not sure
        that it is a good idea to call handleException when you are not in
        the log shipping thread. In that case, one might as well just let
        the exception go up to the MasterController directly. I would
        also think it is the master controller that should decide whether
        or not to reconnect, not the log shipper on its own. With respect
        to NoSuchElementException, I think you have better knowledge of
        the context in which it is happening here than the
        MasterController will have, and it might be a good idea that the
        log shipper decides how to handle it. Since it will occur if you
        do not get any data even if next() return true, something fatal
        must have happened, and I guess you might as well report back a
        fatal replication error.

        15. AsynchronousLogShipper#forceFlush: Why use notifyAll() when there
        should be only one thread waiting? I would use notify() instead.

        16. AsynchronousLogShipper#stopLogShippment: Typo: there should be
        only one 'p' in shipment.

        Show
        Øystein Grøvlen added a comment - Thanks for addressing my comments. I have a few follw-up issues and some issues that I did not think of the previous time: 9. I am not sure it is right to put the LogShipper in iapi. As far as I can tell the MasterFactory will be the mediator for the rest of the system with respect to replication. Hence, I do not think it is necessary for LogShipper to be visible outside the master package. 10. Along the same lines, maybe the AsynchronousLogShipper could relate directly to the MasterController, instead of MasterFactory? I am not quite sure about this, but it seems to me that handleException might as well be a package private method in MasterController. 11. AsynchronousLogShipper, unused import: SQLState 12. AsynchronousLogShipper, constructor: You have removed the exception from the javadoc, but not the throws list. 13. AsynchronousLogShipper#run: a) "when the shipper is informed to perform its periodic shipping also." I feel this sentence is a bit unclear. What does is mean to be "informed to perform its periodic shipping"? b) I think the interrupted exception should be caught and ignored. This will make it possible for someone to immediately stop the shipping thread in a controlled way by first setting the stop flag and then interrupting the thread. 14. AsynchronousLogShipper#shipALogChunk/forceFlush: I am not sure that it is a good idea to call handleException when you are not in the log shipping thread. In that case, one might as well just let the exception go up to the MasterController directly. I would also think it is the master controller that should decide whether or not to reconnect, not the log shipper on its own. With respect to NoSuchElementException, I think you have better knowledge of the context in which it is happening here than the MasterController will have, and it might be a good idea that the log shipper decides how to handle it. Since it will occur if you do not get any data even if next() return true, something fatal must have happened, and I guess you might as well report back a fatal replication error. 15. AsynchronousLogShipper#forceFlush: Why use notifyAll() when there should be only one thread waiting? I would use notify() instead. 16. AsynchronousLogShipper#stopLogShippment: Typo: there should be only one 'p' in shipment.
        V.Narayanan made changes -
        Attachment LogShipperImpl_v3.stat [ 12366829 ]
        Attachment LogShipperImpl_v3.diff [ 12366828 ]
        Hide
        V.Narayanan added a comment -

        Thank you for the earlier comments Oystein. I have addressed all the
        issues pointed out in this patch.

        I have used the handledExceptions method and have followed Jorgen's
        suggestion in making AsynchronousLogShipper handle the exceptions
        to the master controller without wrapping them.

        I have however made handleExceptions carry no implementation in
        MasterController for now. I think handleException should probably
        raise a event that says that an exception had occurred during
        log shipping and needs to be handled, so that the master controller
        can break away from whatever it is doing currently and handle the
        failure.

        I however thought this could be postponed to the patch that integrates
        the log shipper into the master controller.

        Show
        V.Narayanan added a comment - Thank you for the earlier comments Oystein. I have addressed all the issues pointed out in this patch. I have used the handledExceptions method and have followed Jorgen's suggestion in making AsynchronousLogShipper handle the exceptions to the master controller without wrapping them. I have however made handleExceptions carry no implementation in MasterController for now. I think handleException should probably raise a event that says that an exception had occurred during log shipping and needs to be handled, so that the master controller can break away from whatever it is doing currently and handle the failure. I however thought this could be postponed to the patch that integrates the log shipper into the master controller.
        Hide
        Jørgen Løland added a comment -

        If I understand you correctly, you are asking two questions:

        Q1. Should LogShipper throw all exceptions to MasterController even in cases where it can do something meaningful with it?

        Q2. Should the LogShipper wrap exceptions in StandardExceptions before throwing them to MasterController.

        A1. As a general rule of thumb when it comes to exceptions, I think you need to decide whether or not your code can do anything about it or if it needs to be handled by code at a higher level. If your code can handle it, you may fix the problem and not throw the exception further (in this case from LogShipper to MasterController).

        A2. I don't think you should wrap an exception in a StandardException at any lower level than MasterController; let the controller decide which exceptions are fatal and which can be handled. That is easier to do with the original exception at hand than with a StandardException.

        Of course, nothing but StandardException should be seen outside the replication package, so MasterController has to wrap before throwing.

        As I see it, your alternatives are not mutually exclusive. I think you can actually do a little bit of alternative 1) and a little bit of alternative 2) :-D

        Show
        Jørgen Løland added a comment - If I understand you correctly, you are asking two questions: Q1. Should LogShipper throw all exceptions to MasterController even in cases where it can do something meaningful with it? Q2. Should the LogShipper wrap exceptions in StandardExceptions before throwing them to MasterController. A1. As a general rule of thumb when it comes to exceptions, I think you need to decide whether or not your code can do anything about it or if it needs to be handled by code at a higher level. If your code can handle it, you may fix the problem and not throw the exception further (in this case from LogShipper to MasterController). A2. I don't think you should wrap an exception in a StandardException at any lower level than MasterController; let the controller decide which exceptions are fatal and which can be handled. That is easier to do with the original exception at hand than with a StandardException. Of course, nothing but StandardException should be seen outside the replication package, so MasterController has to wrap before throwing. As I see it, your alternatives are not mutually exclusive. I think you can actually do a little bit of alternative 1) and a little bit of alternative 2) :-D
        Hide
        V.Narayanan added a comment -

        While working on creating the handleExceptions method in
        MasterController, I understood that there are two ways of
        doing this.

        1) Make LogShipper throw a StandardException in which case we
        can add more specific information in the exception, for e.g.

        When a NoSuchElementException in the run() method of the
        LogShipper class we could create a StandardException that
        contains a text that says

        "Error occurred while trying to retrieve log chunks from the
        log buffer"

        2) If we however throw it as the NoSuchElementException, in the
        handleExceptions method all we would say is that

        "Error occurred during log shipping"

        The master controller I realize does not need to do much in this
        case except for wrap it in the StandardException and throw it.

        2) Would be useful if we want the Master controller to do a exception
        specific action. This case would arise when we get an IOException
        during transmission of the log record.Since the MasterController is what
        creates the ReplicationNetworkMessageTransmit instance it should be
        responsible for reconnect. In certain other cases all the master controller
        would do is to throw the exception back.

        We can also argue in the favour of 1) saying that to do a transmitter.initConnection()
        why would you go back to the MasterController. We are after all not re-initializing
        the connection. We are attempting a re-connect in the already initialized
        network framework. If the reconnection fails we can throw this exception then.

        I am tending towards 1) since I do not feel it would be right for the LogShipper to go
        to the MasterController for all the exceptions it gets, and atleast not for attempting a
        reconnect in the network framework.

        Show
        V.Narayanan added a comment - While working on creating the handleExceptions method in MasterController, I understood that there are two ways of doing this. 1) Make LogShipper throw a StandardException in which case we can add more specific information in the exception, for e.g. When a NoSuchElementException in the run() method of the LogShipper class we could create a StandardException that contains a text that says "Error occurred while trying to retrieve log chunks from the log buffer" 2) If we however throw it as the NoSuchElementException, in the handleExceptions method all we would say is that "Error occurred during log shipping" The master controller I realize does not need to do much in this case except for wrap it in the StandardException and throw it. 2) Would be useful if we want the Master controller to do a exception specific action. This case would arise when we get an IOException during transmission of the log record.Since the MasterController is what creates the ReplicationNetworkMessageTransmit instance it should be responsible for reconnect. In certain other cases all the master controller would do is to throw the exception back. We can also argue in the favour of 1) saying that to do a transmitter.initConnection() why would you go back to the MasterController. We are after all not re-initializing the connection. We are attempting a re-connect in the already initialized network framework. If the reconnection fails we can throw this exception then. I am tending towards 1) since I do not feel it would be right for the LogShipper to go to the MasterController for all the exceptions it gets, and atleast not for attempting a reconnect in the network framework.
        Hide
        V.Narayanan added a comment -

        I will keep to one method to handle the exceptions for now. I will write the method assuming that only log shipper will be using it. I will extend it later.

        Show
        V.Narayanan added a comment - I will keep to one method to handle the exceptions for now. I will write the method assuming that only log shipper will be using it. I will extend it later.
        Hide
        Øystein Grøvlen added a comment -

        Your plan for handleExceptions sounds good. You need to decide whether this is a general method that may be called from any of MasterController's components or this is only to be used by LogShipper. If the former, handleException needs to be told which component has failed. If the latter, you should probably use a more specific name.

        If you want to adjust the shipping interval, you can adjust the parameter to wait() directly. I am not sure you need lastShippingTime for that.

        Show
        Øystein Grøvlen added a comment - Your plan for handleExceptions sounds good. You need to decide whether this is a general method that may be called from any of MasterController's components or this is only to be used by LogShipper. If the former, handleException needs to be told which component has failed. If the latter, you should probably use a more specific name. If you want to adjust the shipping interval, you can adjust the parameter to wait() directly. I am not sure you need lastShippingTime for that.
        Hide
        V.Narayanan added a comment -

        Thank you for the comments Oystein

        >Maybe there is a need for the MasterController to provide a method to
        >be called by the log shipping thread to inform it when errors occurs.

        This is a very good suggestion

        I plan to do the following for this.

        1) Introduce a method in MasterFactory called handleExceptions(Exception e)

        2) The implementation of handleExceptions in the MasterController will then
        run instanceof checks the exceptions obtained e.

        3) AsynchronousLogShipper constructor will be modified to accept the MasterFactory
        implementation(Master Controller).

        4) I will catch all exceptions in the LogShipper and call handleExceptions for
        all of them

        5) For now in the handleExceptions class I propose to throw all the exceptions
        obtained in StandardExceptions with appropriate SQLStates. More specific
        handling of the exceptions thrown can be handled later

        >1. I do not see the need for recording lastShippingTime. When a
        > forceFlush is made, I do not think you would want to delay the next
        > regular sending of log records since there will still be more log
        > to send after the forceFlush has sent one chunk. Rather, I think
        > you would want to notify the log shipping thread that it is time
        > for another send. Hence, I suggest that you drop testing for time,
        > and use wait() instead of sleep() so that it is possible to wake up
        > the thread while it is waiting.

        lastShippingTime can be very useful when you want to improve the LogShipper
        to automatically adjust shipping interval.

        lastShippingTime - current time will give the time interval at which the
        forceFlush keeps getting called. This difference can then act as the next
        shipping interval.

        But this is an improvement and can wait until later.

        I will remove it and use wait(long interval) as you say.

        Thanks once again for your comments.

        Show
        V.Narayanan added a comment - Thank you for the comments Oystein >Maybe there is a need for the MasterController to provide a method to >be called by the log shipping thread to inform it when errors occurs. This is a very good suggestion I plan to do the following for this. 1) Introduce a method in MasterFactory called handleExceptions(Exception e) 2) The implementation of handleExceptions in the MasterController will then run instanceof checks the exceptions obtained e. 3) AsynchronousLogShipper constructor will be modified to accept the MasterFactory implementation(Master Controller). 4) I will catch all exceptions in the LogShipper and call handleExceptions for all of them 5) For now in the handleExceptions class I propose to throw all the exceptions obtained in StandardExceptions with appropriate SQLStates. More specific handling of the exceptions thrown can be handled later >1. I do not see the need for recording lastShippingTime. When a > forceFlush is made, I do not think you would want to delay the next > regular sending of log records since there will still be more log > to send after the forceFlush has sent one chunk. Rather, I think > you would want to notify the log shipping thread that it is time > for another send. Hence, I suggest that you drop testing for time, > and use wait() instead of sleep() so that it is possible to wake up > the thread while it is waiting. lastShippingTime can be very useful when you want to improve the LogShipper to automatically adjust shipping interval. lastShippingTime - current time will give the time interval at which the forceFlush keeps getting called. This difference can then act as the next shipping interval. But this is an improvement and can wait until later. I will remove it and use wait(long interval) as you say. Thanks once again for your comments.
        Hide
        Øystein Grøvlen added a comment -

        V.Narayanan (JIRA) wrote:
        >> 4. Why are exceptions from shipALogChunk handled differently by run()
        >> and forceFlush()? Maybe you could let shipALogChunk handle the
        >> exceptions instead so you do not have to do it in two places?
        >
        > * It is not possible to throw exceptions from the run method, hence
        > I throw it as RuntimeExceptions.

        AFAIK, throwing an unchecked exception from the run method, will just
        cause the thread to stop, and the exception will not be seen by other
        threads. Hence, a thread should try to deal with its own exceptions,
        and only throw unchecked exceptions if there is no way to recover from
        the failure.

        ...

        > * I thought about handling the exceptions in shipALogChunk itself,
        > but we need some way of informing the MasterController that the
        > shipping has failed. We need the MasterController to be informed
        > of the exception.
        >
        > Handling the exceptions inside shipALogChunk would muffle the
        > exceptions inside this method.
        >
        > The above three considerations were what forced me to go in for the
        > heterogeneity in handling the exceptions between the two methods.

        I understand. Makes sense.

        Maybe there is a need for the MasterController to provide a method to
        be called by the log shipping thread to inform it when errors occurs.

        Show
        Øystein Grøvlen added a comment - V.Narayanan (JIRA) wrote: >> 4. Why are exceptions from shipALogChunk handled differently by run() >> and forceFlush()? Maybe you could let shipALogChunk handle the >> exceptions instead so you do not have to do it in two places? > > * It is not possible to throw exceptions from the run method, hence > I throw it as RuntimeExceptions. AFAIK, throwing an unchecked exception from the run method, will just cause the thread to stop, and the exception will not be seen by other threads. Hence, a thread should try to deal with its own exceptions, and only throw unchecked exceptions if there is no way to recover from the failure. ... > * I thought about handling the exceptions in shipALogChunk itself, > but we need some way of informing the MasterController that the > shipping has failed. We need the MasterController to be informed > of the exception. > > Handling the exceptions inside shipALogChunk would muffle the > exceptions inside this method. > > The above three considerations were what forced me to go in for the > heterogeneity in handling the exceptions between the two methods. I understand. Makes sense. Maybe there is a need for the MasterController to provide a method to be called by the log shipping thread to inform it when errors occurs.
        Hide
        V.Narayanan added a comment -

        Thank you for the comments and the reviews oystein.

        >4. Why are exceptions from shipALogChunk handled differently by run()
        > and forceFlush()? Maybe you could let shipALogChunk handle the
        > exceptions instead so you do not have to do it in two places?

        • It is not possible to throw exceptions from the run method, hence I throw it as
          RuntimeExceptions.
        • forceFlush is a method that might have a different implementation if in future
          we decide we are going to implement a more sophisticated LogShipper, but the
          basic interface should remain common. So the exception that is thrown should
          be the same irrespective of the implementation.

        So the best way of making the exceptions thrown in derby uniform is using the
        Derby StandardException

        • I thought about handling the exceptions in shipALogChunk itself, but we need
          some way of informing the MasterController that the shipping has failed. We need
          the MasterController to be informed of the exception.

        Handling the exceptions inside shipALogChunk would muffle the exceptions inside
        this method.

        The above three considerations were what forced me to go in for the heterogeneity
        in handling the exceptions between the two methods.

        Show
        V.Narayanan added a comment - Thank you for the comments and the reviews oystein. >4. Why are exceptions from shipALogChunk handled differently by run() > and forceFlush()? Maybe you could let shipALogChunk handle the > exceptions instead so you do not have to do it in two places? It is not possible to throw exceptions from the run method, hence I throw it as RuntimeExceptions. forceFlush is a method that might have a different implementation if in future we decide we are going to implement a more sophisticated LogShipper, but the basic interface should remain common. So the exception that is thrown should be the same irrespective of the implementation. So the best way of making the exceptions thrown in derby uniform is using the Derby StandardException I thought about handling the exceptions in shipALogChunk itself, but we need some way of informing the MasterController that the shipping has failed. We need the MasterController to be informed of the exception. Handling the exceptions inside shipALogChunk would muffle the exceptions inside this method. The above three considerations were what forced me to go in for the heterogeneity in handling the exceptions between the two methods.
        Hide
        Øystein Grøvlen added a comment -

        Thanks for the patch, Narayanan. It looks good and is very well
        documented. Except for my first three issues, my comments are minor:

        1. I do not see the need for recording lastShippingTime. When a
        forceFlush is made, I do not think you would want to delay the next
        regular sending of log records since there will still be more log
        to send after the forceFlush has sent one chunk. Rather, I think
        you would want to notify the log shipping thread that it is time
        for another send. Hence, I suggest that you drop testing for time,
        and use wait() instead of sleep() so that it is possible to wake up
        the thread while it is waiting.

        2. It seems like shipALogChunk will send a message even if there is no
        log records to send. Should not the sending also be part of the
        body of the if statement?

        3. I would think you need some way for the master controller to stop
        log shipping (e.g., when replication is stopped). In other words,
        the LogShipper interface need a stop method, and the loop needs to
        test for whether it should stop.

        4. Why are exceptions from shipALogChunk handled differently by run()
        and forceFlush()? Maybe you could let shipALogChunk handle the
        exceptions instead so you do not have to do it in two places?

        5. I suggest dropping 'Replication' from the name of
        'ReplicationAsynchronousLogShipper'. Would make it a bit shorter,
        and I think it is evident that we are talking about replication
        here.

        6. The constructor is declared to throw a StandardException and the
        javadoc says it may happen when you 'register to the shipping
        daemon". Is this preparing for something that will be added later?

        7. In both files there is a typo in the javadoc for flushedInstance
        ('latestInstanceFlishedToDisk').

        8. For ReplicationAsynchronousLogShipper#flushedInstance, I think you
        should include in the javadoc that calling it will have no effect.

        Show
        Øystein Grøvlen added a comment - Thanks for the patch, Narayanan. It looks good and is very well documented. Except for my first three issues, my comments are minor: 1. I do not see the need for recording lastShippingTime. When a forceFlush is made, I do not think you would want to delay the next regular sending of log records since there will still be more log to send after the forceFlush has sent one chunk. Rather, I think you would want to notify the log shipping thread that it is time for another send. Hence, I suggest that you drop testing for time, and use wait() instead of sleep() so that it is possible to wake up the thread while it is waiting. 2. It seems like shipALogChunk will send a message even if there is no log records to send. Should not the sending also be part of the body of the if statement? 3. I would think you need some way for the master controller to stop log shipping (e.g., when replication is stopped). In other words, the LogShipper interface need a stop method, and the loop needs to test for whether it should stop. 4. Why are exceptions from shipALogChunk handled differently by run() and forceFlush()? Maybe you could let shipALogChunk handle the exceptions instead so you do not have to do it in two places? 5. I suggest dropping 'Replication' from the name of 'ReplicationAsynchronousLogShipper'. Would make it a bit shorter, and I think it is evident that we are talking about replication here. 6. The constructor is declared to throw a StandardException and the javadoc says it may happen when you 'register to the shipping daemon". Is this preparing for something that will be added later? 7. In both files there is a typo in the javadoc for flushedInstance ('latestInstanceFlishedToDisk'). 8. For ReplicationAsynchronousLogShipper#flushedInstance, I think you should include in the javadoc that calling it will have no effect.
        Hide
        V.Narayanan added a comment -

        > Jorgen says
        > In the general case, we should use DaemonService whenever possible to keep uniformity with >existing Derby code. As far as I have seen, no Derby class implements Runnable...

        well, actually it is used in DRDAConnThread

        Show
        V.Narayanan added a comment - > Jorgen says > In the general case, we should use DaemonService whenever possible to keep uniformity with >existing Derby code. As far as I have seen, no Derby class implements Runnable... well, actually it is used in DRDAConnThread
        V.Narayanan made changes -
        Attachment LogShipperImpl_v2.diff [ 12366576 ]
        Attachment LogShipperImpl_v2.stat [ 12366577 ]
        Hide
        V.Narayanan added a comment -

        Thank you for the comments and guidance Oystein and Jorgen.

        Pls find a patch attached containing the preliminary implementation of the
        log shipper.

        The patch keeps to the design posted earlier, however, the code that
        invokes the log shipper from the MasterController is not present since
        I thought that should be done as a seperate integration issue or atleast
        as a follow-up patch in this issue.

        Show
        V.Narayanan added a comment - Thank you for the comments and guidance Oystein and Jorgen. Pls find a patch attached containing the preliminary implementation of the log shipper. The patch keeps to the design posted earlier, however, the code that invokes the log shipper from the MasterController is not present since I thought that should be done as a seperate integration issue or atleast as a follow-up patch in this issue.
        Hide
        Øystein Grøvlen added a comment -

        V.Narayanan (JIRA) wrote:
        > I however do not want to do this in the first version of the patch
        > for this issue. I will submit the first version assuming a static
        > time interval, keep this issue open and change it in the next
        > version if this is OK with the community.

        Sounds good to me. A static, but configurable, timeout interval
        should probably be good-enough for most cases.

        > The DaemonFactory would call performWork (i.e.) ship log records at its own
        > convinience. As i understand it we cannot configure DaemonFactory to transmit
        > at our mentioned time interval.
        >
        > So, When performWork is called entirely depends on how good
        > DaemonFactory is feeling
        >
        > We do not want this to happen, because what if this is too soon for
        > us or what if it is too late.
        >
        > Hence we introduce a transmit interval variable that calculates the
        > difference between the last call and the current call of performWork
        > or rather shipALogChunk(forceFlush also)

        My concern is that if the Daemon Service have very little else to do,
        it will continously call LogShipper#performWork and use a lot of
        unecessary CPU for checking the time.

        > Sorry about being ambiguous here. ShippingDaemon is Derby's
        > DaemonService that is booted up and is not a thread I introduce.
        >

        Using a separate thread would probably give you more flexibility with
        respect to how log shipping is scheduled.

        Show
        Øystein Grøvlen added a comment - V.Narayanan (JIRA) wrote: > I however do not want to do this in the first version of the patch > for this issue. I will submit the first version assuming a static > time interval, keep this issue open and change it in the next > version if this is OK with the community. Sounds good to me. A static, but configurable, timeout interval should probably be good-enough for most cases. > The DaemonFactory would call performWork (i.e.) ship log records at its own > convinience. As i understand it we cannot configure DaemonFactory to transmit > at our mentioned time interval. > > So, When performWork is called entirely depends on how good > DaemonFactory is feeling > > We do not want this to happen, because what if this is too soon for > us or what if it is too late. > > Hence we introduce a transmit interval variable that calculates the > difference between the last call and the current call of performWork > or rather shipALogChunk(forceFlush also) My concern is that if the Daemon Service have very little else to do, it will continously call LogShipper#performWork and use a lot of unecessary CPU for checking the time. > Sorry about being ambiguous here. ShippingDaemon is Derby's > DaemonService that is booted up and is not a thread I introduce. > Using a separate thread would probably give you more flexibility with respect to how log shipping is scheduled.
        Hide
        Jørgen Løland added a comment -

        > Does this mean atleast in the case of log shipper we would be forced to use threading since the DaemonService interval cannot be varied?

        In the general case, we should use DaemonService whenever possible to keep uniformity with existing Derby code. As far as I have seen, no Derby class implements Runnable...

        In this case, however, a separate "homemade" thread may be better since you'll have to use an eternal loop. The API of DaemonService/Factory says that the performWork method should not take much time. Eternal loops hardly fit that description

        Unless someone has a strong argument for not creating a thread (as opposed to using Derby's DaemonService), I think you should go for a normal Java thread in this particular case.

        Show
        Jørgen Løland added a comment - > Does this mean atleast in the case of log shipper we would be forced to use threading since the DaemonService interval cannot be varied? In the general case, we should use DaemonService whenever possible to keep uniformity with existing Derby code. As far as I have seen, no Derby class implements Runnable... In this case, however, a separate "homemade" thread may be better since you'll have to use an eternal loop. The API of DaemonService/Factory says that the performWork method should not take much time. Eternal loops hardly fit that description Unless someone has a strong argument for not creating a thread (as opposed to using Derby's DaemonService), I think you should go for a normal Java thread in this particular case.
        Hide
        V.Narayanan added a comment -

        >If I understand you correctly, this time interval will only be helpful in the "too soon" case, not the >"too late" case.

        >When I wrote version 2 of the proof of concept patches (see DERBY-2872), I used a Daemon to >read log records at the slave side. I basically had:

        >performWork

        { > if (messageWaitingForProcess) processMessage(); >}

        >The waiting interval between each time performWork was called was way too high for this usage >(in the order of 3-10 seconds). This interval is probably too high for the log shipper as well, so >my guess is that you'll need something similar to an eternal loop with a wait(millis) somewhere in >this package.

        Does this mean atleast in the case of log shipper we would be forced to use threading since the DaemonService interval cannot be varied?

        Guess I have a LogShipper specific answer for DaemonService vs Threads if I am right.

        Show
        V.Narayanan added a comment - >If I understand you correctly, this time interval will only be helpful in the "too soon" case, not the >"too late" case. >When I wrote version 2 of the proof of concept patches (see DERBY-2872 ), I used a Daemon to >read log records at the slave side. I basically had: >performWork { > if (messageWaitingForProcess) processMessage(); >} >The waiting interval between each time performWork was called was way too high for this usage >(in the order of 3-10 seconds). This interval is probably too high for the log shipper as well, so >my guess is that you'll need something similar to an eternal loop with a wait(millis) somewhere in >this package. Does this mean atleast in the case of log shipper we would be forced to use threading since the DaemonService interval cannot be varied? Guess I have a LogShipper specific answer for DaemonService vs Threads if I am right.
        Hide
        V.Narayanan added a comment -

        "I have introduced the time concept to overcome the limitation imposed by
        DaemonFactory. "

        Should have been DaemonService. We create the Daemon from the DaemonFactory.

        Show
        V.Narayanan added a comment - "I have introduced the time concept to overcome the limitation imposed by DaemonFactory. " Should have been DaemonService. We create the Daemon from the DaemonFactory.
        Hide
        V.Narayanan added a comment -

        >1. If I understand you correctly, log records will be shipped either
        > when the log buffer is full or when a timeout occur. If the traffic
        > is high so that the time it takes to fill the log buffer is
        > normally less than the timeout, this means that log will only be
        > sent when the log buffer is full. Would it not be better to try to
        > keep the log shipping ahead so that it never goes full?

        What we can do is we start with a default value for transmit interval.
        For theory sake let us assume it is 3 seconds.

        Suppose the condition stated on your comments arise, we maintain a counter
        that keeps track of the number of times force flush is called, if this counter
        value exceeds a certain value we can replace the transmit interval with
        the last transmit interval we obtained.

        Thus this would automatically take care of the case of heavy load.

        A similar check should run in performWork to take care of high load.

        Both these counters should be reset when the other method is called.

        The counter for forceFlush is reset when performWork is called and vice-versa.

        I however do not want to do this in the first version of the patch for this issue.
        I will submit the first version assuming a static time interval, keep this issue
        open and change it in the next version if this is OK with the community.

        I feel this will put in the essential features first and the improvements later.

        >2. It seems like you suggest that the shipping thread will continously
        > check to see whether the timeout has expired. Generally, such
        > busy-waits should be avoided. I think you should introduce a timer
        > or something so that the thread can be suspended until it is time
        > to ship log records.

        I have introduced the time concept to overcome the limitation imposed by
        DaemonFactory.

        The DaemonFactory would call performWork (i.e.) ship log records at its own
        convinience. As i understand it we cannot configure DaemonFactory to transmit
        at our mentioned time interval.

        So, When performWork is called entirely depends on how good DaemonFactory is feeling

        We do not want this to happen, because what if this is too soon for us or what if it
        is too late.

        Hence we introduce a transmit interval variable that calculates the difference between the
        last call and the current call of performWork or rather shipALogChunk(forceFlush also)

        >3. You mention a ShippingDeamon, but it is not described what it is or
        > how it works. Is this a new thread that you will introduce or will
        > the LogShipper be run in the existing background thread?

        Sorry about being ambiguous here. ShippingDaemon is Derby's DaemonService that is booted
        up and is not a thread I introduce.

        Show
        V.Narayanan added a comment - >1. If I understand you correctly, log records will be shipped either > when the log buffer is full or when a timeout occur. If the traffic > is high so that the time it takes to fill the log buffer is > normally less than the timeout, this means that log will only be > sent when the log buffer is full. Would it not be better to try to > keep the log shipping ahead so that it never goes full? What we can do is we start with a default value for transmit interval. For theory sake let us assume it is 3 seconds. Suppose the condition stated on your comments arise, we maintain a counter that keeps track of the number of times force flush is called, if this counter value exceeds a certain value we can replace the transmit interval with the last transmit interval we obtained. Thus this would automatically take care of the case of heavy load. A similar check should run in performWork to take care of high load. Both these counters should be reset when the other method is called. The counter for forceFlush is reset when performWork is called and vice-versa. I however do not want to do this in the first version of the patch for this issue. I will submit the first version assuming a static time interval, keep this issue open and change it in the next version if this is OK with the community. I feel this will put in the essential features first and the improvements later. >2. It seems like you suggest that the shipping thread will continously > check to see whether the timeout has expired. Generally, such > busy-waits should be avoided. I think you should introduce a timer > or something so that the thread can be suspended until it is time > to ship log records. I have introduced the time concept to overcome the limitation imposed by DaemonFactory. The DaemonFactory would call performWork (i.e.) ship log records at its own convinience. As i understand it we cannot configure DaemonFactory to transmit at our mentioned time interval. So, When performWork is called entirely depends on how good DaemonFactory is feeling We do not want this to happen, because what if this is too soon for us or what if it is too late. Hence we introduce a transmit interval variable that calculates the difference between the last call and the current call of performWork or rather shipALogChunk(forceFlush also) >3. You mention a ShippingDeamon, but it is not described what it is or > how it works. Is this a new thread that you will introduce or will > the LogShipper be run in the existing background thread? Sorry about being ambiguous here. ShippingDaemon is Derby's DaemonService that is booted up and is not a thread I introduce.
        Hide
        Øystein Grøvlen added a comment -

        Thanks for the desgin overview, Narayanan. I have a few questions
        around the scheduling of the shipping:

        1. If I understand you correctly, log records will be shipped either
        when the log buffer is full or when a timeout occur. If the traffic
        is high so that the time it takes to fill the log buffer is
        normally less than the timeout, this means that log will only be
        sent when the log buffer is full. Would it not be better to try to
        keep the log shipping ahead so that it never goes full?

        2. It seems like you suggest that the shipping thread will continously
        check to see whether the timeout has expired. Generally, such
        busy-waits should be avoided. I think you should introduce a timer
        or something so that the thread can be suspended until it is time
        to ship log records.

        3. You mention a ShippingDeamon, but it is not described what it is or
        how it works. Is this a new thread that you will introduce or will
        the LogShipper be run in the existing background thread?

        Show
        Øystein Grøvlen added a comment - Thanks for the desgin overview, Narayanan. I have a few questions around the scheduling of the shipping: 1. If I understand you correctly, log records will be shipped either when the log buffer is full or when a timeout occur. If the traffic is high so that the time it takes to fill the log buffer is normally less than the timeout, this means that log will only be sent when the log buffer is full. Would it not be better to try to keep the log shipping ahead so that it never goes full? 2. It seems like you suggest that the shipping thread will continously check to see whether the timeout has expired. Generally, such busy-waits should be avoided. I think you should introduce a timer or something so that the thread can be suspended until it is time to ship log records. 3. You mention a ShippingDeamon, but it is not described what it is or how it works. Is this a new thread that you will introduce or will the LogShipper be run in the existing background thread?
        Hide
        V.Narayanan added a comment -

        In keeping with the design diagrams introduced in Derby-2872 I have modified the
        LogShipper design. Pls find below a basic description of the changes that will be
        introduced in this issue.

        LogShipper Design
        -----------------

        As part of this work the following changes are required

        The AsynchronousLogShipper that enables periodic and force flushing of the Log Buffer
        is needed.

        The MasterController is modified to start up the LogShipping module
        and to call a flush on the buffer when the LogBufferFullException is thrown.

        AsynchronousLogShipper
        ----------------------

        • Handles periodic shipping of log chunks from the log buffer to the
          slave
        • Allows for force flushing of the log Buffer by the MasterController
          when the MasterController gets a LogBufferFullException from the
          LogBuffer.

        Design of the class
        -------------------

        Implements interfaces - Serviceable, LogShipper

        Constructor
        -----------

        Accepts the ReplicationLogBuffer, shippingDaemon and the transmitInterval.

        Subscribes to the ShippingDaemon.

        Methods
        -------

        shipALogChunk
        -------------

        Checks to see if a log record chunk is available in the logBuffer for transmission

        If yes removes this log record chunk and transmits it to the slave.

        updates the lastTransmitTime to reflect the current time at which the transmit was done.

        flushedInstance
        ---------------

        LogShipper interface method.

        Used to update the latest instance of the log record that has been flushed to the disk.

        This method will not have any implementation.

        forceFlush
        ----------

        Forces a chunk of the log record from the log buffer to be sent to the
        slave, freeing space in the log buffer.

        calls shipALogChunk to ship the log chunk from the buffer.

        serviceASAP
        -----------

        Serviceable interface method. Returns false.

        serviceImmediately
        ------------------

        Serviceable interface method. Returns false.

        performWork
        -----------

        Gets the current time and subtracts the lastTransmitTime from it to get the time interval
        between transmissions.

        Checks to see if the time interval calculate above is greater than or equal to the
        transmitInterval. If yes calls shipALogChunk. If no the method returns without doing anything.

        Changes to the MasterController class
        -------------------------------------

        startMaster
        -----------

        The LogShipper is initialized here.

        appendLogRecord
        ---------------

        when a LogBufferFullException is thrown we call
        AsynchronousLogShipper.forceFlush

        Show
        V.Narayanan added a comment - In keeping with the design diagrams introduced in Derby-2872 I have modified the LogShipper design. Pls find below a basic description of the changes that will be introduced in this issue. LogShipper Design ----------------- As part of this work the following changes are required The AsynchronousLogShipper that enables periodic and force flushing of the Log Buffer is needed. The MasterController is modified to start up the LogShipping module and to call a flush on the buffer when the LogBufferFullException is thrown. AsynchronousLogShipper ---------------------- Handles periodic shipping of log chunks from the log buffer to the slave Allows for force flushing of the log Buffer by the MasterController when the MasterController gets a LogBufferFullException from the LogBuffer. Design of the class ------------------- Implements interfaces - Serviceable, LogShipper Constructor ----------- Accepts the ReplicationLogBuffer, shippingDaemon and the transmitInterval. Subscribes to the ShippingDaemon. Methods ------- shipALogChunk ------------- Checks to see if a log record chunk is available in the logBuffer for transmission If yes removes this log record chunk and transmits it to the slave. updates the lastTransmitTime to reflect the current time at which the transmit was done. flushedInstance --------------- LogShipper interface method. Used to update the latest instance of the log record that has been flushed to the disk. This method will not have any implementation. forceFlush ---------- Forces a chunk of the log record from the log buffer to be sent to the slave, freeing space in the log buffer. calls shipALogChunk to ship the log chunk from the buffer. serviceASAP ----------- Serviceable interface method. Returns false. serviceImmediately ------------------ Serviceable interface method. Returns false. performWork ----------- Gets the current time and subtracts the lastTransmitTime from it to get the time interval between transmissions. Checks to see if the time interval calculate above is greater than or equal to the transmitInterval. If yes calls shipALogChunk. If no the method returns without doing anything. Changes to the MasterController class ------------------------------------- startMaster ----------- The LogShipper is initialized here. appendLogRecord --------------- when a LogBufferFullException is thrown we call AsynchronousLogShipper.forceFlush
        V.Narayanan made changes -
        Assignee V.Narayanan [ narayanan ]
        Hide
        Jørgen Løland added a comment -

        Thank you for addressing this issue, Narayanan.

        I had a look at the preliminary patch and have some questions.

        1. I think "Factory" is used for object creation in cases where the object creation work is hidden from the caller. This is the case for bootable services since the boot method does this masking. In the log shipping case, however, I don't think "Factory" is a good name. Maybe you could just call the interface (Replication)LogShipper?

        2. The package names are very long. Since the package already contains "replication.master", maybe we could shave off some characters in the class names? "Replication" could, e.g., be removed. Just a thought.

        3. Since you already have commented the variables, I think you should add another * (/* -> /**) so that the comments become javadoc

        4. I think the Daemon should be created by the MasterController since that class will be managing everything on the master side

        5. AsynchLogShipper does not need to know about the transmitter since Forced... does the log shipping. The same goes for the log buffer.

        6. How do you intend to create the log shipping service? I see that Forced... implements the interface, but that the creator of Asynch... creates a Forced... object. In MasterController, we would want to create an instance of the interface. Maybe Asynch could extend Forced? Again, just a thought. With that design, all replication strategy specific methods like Forced#flushedInstance could be made abstract, and adding other replication strategies (like 2-safe) later would be really simple.

        7. The javadoc for forceFlush is not describing the behavior of the method

        8. I am concerned about the eternal loop in performWork. I guess this will be way too active, especially in low workload cases. Also, control should be returned from the performWork method. It should be up to the DaemonService to determine when it wants to ship log. A slight improvement could be to let shipALogChunk returned the boolean return from LogBuffer#next, and change performWork to look something like this:

        performWork() {
        boolean sentSomething = true;
        while (sentSomething)

        { sentSomething = forcedLogShipper.shipALogChunk(); }

        }

        or even

        performWork() {
        if (currentTimeMillis >= (timeOfLastLogShipment + shipmentWaitTime) {
        boolean sentSomething = true;
        while (sentSomething)

        { sentSomething = forcedLogShipper.shipALogChunk(); }

        }
        }

        Show
        Jørgen Løland added a comment - Thank you for addressing this issue, Narayanan. I had a look at the preliminary patch and have some questions. 1. I think "Factory" is used for object creation in cases where the object creation work is hidden from the caller. This is the case for bootable services since the boot method does this masking. In the log shipping case, however, I don't think "Factory" is a good name. Maybe you could just call the interface (Replication)LogShipper? 2. The package names are very long. Since the package already contains "replication.master", maybe we could shave off some characters in the class names? "Replication" could, e.g., be removed. Just a thought. 3. Since you already have commented the variables, I think you should add another * (/* -> /**) so that the comments become javadoc 4. I think the Daemon should be created by the MasterController since that class will be managing everything on the master side 5. AsynchLogShipper does not need to know about the transmitter since Forced... does the log shipping. The same goes for the log buffer. 6. How do you intend to create the log shipping service? I see that Forced... implements the interface, but that the creator of Asynch... creates a Forced... object. In MasterController, we would want to create an instance of the interface. Maybe Asynch could extend Forced? Again, just a thought. With that design, all replication strategy specific methods like Forced#flushedInstance could be made abstract, and adding other replication strategies (like 2-safe) later would be really simple. 7. The javadoc for forceFlush is not describing the behavior of the method 8. I am concerned about the eternal loop in performWork. I guess this will be way too active, especially in low workload cases. Also, control should be returned from the performWork method. It should be up to the DaemonService to determine when it wants to ship log. A slight improvement could be to let shipALogChunk returned the boolean return from LogBuffer#next, and change performWork to look something like this: performWork() { boolean sentSomething = true; while (sentSomething) { sentSomething = forcedLogShipper.shipALogChunk(); } } or even performWork() { if (currentTimeMillis >= (timeOfLastLogShipment + shipmentWaitTime) { boolean sentSomething = true; while (sentSomething) { sentSomething = forcedLogShipper.shipALogChunk(); } } }
        V.Narayanan made changes -
        Field Original Value New Value
        Attachment LogShipperImpl_v1.stat [ 12365468 ]
        Attachment LogShipperImpl_v1.diff [ 12365467 ]
        Hide
        V.Narayanan added a comment -

        This patch contains a preliminary implementation of the LogShipper.
        The patch depends on the Network code (Derby-2921) and can be considered
        for a commit only after the Network code is accepted.

        Please find below a File by File explanation of the LogShipper code

        A java/engine/org/apache/derby/impl/services/replication/master/ReplicationAsynchronousLogShipper.java

        This class creates a DaemonService that periodically sends the log record chunks to
        the slave. Since it uses the DaemonService by subcribing to it the interval at which
        the LogChunks are sent depends on the DaemonService.

        A java/engine/org/apache/derby/impl/services/replication/master/ReplicationForcedLogShipper.java

        This class allows you to send a chunk of log from the buffer thus freeing up
        the buffer. The ReplicationFOrcedLogShipper and the ReplicationAsynchronousLogShipper
        are synchronized on the LogBuffer.

        A java/engine/org/apache/derby/iapi/services/replication/master/ReplicationLogShippingFactory.java

        This is the common interface to the ReplicationForcedLogShipper that will allow the code to
        request the freeing of the Log buffer and enable a Log chunk transfer.

        Show
        V.Narayanan added a comment - This patch contains a preliminary implementation of the LogShipper. The patch depends on the Network code (Derby-2921) and can be considered for a commit only after the Network code is accepted. Please find below a File by File explanation of the LogShipper code A java/engine/org/apache/derby/impl/services/replication/master/ReplicationAsynchronousLogShipper.java This class creates a DaemonService that periodically sends the log record chunks to the slave. Since it uses the DaemonService by subcribing to it the interval at which the LogChunks are sent depends on the DaemonService. A java/engine/org/apache/derby/impl/services/replication/master/ReplicationForcedLogShipper.java This class allows you to send a chunk of log from the buffer thus freeing up the buffer. The ReplicationFOrcedLogShipper and the ReplicationAsynchronousLogShipper are synchronized on the LogBuffer. A java/engine/org/apache/derby/iapi/services/replication/master/ReplicationLogShippingFactory.java This is the common interface to the ReplicationForcedLogShipper that will allow the code to request the freeing of the Log buffer and enable a Log chunk transfer.
        Hide
        V.Narayanan added a comment -

        The Log shipper will be in charge of shipping the Log Record
        chunks from the master to the slave. The shipping needs to
        happen Asynchronously as well as in a forced manner. An example
        of a situation where the forced shipping would be needed would
        be when the Log buffer becomes full and it needs to be emptied
        to enable addition of further log records.

        Show
        V.Narayanan added a comment - The Log shipper will be in charge of shipping the Log Record chunks from the master to the slave. The shipping needs to happen Asynchronously as well as in a forced manner. An example of a situation where the forced shipping would be needed would be when the Log buffer becomes full and it needs to be emptied to enable addition of further log records.
        V.Narayanan created issue -

          People

          • Assignee:
            V.Narayanan
            Reporter:
            V.Narayanan
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development