DERBY-2926 (Apache Derby; sub-task of DERBY-2922, "Replication: Add master replication mode")

Replication: Add a log buffer for log records that should be shipped to the slave

    Details

    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.4.1.3
    • Fix Version/s: 10.4.1.3
    • Component/s: Services
    • Labels: None

      Description

      When a Derby instance has the master role for a database, log records are shipped to the slave to keep it up to date. A buffer is needed because the log records should not be shipped one at a time. Also, writing the log records to a buffer instead of sending them immediately removes the network communication from the critical path for the transaction.
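As a rough illustration of the decoupling described above (all names are hypothetical, not Derby's actual API), an append call only copies bytes into memory and returns immediately; a separate log shipper thread drains the buffer and may block on the network without affecting transactions:

```java
import java.util.LinkedList;

// Hypothetical sketch: appends are memory-only, so the transaction's
// critical path never waits on the network.
public class LogShipBufferSketch {
    private final LinkedList<byte[]> dirtyBuffers = new LinkedList<byte[]>();

    // Called on the transaction's critical path: just a memory copy.
    public synchronized void append(byte[] logRecord) {
        byte[] copy = new byte[logRecord.length];
        System.arraycopy(logRecord, 0, copy, 0, logRecord.length);
        dirtyBuffers.addLast(copy);
    }

    // Called by a separate log shipper thread, which may block on the network.
    public synchronized byte[] nextChunkToShip() {
        return dirtyBuffers.isEmpty() ? null : dirtyBuffers.removeFirst();
    }
}
```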

      1. bytebuffer_resizefix_1.diff
        2 kB
        Jørgen Løland
      2. bytebuffer_resizefix_1.stat
        0.1 kB
        Jørgen Løland
      3. bytebuffer_v2b.diff
        23 kB
        Jørgen Løland
      4. bytebuffer_v2b.stat
        0.4 kB
        Jørgen Løland
      5. bytebuffer_v2.diff
        30 kB
        Jørgen Løland
      6. bytebuffer_v2.stat
        0.4 kB
        Jørgen Løland
      7. bytebuffer_v1-fixheader.stat
        0.4 kB
        Jørgen Løland
      8. bytebuffer_v1-fixheader.diff
        17 kB
        Jørgen Løland
      9. bytebuffer_v1a.stat
        0.3 kB
        Jørgen Løland
      10. bytebuffer_v1a.diff
        16 kB
        Jørgen Løland
      11. bytebuffer_v1.stat
        0.3 kB
        Jørgen Løland
      12. bytebuffer_v1.diff
        16 kB
        Jørgen Løland


          Activity

          Knut Anders Hatlen added a comment -

          Thanks Jørgen! Committed revision 569014.

          Jørgen Løland added a comment - edited

          I agree - the change you suggest would both make the code easier to read and be more efficient. The attached patch, resizefix_1, addresses this issue.

          Knut Anders Hatlen added a comment -

          Thanks for the new patch! Committed revision 568121.

          Knut Anders Hatlen added a comment -

          You're too quick!

          Knut Anders Hatlen added a comment -

          One more question related to #14:

          It seems the code reallocates the buffer as long as its length is not equal to the default size, even if the new buffer will be of the exact same size as the old one (I was thinking this could happen if you for instance update a LOB, so that you get a large number of consecutive log records with identical size, but larger than default size). I think this simplified code would behave the same way and save buffer allocation in those cases:

          int requiredSize = Math.max(defaultBufferSize, current.size());
          if (outBufferData.length != requiredSize) {
              outBufferData = new byte[requiredSize];
          }
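A self-contained version of this suggestion (field names borrowed from the comment, the surrounding class hypothetical) shows that a run of equally sized oversize records triggers only one reallocation:

```java
// Hypothetical harness around the reallocation logic quoted above.
public class BufferResizeSketch {
    static final int DEFAULT_BUFFER_SIZE = 8;
    static byte[] outBufferData = new byte[DEFAULT_BUFFER_SIZE];

    // Returns true if a new array had to be allocated.
    static boolean ensureCapacity(int currentSize) {
        int requiredSize = Math.max(DEFAULT_BUFFER_SIZE, currentSize);
        if (outBufferData.length != requiredSize) {
            outBufferData = new byte[requiredSize];
            return true;
        }
        return false; // exact-size match: reuse the existing array
    }
}
```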
          Jørgen Løland added a comment -

          Attached patch v2b replaces v2. Knut's comments are addressed.

          Knut Anders Hatlen added a comment -

          Thanks for the new patch Jørgen. Your comments sound reasonable to me. Some new comments:

          • I assume the changes to index.html should be discarded.
          • ReplicationLogBuffer.validData() is synchronized on this. Did you mean outputLatch?
          • ReplicationLogBuffer.switchDirtyBuffer() mentions synchronized(this) in its javadoc. Should it say listLatch?
          Jørgen Løland added a comment -

          Attaching patch v2, replacing previous patches.

          Thanks for reviewing the patch, Knut. In v2 I have addressed most of your comments:

          Fixed: 1, 2, 3, 4, 5, 6, 7 (partially), 8, 9, 10, 13

          Comments:
          --------
          7: I cannot think of a situation where an application running on J2ME would want to use replication. However, I do not think it is a good idea to block this possibility at this point just to save three methods. I have therefore not removed these methods.

          11: You are right that Arrays.copyOf would be simpler. As far as I can see, however, these were introduced in Java 1.6 and can therefore not be used.
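For reference, the pre-Java-6 equivalent of Arrays.copyOf is a plain System.arraycopy into a fresh array. This is a sketch of the idiom, not the patch's actual code:

```java
// What Arrays.copyOf(src, newLength) does, expressed with pre-1.6 APIs:
// truncates if newLength is smaller, zero-pads if it is larger.
public class CopySketch {
    static byte[] copyOf(byte[] src, int newLength) {
        byte[] dst = new byte[newLength];
        System.arraycopy(src, 0, dst, 0, Math.min(src.length, newLength));
        return dst;
    }
}
```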

          12: I think the exception will only be seen by the master replication module. The exception can be moved to iapi later if this assumption does not hold.

          14: One log record could have the size of two whole pages (do and undo information for a whole page) + log record overhead. This could potentially be much larger than the default LogBufferElement size. I would therefore prefer to keep the code as it is.

          Also fixed:
          ----------

          • Synchronization on two different objects in ReplicationLogBuffer, so that the logger can append log records while the log consumer reads chunks of log at the same time.
          • ReplicationLogBuffer.switchDirtyBuffer is no longer synchronized since all uses of it are already synchronized. Also, the method is modified to move currentDirtyBuffer to dirtyBuffers even if freeBuffers.size == 0.
          • Clarification of comments
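The two-latch scheme described in the first bullet might look roughly like this (latch names taken from the discussion, everything else hypothetical): the appender only takes listLatch, while the consumer works under outputLatch and holds listLatch just long enough to detach a buffer, so the two threads rarely contend.

```java
import java.util.LinkedList;

// Hypothetical sketch of synchronizing appender and consumer on two latches.
public class TwoLatchBufferSketch {
    private final Object listLatch = new Object();   // guards the buffer lists
    private final Object outputLatch = new Object(); // guards the out buffer

    private final LinkedList<byte[]> dirtyBuffers = new LinkedList<byte[]>();
    private byte[] outBufferData = new byte[0];

    // Logger thread: only needs listLatch.
    public void appendLog(byte[] record) {
        synchronized (listLatch) {
            dirtyBuffers.addLast(record);
        }
    }

    // Consumer thread: holds listLatch only briefly to detach a buffer,
    // then keeps working on outBufferData under outputLatch.
    public boolean next() {
        synchronized (outputLatch) {
            synchronized (listLatch) {
                if (dirtyBuffers.isEmpty()) return false;
                outBufferData = dirtyBuffers.removeFirst();
            }
            return true;
        }
    }

    public byte[] getData() {
        synchronized (outputLatch) {
            return outBufferData;
        }
    }
}
```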
          Knut Anders Hatlen added a comment -

          Hi Jørgen,

          I had a look at your patch (v1-fixheader). Please see my comments
          below.

          1) I think it would be good if there were some class-level javadoc
          comments explaining the purpose of each class. For instance, it is not
          clear to me what a class named "LogBufferReplication" is meant to do
          (I noticed that you mentioned a class called ReplicationLogBuffer in a
          JIRA comment. If that's the same class, it sounds like a better name
          to me.) Also, javadoc comments for the methods (at least the public
          ones) would be good.

          2) I got this warning when I built the javadoc:

          [javadoc] /export/home/kh160127/derby/trunk/java/engine/org/apache/derby/impl/services/replication/buffer/LogBufferReplication.java:103: warning - @param argument "data_offset" is not a parameter name.

          3) If the value of a field is not meant to change during the lifetime
          of an object, I find it very useful to mark them as such by declaring
          them final. Declaring them final serves both as documentation and as
          an extra compile-time error check (and I have heard people saying it
          helps the garbage collector as well). These fields could be final:

          LogBufferReplication
          dirtyBuffers
          freeBuffers
          defaultBufferSize

          LogBufferElement
          bufferdata
          bufferSize

          4) The fields LogBufferReplication.outBufferCapacity and
          LogBufferElement.bufferSize are redundant since they are always equal
          to outBufferData.length and bufferdata.length.

          5) Is the LogBufferElement class supposed to be accessed directly by
          classes in other packages? If not, you could remove the public
          modifier in the class definition.

          6) I think a better name for LogBufferElement.writeByte() would be
          writeBytes(), since it writes an array, not a single byte.

          7) LogBufferElement.writeInt() and LogBufferElement.writeLong()
          perform unnecessary casting and masking of intermediate results. I
          think writeLong() could be simplified to:

          bufferdata[p++] = (byte) (l >> 56);
          ...
          bufferdata[p++] = (byte) (l >> 8);
          bufferdata[p++] = (byte) l;

          As a side note, if this code can use java.nio.ByteBuffer (it can't if
          it's supposed to run under J2ME), I would recommend switching to it as
          it has helper methods which do exactly the same. The number of Derby
          classes with their own methods for big-endian encoding of integers is
          high enough already...
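Spelled out in full, the simplification could look like this (a sketch, with a matching decoder added for a round-trip check; neither is the patch's actual code):

```java
public class BigEndianSketch {
    // Simplified big-endian encoding as suggested above: no masking of
    // intermediate results is needed, since the (byte) cast keeps only
    // the low 8 bits of each shifted value.
    static void writeLong(byte[] bufferdata, int p, long l) {
        for (int shift = 56; shift >= 0; shift -= 8) {
            bufferdata[p++] = (byte) (l >> shift);
        }
    }

    // Decoder for verification: reassembles the 8 bytes, masking each
    // to avoid sign extension.
    static long readLong(byte[] bufferdata, int p) {
        long l = 0;
        for (int i = 0; i < 8; i++) {
            l = (l << 8) | (bufferdata[p + i] & 0xFFL);
        }
        return l;
    }
}
```

As the side note says, java.nio.ByteBuffer (big-endian by default) provides the same encoding via ByteBuffer.wrap(bufferdata).putLong(p, l).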

          8) LogBufferReplication.next() has an empty catch block. What about
          adding SanityManager.THROWASSERT?

          9) I think I would have renamed setRecycle() and recycle() to
          setRecyclable() and isRecyclable() (when I saw the call to recycle() I
          first thought it was a command, not a getter method). Perhaps the
          setRecycle() method even could be removed and instead we could pass
          the value directly to the constructor?

          10) LogBufferReplication.validData() is declared to throw
          NoSuchElementException, but it never throws anything.

          11) LogBufferReplication.getData() could be simplified by using
          Arrays.copyOf().

          12) You asked whether it was OK with internal exceptions like
          LogBufferFullException. I think it is, but if the exception is
          supposed to be caught by other modules (I don't know if that's how
          it will be used), perhaps it should be located in one of the iapi
          packages instead of impl?

          13) I noticed some use of synchronization in
          LogBufferReplication. Could you add a short comment in the class
          javadoc stating the synchronization requirements?

          14) In LogBufferReplication.next(), could we skip the shrinking of
          outBufferData? Unless the buffer can become very large, I think we
          could just skip it. That would simplify the code and also reduce the
          need for reallocation if there's a non-default buffer size at a later
          point in time.

          Jørgen Løland added a comment -

          Patch v1-fixheader replaces v1a. The headers now match the actual package locations and class names.

          Rick Hillegas added a comment -

          Small comment on this patch: The class name in the header boilerplate does not agree with the actual package and classname of the file.

          Jørgen Løland added a comment -

          Patch v1a replaces patch v1. It removes two methods that were used in testing, but is otherwise identical.

          Jørgen Løland added a comment -

          Attachment: bytebuffer_v1.*

          The attachment contains a buffer implemented as

          LinkedList freeBuffers
          LinkedList dirtyBuffers

          where each buffer is a byte[]. The buffer works similar to LogAccessFile, and reuses some code from that file.

          The patch includes an exception. The exception (LogBufferFullException) will only be used internally in the replication code. When caught by the log shipping code (to be written), the log shipper will have to decide what to do. Some examples: stop replication, stop replication but store log on disk for later restart, increase the buffer size, try to flush the buffer to see if that helps, and so on. The exception will never be thrown outside the replication code.

          I have two minor questions.

          • I have not seen many internal exceptions in Derby. Is it ok for me to add one? I think the "LogBufferFullException" provides valuable information...
          • I have placed the patch in java/engine/org/apache/derby/impl/services/replication/buffer/. Future replication patches are intended to be added to the same path (minus "buffer/"). Is that ok?

          The buffer is not used anywhere in the Derby code yet. Later replication patches will start using it. I ran derbyall and suites.all. These failed with the same number of fail/errors as reported by tinderbox. Unfortunately, the tinderbox report site is down for the moment. Since the code is not used for now, I expect it to not be the cause of the failures.
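The free/dirty structure described above, including the LogBufferFullException case, might look roughly like this (buffer sizes and method names illustrative only, not the attached patch):

```java
import java.util.LinkedList;

// Illustrative sketch of the freeBuffers/dirtyBuffers design; not the patch.
public class FreeDirtySketch {
    static class LogBufferFullException extends Exception {}

    private final LinkedList<byte[]> freeBuffers = new LinkedList<byte[]>();
    private final LinkedList<byte[]> dirtyBuffers = new LinkedList<byte[]>();

    public FreeDirtySketch(int bufferCount, int bufferSize) {
        for (int i = 0; i < bufferCount; i++) {
            freeBuffers.add(new byte[bufferSize]);
        }
    }

    // Consumes a whole buffer per call to keep the sketch short. Throws
    // when no free buffer is left; the log shipper must decide what to do.
    public synchronized void append(byte[] record) throws LogBufferFullException {
        if (freeBuffers.isEmpty()) throw new LogBufferFullException();
        byte[] buf = freeBuffers.removeFirst();
        System.arraycopy(record, 0, buf, 0, Math.min(record.length, buf.length));
        dirtyBuffers.addLast(buf);
    }

    // After shipping a buffer to the slave, return it to the free list.
    public synchronized void recycle() {
        if (!dirtyBuffers.isEmpty()) {
            freeBuffers.addLast(dirtyBuffers.removeFirst());
        }
    }

    public synchronized int freeCount() { return freeBuffers.size(); }
}
```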

          Jørgen Løland added a comment -

          >How do you imagine flow control if the network gets slow? Would you
          >block a transaction whose record would overflow the buffer?

          There are at least two simple alternatives for how to handle a full replication buffer:

          • Stop replication
          • Block transactions

          I think the first alternative would be the better one since blocking transactions would mean no availability. This is the exact opposite of what we want to achieve with replication.

          The functional spec of DERBY-2872 states that resuming replication after it has been stopped is a good candidate for extending the functionality. Once that issue has been addressed, we have a third alternative if the buffer gets full:

          • Stop replication for now, but store the log files so that replication can be resumed later.
          Jørgen Løland added a comment -

          >Another approach here would be to use a large circular byte buffer and
          >administer space in it yourself.

          Thank you for the link, Dag.

          The initial plan was to make the log buffer a linked list of
          LogElements, each containing exactly one Derby log record.
          Simple to write, but it would generate lots of small,
          short-lived objects.

          Based on the comments from Mike and Dag, I see that we have an
          alternative strategy in reusing code from LogAccessFile. Although
          I am not sure if LogAccessFile can be used as a log buffer as it
          is, we can reuse much of the code. Basically, this would mean
          that the log buffer is written as a circular byte buffer. The
          slave will have to unserialize the byte buffers to get the log
          records, but that should be fairly easy.

          The byte buffer strategy requires far fewer objects since each
          buffer can contain many log records. Furthermore, reusing
          LogAccessFile means I don't have to add a separate "beast",
          although some modifications may be required. Finally, LogToFile
          can still be the place where log records are added to the buffer.
          This is nice because of the single point of entry considerations.

          The alternative strategy seems to be better than the original
          one. I will give that a try.

          Dag H. Wanvik added a comment -

          > * It will be very easy to recycle the ReplicationLogRecord objects
          > that make up the linked list. Once the log-information in an object
          > has been shipped to the slave, the object could be put in a pool of
          > recycled objects. This would significantly reduce the number of
          > ReplicationLogRecord objects that must be created and garbage
          > collected, but may increase the memory usage since the objects in
          > the pool are not removed from memory. *Is recycling considered good
          > or bad practice?*

          At this year's JavaOne, the Sun JVM garbage collector people talked
          a lot about how cheap object creation and garbage collection is iff:

          • objects are short-lived
          • objects are read-only (use final if possible!)
          • objects are short

          (see http://developers.sun.com/learning/javaoneonline/2007/pdf/TS-2906.pdf)

          So whether recycling will be good depends on the nature of the
          objects. Perhaps a micro benchmark may be useful to determine
          this. Another approach here would be to use a large circular byte buffer and
          administer space in it yourself.

          How do you imagine flow control if the network gets slow? Would you
          block a transaction whose record would overflow the buffer?

          Jørgen Løland added a comment -

          Hi Mike

          There is a better description of what we intend to create in DERBY-2872. Jira does not allow multiple layers of subtasks, so I had to use the "is part of" link.

          Basically, the goal is to provide asynchronous replication, i.e. replication where log shipment to the slave is completely decoupled from transactions on the master. This is even looser synchronization than your third alternative, which I believe is also known as 1-safe replication.

          The asynchronous replication strategy may result in some lost transactions when the slave performs fail-over. The number of lost transactions is, of course, closely related to how often log shipment is performed. As you mention, there is a trade-off between how tight the master/slave synchronization is and the incurred performance degradation. Since log shipping in asynchronous replication is completely decoupled from the transactions, this strategy should have less performance impact than the alternatives.

          Although the plan is to add asynchronous replication now, replication with tighter synchronization should be kept in mind. If possible within reasonable increased work, the architecture should easily extend to 1-safe or 2-safe (your second alternative) replication later.

          When it comes to the core replication functionality, there are two things the replication master must know about: 1) log writes and 2) log flush. The log records must (sooner or later) be sent to the slave, hence 1). What to do with flush calls is up to the replication strategy. For the planned asynchronous replication, flush calls can be ignored. For 1-safe and 2-safe replication, flush calls require log shipment as you describe in alternatives 2 and 3.

          The current plan for using the log buffer is to append log records to it somewhere in LogToFile.appendLogRecord. This is the same method used to append log records to logOut (output stream to the log file; class type LogAccessFile). LogAccessFile is implemented with a number of byte[] buffers (LogAccessFileBuffer), which are ordered in a linked list. LogToFile is the only entry point for log writes, and is therefore easily modifiable for our purpose.

          At the other end of the log buffer, a log shipping service will consume log records. The service should, as you suggest, run as a separate thread. I think DaemonFactory could be useful to create this thread, but that is just a guess.

          In the current code, the flush-methods in LogToFile are the only entry points for transactions to force a log flush (e.g. at commit). Hence, adding forced log shipment to achieve 1 or 2 safe replication later can be easily put in these methods. In the planned asynchronous strategy, log shipment may, e.g., be based on a timeout; flush calls can be ignored altogether.
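The two hooks described here, log writes and log flushes, could be captured by an interface like the following (all names hypothetical, not Derby's actual interfaces; the asynchronous strategy's flush hook is deliberately a no-op):

```java
// Hypothetical master-side hooks; not Derby's actual interfaces.
interface ReplicationLogHooks {
    void logWritten(byte[] record); // called from LogToFile.appendLogRecord
    void logFlushed();              // called from LogToFile's flush methods
}

// Asynchronous strategy: buffer writes, ignore flushes. A 1-safe or
// 2-safe strategy would instead force shipment in logFlushed().
public class AsyncReplicationHooks implements ReplicationLogHooks {
    private int bufferedBytes = 0;

    public void logWritten(byte[] record) {
        bufferedBytes += record.length; // just buffer; ship on a timer elsewhere
    }

    public void logFlushed() {
        // Asynchronous replication: a flush does not force log shipment.
    }

    public int bufferedBytes() { return bufferedBytes; }
}
```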

          I hope this clarifies most of your concerns. Does this architecture fit into your idea of "tie into the existing log writing code" since log records are added to the buffer from the log factory? The single entry point for both log writes and flushes makes this a good place for modifications (for both asynchronous and the x-safe strategies) in my opinion, but there may be good reasons for doing otherwise.

          Mike Matrigali added a comment -

          Do you have a writeup on the architecture for replication you are implementing other
          than what is in this JIRA and DERBY-2922. It is hard to comment without understanding
          the architecture you envision.

          Have you considered rather than having a linked list of log records, using the existing functionality to scan the log records as the base for your list of log records to write? Some
          of this depends on what kind of replication guarantee you are trying to provide. In other
          systems I have seen this described in terms of levels of transaction durability, i.e. levels
          like (all basically are tradeoffs about how much you impact master side commit
          response time against guaranteeing slave consistency):
          o don't allow transaction to commit until log records are synced to remote system
          o queue write of log records at commit to remote, wait for network reply but not disk sync
          o queue write of log records at commit to remote, don't wait for network reply

          If you are looking for any sort of coordination between transaction commit and guaranteeing
          records on the remote I think I would tie into the existing log writing code rather than add
          a separate beast. Basically just enhance the small piece of code that actually writes log
          records to disk to also call a new routine that would also write log records somewhere
          else. This does mean impacting performance of master response time depending on
          the overhead of the secondary write method. Using multiple threads to do I/O locally and
          remote at same time would probably help a lot.
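The three durability levels listed above can be sketched as a commit-time policy. The enum and method names here are illustrative assumptions for discussion, not Derby code:

```java
// Illustrative sketch of the three durability trade-offs listed above.
// All names are assumptions for illustration, not part of Derby.
enum SlaveSafety {
    WAIT_FOR_SLAVE_SYNC,   // don't commit until log records are synced remotely
    WAIT_FOR_NETWORK_ACK,  // queue write, wait for network reply, not disk sync
    QUEUE_ONLY             // queue write, don't wait for any reply
}

class CommitPolicy {
    // True if the master's commit response time depends on the network.
    static boolean commitBlocksOnNetwork(SlaveSafety level) {
        return level != SlaveSafety.QUEUE_ONLY;
    }

    // True if the master's commit also waits for the slave's disk sync.
    static boolean commitBlocksOnSlaveDisk(SlaveSafety level) {
        return level == SlaveSafety.WAIT_FOR_SLAVE_SYNC;
    }
}
```

The policy makes the trade-off explicit: each level away from QUEUE_ONLY buys tighter slave consistency at the cost of master commit latency.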

          Jørgen Løland added a comment -

          Note the questions below...

          PLAN:
          I am planning to write the buffer as a linked list of ReplicationLogRecord objects, each containing the same information that is passed to LogToFile.appendLogRecord:

          byte[] data
          int offset
          int length
          byte[] optionaldata
          int optionalDataOffset
          int optionalDataLength

          This is the same information that is sent in DERBY-2872 using RMI calls. Log records will be appended to the buffer somewhere in the LogFactory, while a log shipping service will remove log records from it. Adding and removing log records from the buffer is not part of this Jira.
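The plan above could look roughly like the following. This is a hedged sketch of the proposed linked list, using the field names listed above; the class names and methods are assumptions, not the committed Derby code:

```java
import java.util.LinkedList;

// Sketch of the proposed buffer: a linked list of holders carrying the same
// fields that are passed to LogToFile.appendLogRecord. Illustrative only.
class ReplicationLogRecord {
    final byte[] data;
    final int offset;
    final int length;
    final byte[] optionalData;
    final int optionalDataOffset;
    final int optionalDataLength;

    ReplicationLogRecord(byte[] data, int offset, int length,
                         byte[] optionalData, int optionalDataOffset,
                         int optionalDataLength) {
        this.data = data;
        this.offset = offset;
        this.length = length;
        this.optionalData = optionalData;
        this.optionalDataOffset = optionalDataOffset;
        this.optionalDataLength = optionalDataLength;
    }
}

class ReplicationLogBuffer {
    private final LinkedList<ReplicationLogRecord> records =
        new LinkedList<ReplicationLogRecord>();

    // Called from the LogFactory when a log record is written locally.
    synchronized void append(ReplicationLogRecord r) {
        records.addLast(r);
    }

    // Called by the log shipping service; returns null when the buffer is empty.
    synchronized ReplicationLogRecord next() {
        return records.pollFirst();
    }

    synchronized int size() {
        return records.size();
    }
}
```

Appends and removals are synchronized because the LogFactory and the log shipping service run in different threads; FIFO order guarantees the slave receives log records in the order they were written on the master.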

          QUESTIONS:

          • It will be very easy to recycle the ReplicationLogRecord objects that make up the linked list. Once the log-information in an object has been shipped to the slave, the object could be put in a pool of recycled objects. This would significantly reduce the number of ReplicationLogRecord objects that must be created and garbage collected, but may increase the memory usage since the objects in the pool are not removed from memory. Is recycling considered good or bad practice?
          • Will it be ok to create a new directory for this, e.g. java/engine/org/apache/derby/impl/store/replication/buffer/ ? It is likely that more replication functionality will be added to store later, and /store/replication could then be used for all of this.
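The recycling idea in the first question could be sketched as below. This is a hypothetical illustration, not Derby code; the size cap is one way to bound the memory-usage downside mentioned above:

```java
import java.util.ArrayDeque;

// Hypothetical sketch of the recycling idea: a bounded pool of reusable
// holder objects. The cap limits how much memory idle pooled objects retain.
class RecyclingPool<T> {
    private final ArrayDeque<T> pool = new ArrayDeque<T>();
    private final int maxPooled;

    RecyclingPool(int maxPooled) {
        this.maxPooled = maxPooled;
    }

    // Return a recycled object, or null so the caller allocates a new one.
    synchronized T acquire() {
        return pool.pollFirst();
    }

    // Give an object back once its log data has been shipped to the slave.
    // Objects beyond the cap are dropped and left to the garbage collector.
    synchronized void release(T obj) {
        if (pool.size() < maxPooled) {
            pool.addFirst(obj);
        }
    }
}
```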

            People

            • Assignee:
              Jørgen Løland
              Reporter:
              Jørgen Løland