Derby / DERBY-2762

Document, verify and fix synchronization issues related to Clob in the embedded driver

    Details

    • Type: Task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.3.1.4
    • Fix Version/s: None
    • Component/s: JDBC
    • Urgency: Normal

      Description

      Synchronization with respect to Clobs is a bit tricky.
      A full review of synchronization should be performed, and the documentation and the actual behavior must be brought into agreement with each other.

      Synchronization with respect to Clob is made trickier by the many streams that can be used to read or write its value.
      The main classes to review will be (there might be more):
      a) EmbedClob
      b) StoreStreamClob
      c) ClobStreamControl (may be renamed to TemporaryClob)
      d) ClobUtf8Writer
      e) ClobAsciiStream
      f) ClobUpdateableReader

      We should also clarify and document what is supposed to be allowed. Can
      you read from one stream and write to another one at the same time, both
      from the same Clob?
      Can you expect the ascii stream and the character stream to be in sync
      if you read from both of them?

      A related issue is that of garbage collection of underlying resources
      before the streams are closed. DERBY-2734 has already been filed for
      this.

      I would like to try a little experiment by using a few simple
      annotations to document intended synchronization policies. These are the
      annotations defined in the book "Java Concurrency in Practice" by Brian
      Goetz et al, and the JavaDoc for them can be found here:
      http://javaconcurrencyinpractice.com/annotations/doc/index.html

      Since we are still using Java 1.4, the annotations must be used as
      comments. I still think they are valuable, as we do not use any tools to
      document/check synchronization anyway.

      Briefly, the following four annotations are defined:
      @GuardedBy
      @Immutable
      @NotThreadSafe
      @ThreadSafe
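      As an illustration of the comment-based style, a minimal sketch might look
      like the following. The class names are hypothetical and are not Derby
      classes; only the four annotation names come from the book.

```java
/*
 * Hypothetical classes, not Derby code. Java 1.4 has no annotations, so the
 * "Java Concurrency in Practice" annotations are written as comments.
 */

/* @ThreadSafe */
class HypotheticalStreamPosition {
    /* @GuardedBy("this") */
    private long position = 1; // 1-based, like JDBC Clob positions
    /* @GuardedBy("this") */
    private boolean valid = true;

    synchronized long getPosition() {
        return position;
    }

    synchronized void advance(long chars) {
        if (!valid) {
            throw new IllegalStateException("position has been invalidated");
        }
        position += chars;
    }

    synchronized void invalidate() {
        valid = false;
    }
}

/* @Immutable */
final class HypotheticalClobSnapshot {
    private final String content;

    HypotheticalClobSnapshot(String content) {
        this.content = content;
    }

    String getContent() {
        return content;
    }
}

/* @NotThreadSafe: callers must confine each instance to a single thread. */
class HypotheticalScratchBuffer {
    private char[] chars = new char[16];
    private int length = 0;

    void append(char c) {
        if (length == chars.length) {
            // Grow the backing array; no locking, hence @NotThreadSafe.
            char[] bigger = new char[chars.length * 2];
            System.arraycopy(chars, 0, bigger, 0, length);
            chars = bigger;
        }
        chars[length++] = c;
    }

    int length() {
        return length;
    }
}
```

      Because the annotations are plain comments, nothing checks them; their value
      is that the intended policy of each field and class is written down next to
      the code it governs.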


          Activity

          Rick Hillegas added a comment -

          Triaged for 10.5.2: Assigned normal urgency. Changed from Bug to Task because this is a request for analysis; that analysis is likely to result in the logging of bugs.

          Kristian Waagan added a comment -

          Here's my first take at defining a synchronization policy for Clob-objects.
          There are two paths that must be synchronized:
          a) Use through the EmbedClob object.
          b) Reads/writes through one of the available streams.

          For streams, the scenarios below must be handled. My initial propositions for
          what to do are listed as well. I think they are in agreement with the current
          efforts in the community.
          i) A read-only Clob is changed to a read-write Clob.
          Streams must be updated to take data from the new representation.
          ii) Clob is closed/invalidated by EmbedClob.free() or Connection.commit().
          Next read/write must throw IOException with a Derby SQLState message, i.e.
          XJ215/XJ073. The streams should not return EOF.
          iii) Clob is truncated to a position after the current stream position.
          Nothing special happens; the stream is updated to reflect the new length/content.
          iv) Clob is truncated to a position before the current stream position.
          Throw IOException with some kind of error message.
          v) Clob is truncated to a position at the current stream position.
          Next read() will return EOF (-1). Next write() will append to the Clob.
          vi) Clob is updated through EmbedClob.setString().
          Streams are updated with the new content. If the stream is positioned in
          the middle of the updated portion, the user would see some data from the
          old content and some data from the new content. Users not accepting this
          should be able to enforce consistency themselves by controlling access to
          the Clob-object and stream objects.

          In general, the synchronization must ensure that only one operation on the Clob
          can happen at a time, whether it goes through the Clob-object itself or through
          one of the streams.
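          A minimal sketch of scenarios ii through vi under this single-lock rule,
          assuming one shared object whose monitor guards the value and a reader that
          keeps its own position. The class names are hypothetical, not Derby's
          EmbedClob or stream classes; the XJ215 message text is illustrative only;
          and Clob-side errors are unchecked exceptions rather than SQLException to
          keep the sketch short. Writing (the append case of scenario v) and the
          read-only to read-write switch of scenario i are not shown.

```java
import java.io.IOException;
import java.io.Reader;

/* Hypothetical sketch of the proposed policy, not Derby code (Java 1.4 style). */

/* @ThreadSafe: every operation locks this object, so only one runs at a time. */
class HypotheticalClob {
    /* @GuardedBy("this") */
    private final StringBuffer content;
    /* @GuardedBy("this") */
    private boolean valid = true;

    HypotheticalClob(String initial) {
        content = new StringBuffer(initial);
    }

    // Scenario ii: free() or commit() invalidates the Clob.
    synchronized void invalidate() {
        valid = false;
    }

    // Scenarios iii, iv and v: truncate to 'len' characters.
    synchronized void truncate(long len) {
        requireValid();
        if (len < 0 || len > content.length()) {
            throw new IllegalArgumentException("invalid length " + len);
        }
        content.setLength((int) len);
    }

    // Scenario vi: overwrite starting at 1-based position 'pos'.
    synchronized void setString(long pos, String str) {
        requireValid();
        int start = (int) (pos - 1);
        if (start < 0 || start > content.length()) {
            throw new IllegalArgumentException("invalid position " + pos);
        }
        int end = Math.min(start + str.length(), content.length());
        content.replace(start, end, str);
    }

    // Called by streams with their own 0-based position.
    // Returns the number of chars copied, or -1 at the end of the Clob.
    synchronized int read(long streamPos, char[] buf, int off, int len) throws IOException {
        if (!valid) {
            // Scenario ii: fail instead of returning EOF.
            throw new IOException("XJ215: Clob is no longer valid");
        }
        if (streamPos > content.length()) {
            // Scenario iv: truncated before the current stream position.
            throw new IOException("Clob truncated before current stream position");
        }
        int available = content.length() - (int) streamPos;
        if (available == 0) {
            return -1; // scenario v, or the plain end of the value
        }
        int n = Math.min(len, available);
        content.getChars((int) streamPos, (int) streamPos + n, buf, off);
        return n;
    }

    private void requireValid() {
        if (!valid) {
            throw new IllegalStateException("Clob is no longer valid");
        }
    }
}

/* @NotThreadSafe: the position is private to this stream and unguarded. */
class HypotheticalClobReader extends Reader {
    private final HypotheticalClob clob;
    private long pos = 0;

    HypotheticalClobReader(HypotheticalClob clob) {
        this.clob = clob;
    }

    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int n = clob.read(pos, buf, off, len);
        if (n > 0) {
            pos += n;
        }
        return n;
    }

    public void close() {
        // Nothing to release in this sketch.
    }
}
```

          Each read() copies a whole chunk while holding the Clob's lock, so a
          concurrent truncate() or setString() happens either before or after that
          chunk, never in the middle of it. A reader positioned inside a region being
          rewritten can still observe old data followed by new data across two calls,
          as scenario vi describes.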

          Let's say you get two character streams from a Clob and read from them repeatedly, one at a time but in various orders.
          Should each stream retain its own position, or should reading from one stream advance the position of the other?
          My take is that each stream retains its own position, but that this access pattern may be very inefficient (repositioning from the first position every time).
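          To show why independent positions can be costly, the sketch below assumes,
          hypothetically, that the stored value can only be read forwards, so a stream
          whose position lies behind the shared source's position forces the source to
          restart from the first character. Each reader keeps its own position, and the
          charsSkipped counter records the repositioning work. All names are
          illustrative, not Derby classes.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/* Hypothetical sketch, not Derby code: the value can only be read forwards. */

/* @ThreadSafe */
class HypotheticalForwardOnlySource {
    private final String data;
    /* @GuardedBy("this") */
    private Reader in;
    /* @GuardedBy("this") */
    private long inPos;
    /* @GuardedBy("this") */
    private long charsSkipped;

    HypotheticalForwardOnlySource(String data) {
        this.data = data;
        this.in = new StringReader(data);
    }

    // Reads the character at 0-based position 'pos', repositioning if needed.
    synchronized int readAt(long pos) throws IOException {
        if (pos < inPos) {
            // Cannot seek backwards: reopen and start from the first position.
            in = new StringReader(data);
            inPos = 0;
        }
        while (inPos < pos) {
            long skipped = in.skip(pos - inPos);
            if (skipped <= 0) {
                return -1; // 'pos' lies beyond the end of the data
            }
            inPos += skipped;
            charsSkipped += skipped;
        }
        int c = in.read();
        if (c != -1) {
            inPos++;
        }
        return c;
    }

    synchronized long getCharsSkipped() {
        return charsSkipped;
    }
}

/* @NotThreadSafe: each stream keeps its own, unshared position. */
class HypotheticalPositionedReader {
    private final HypotheticalForwardOnlySource source;
    private long pos = 0;

    HypotheticalPositionedReader(HypotheticalForwardOnlySource source) {
        this.source = source;
    }

    int read() throws IOException {
        int c = source.readAt(pos);
        if (c != -1) {
            pos++;
        }
        return c;
    }
}
```

          With two readers on "abcdef", reading a, b, c from the first and then
          alternating between the two makes every switch either restart the source
          or skip forward over characters that were already read once. Buffering a
          chunk per reader would reduce, but not remove, this cost.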

          What do people think about these statements?
          Are they incorrect?
          Do they describe a sensible behavior?
          Since the JDBC spec does not dictate any behavior here, I don't think it would be
          wise to implement highly sophisticated concurrency guarantees. Is my proposal
          already crossing this line?
          Should we simply say concurrent access is undefined, or that all access must be
          single threaded? Note that even if it is single threaded, we have to decide what happens when the user mixes calls to Clob and the streams.

          Implementation-wise I think we have a few alternatives, but the current
          implementation of EmbedClob also implies a few limitations. For instance, I hoped
          all streams could operate on InternalClob instead of EmbedClob, but that does
          not seem feasible since the InternalClob object in EmbedClob is replaced when
          the Clob goes from read-only to read-write. At least the streams would need a
          reference to both.
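          One possible arrangement, sketched here with hypothetical names: the stream
          holds a reference only to the outer object and asks it for the current
          internal representation on every read, so the replacement of a read-only
          representation with a read-write copy (scenario i) is picked up
          automatically. A stream that cached the internal object at creation time
          would keep reading the stale read-only value. The sketch is not meant to
          describe how EmbedClob or InternalClob are actually implemented.

```java
/* Hypothetical sketch, not Derby code (Java 1.4 style). */

interface HypotheticalInternalClob {
    long length();
    char charAt(long index); // 0-based
}

/* @Immutable */
final class HypotheticalReadOnlyRep implements HypotheticalInternalClob {
    private final String value;

    HypotheticalReadOnlyRep(String value) {
        this.value = value;
    }

    public long length() {
        return value.length();
    }

    public char charAt(long index) {
        return value.charAt((int) index);
    }
}

/* @NotThreadSafe: only accessed while holding the owner's lock. */
final class HypotheticalReadWriteRep implements HypotheticalInternalClob {
    private final StringBuffer value = new StringBuffer();

    HypotheticalReadWriteRep(HypotheticalInternalClob source) {
        for (long i = 0; i < source.length(); i++) {
            value.append(source.charAt(i));
        }
    }

    public long length() {
        return value.length();
    }

    public char charAt(long index) {
        return value.charAt((int) index);
    }

    void set(long index, char c) {
        value.setCharAt((int) index, c);
    }
}

/* @ThreadSafe */
class HypotheticalClobWrapper {
    /* @GuardedBy("this") */
    private HypotheticalInternalClob internal;

    HypotheticalClobWrapper(String value) {
        internal = new HypotheticalReadOnlyRep(value);
    }

    synchronized void setChar(long index, char c) {
        if (index < 0 || index >= internal.length()) {
            throw new IllegalArgumentException("invalid index " + index);
        }
        if (!(internal instanceof HypotheticalReadWriteRep)) {
            // Scenario i: replace the read-only representation with a writable copy.
            internal = new HypotheticalReadWriteRep(internal);
        }
        ((HypotheticalReadWriteRep) internal).set(index, c);
    }

    // Streams call this on every read instead of caching 'internal'.
    synchronized int readChar(long index) {
        if (index >= internal.length()) {
            return -1;
        }
        return internal.charAt(index);
    }
}

/* @NotThreadSafe: the position belongs to this stream alone. */
class HypotheticalWrapperReader {
    private final HypotheticalClobWrapper owner;
    private long pos = 0;

    HypotheticalWrapperReader(HypotheticalClobWrapper owner) {
        this.owner = owner;
    }

    int read() {
        int c = owner.readChar(pos);
        if (c != -1) {
            pos++;
        }
        return c;
    }
}
```

          The cost of this indirection is one synchronized call on the outer object
          per read; in exchange, the streams never need to be told that the internal
          representation was swapped.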

          Does anyone have strong opinions on what we should do with this issue?
          This comment is an invitation to get a discussion started...

          Myrna van Lunteren added a comment -

          Removing the 10.3 fixin; I don't think this is a must for 10.3. If it is implemented in time, the fixin can be set when the issue is marked resolved.


            People

            • Assignee: Unassigned
            • Reporter: Kristian Waagan
            • Votes: 0
            • Watchers: 1
