Derby / DERBY-4055

Space may not be reclaimed if row locks are not available after three retries

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.1.3.1, 10.2.2.0, 10.3.3.0, 10.4.2.0, 10.5.1.1
    • Fix Version/s: None
    • Component/s: Store
    • Urgency:
      Normal
    • Issue & fix info:
      High Value Fix, Workaround attached

      Description

      In a multithreaded CLOB update where the same row is being updated, space will not be reclaimed. The offending code is in ReclaimSpaceHelper:

      RecordHandle headRecord = work.getHeadRowHandle();

      if (!container_rlock.lockRecordForWrite(
              tran, headRecord, false /* not insert */, false /* nowait */))
      {
          // cannot get the row lock, retry
          tran.abort();

          if (work.incrAttempts() < 3)
          {
              return Serviceable.REQUEUE;
          }
          else
          {
              // If code gets here, the space will be lost forever, and
              // can only be reclaimed by a full offline compress of the
              // table/index.

              if (SanityManager.DEBUG)
              {
                  if (SanityManager.DEBUG_ON(DaemonService.DaemonTrace))
                  {
                      SanityManager.DEBUG(DaemonService.DaemonTrace,
                          " gave up after 3 tries to get row lock " + work);
                  }
              }
              return Serviceable.DONE;
          }
      }


      If we cannot get the lock after three tries we give up. The reproduction for this issue is in the test store.ClobReclamationTest.xtestMultiThreadUpdateSingleRow().
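      For readers without the test environment, a standalone sketch of the scenario that test exercises might look like the following. The table name, column size, thread count, and iteration count are illustrative only, not taken from the actual test:

      import java.sql.Connection;
      import java.sql.DriverManager;
      import java.sql.PreparedStatement;
      import java.sql.Statement;

      // Illustrative stand-in for the multi-threaded single-row CLOB update
      // scenario; names and counts are hypothetical, not copied from
      // store.ClobReclamationTest.
      public class MultiThreadClobUpdate
      {
          public static void main(String[] args) throws Exception
          {
              Connection setup = DriverManager.getConnection("jdbc:derby:testdb;create=true");
              Statement s = setup.createStatement();
              s.executeUpdate("CREATE TABLE CLOBTAB (K INT PRIMARY KEY, C CLOB(1M))");
              s.executeUpdate("INSERT INTO CLOBTAB VALUES (1, CAST('x' AS CLOB))");
              s.close();
              setup.close();

              // Two threads repeatedly update the same row. The post-commit
              // reclamation of each replaced CLOB chain can then fail to get the
              // row lock within three attempts and give up, stranding the space.
              Runnable updater = () -> {
                  char[] big = new char[32000];
                  java.util.Arrays.fill(big, 'a');
                  try (Connection conn = DriverManager.getConnection("jdbc:derby:testdb");
                       PreparedStatement ps = conn.prepareStatement(
                               "UPDATE CLOBTAB SET C = ? WHERE K = 1"))
                  {
                      for (int i = 0; i < 1000; i++)
                      {
                          ps.setString(1, new String(big));
                          ps.executeUpdate();
                      }
                  }
                  catch (Exception e)
                  {
                      e.printStackTrace();
                  }
              };

              Thread t1 = new Thread(updater);
              Thread t2 = new Thread(updater);
              t1.start(); t2.start();
              t1.join(); t2.join();
          }
      }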


      This issue also used to reference the code below and has some references to trying to get a reproduction for that issue, but that work has moved to DERBY-4054. Please see DERBY-4054 for work on the container lock issue.

      ContainerHandle containerHdl =
          openContainerNW(tran, container_rlock, work.getContainerId());

      if (containerHdl == null)
      {
          tran.abort();

          if (SanityManager.DEBUG)
          {
              if (SanityManager.DEBUG_ON(DaemonService.DaemonTrace))
              {
                  SanityManager.DEBUG(DaemonService.DaemonTrace,
                      " aborted " + work + " because container is locked or dropped");
              }
          }

          if (work.incrAttempts() < 3) // retry this several times
          {
              return Serviceable.REQUEUE;
          }
          else
          {
              // If code gets here, the space will be lost forever, and
              // can only be reclaimed by a full offline compress of the
              // table/index.

              if (SanityManager.DEBUG)
              {
                  if (SanityManager.DEBUG_ON(DaemonService.DaemonTrace))
                  {
                      SanityManager.DEBUG(DaemonService.DaemonTrace,
                          " gave up after 3 tries to get container lock " + work);
                  }
              }

              return Serviceable.DONE;
          }
      }


          Activity

          Kathey Marsden added a comment -

           With revision 743867, checked in a reproduction for the case where the row lock cannot be obtained after three tries:
           enable store.ClobReclamationTest.xtestMultiThreadUpdateSingleRow().
           If multiple threads are trying to update the same row, reclamation doesn't happen, and with derby.debug.true=DaemonTrace the log shows many instances of:
          DEBUG DaemonTrace OUTPUT: gave up after 3 tries to get row lock Reclaim COLUMN_CHAIN...
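           For anyone trying to see that trace, a sketch of enabling it programmatically is below; note the DaemonTrace output only appears in a sane/debug build of Derby, since the calls are guarded by SanityManager.DEBUG in the code quoted in the description.

           public class EnableDaemonTrace
           {
               public static void main(String[] args) throws Exception
               {
                   // Set the debug flag before the embedded engine boots.
                   // Only effective in a sane/debug build of Derby.
                   System.setProperty("derby.debug.true", "DaemonTrace");
                   Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
                   // ... run the multi-threaded update scenario, then check derby.log
               }
           }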

          Kathey Marsden added a comment -

          I noticed in the code coverage output that we do cover the case where we give up after 3 tries to get container lock, so I put in an ASSERT and ran the store tests and found it pops with T_RawStoreFactory.unit. Attached is the derby.log with the assertion. I am not sure how the failing unit tests would translate into a user case. I'll ponder that tomorrow unless it becomes obvious to someone else in the meanwhile.
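           For reference, a minimal sketch of that kind of instrumentation (the assertion text is illustrative) would go in the give-up branch quoted in the description:

           else
           {
               if (SanityManager.DEBUG)
               {
                   // Hypothetical instrumentation: fail loudly in debug builds
                   // instead of silently losing the space.
                   SanityManager.ASSERT(false,
                       "gave up after 3 tries to get container lock " + work);
               }
               return Serviceable.DONE;
           }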

          Kathey Marsden added a comment -

          I looked at one of the test cases that triggered my assertion, P703. In this case the test drops the container before commit.
           if (segment != ContainerHandle.TEMPORARY_SEGMENT)
           {
               t_util.t_dropContainer(t, segment, cid); // cleanup
           }

           t.commit();

          So I think that's why we give up. I don't really see how I could emulate this in JDBC and it would be the lock case that I would want to trigger anyway. So, I don't think this unit test gives me a clue how to reproduce the " gave up after 3 tries to get container lock " case with JDBC.

          Kathey Marsden added a comment -

          For the row lock case, changing container_rlock.lockRecordForWrite to wait for the lock seemed to correct the problem, but I don't know what ramifications that might have.

          Always requeuing the request if we couldn't get the lock seemed to work only if I put a sleep in before closing the connection to give the post commit tasks a chance to catch up. I had to sleep over a minute to get 10,000 updates by each of two threads to catch up. Even then I had a lot of free pages, so the space would ultimately get reclaimed, but we'd still use a lot of space. In a very active system I could also see the post commit tasks piling up and becoming a resource issue, so this solution doesn't seem like a good option.

          Questions:

           1) Is it expected that the cleanup tasks will just be aborted if the connection is closed, or is that another bug?

          2) Anyone have any ideas on an acceptable approach to make sure our reclamation gets done without causing unacceptable lock contention?
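           For concreteness, the wait-on-lock experiment mentioned above amounts to flipping the wait flag in the call quoted in the description, roughly as sketched below (an illustration, not a proposed patch):

           // Sketch: wait for the row lock instead of failing fast. The fourth
           // argument is the flag passed as false /* nowait */ in the original code.
           if (!container_rlock.lockRecordForWrite(
                   tran, headRecord, false /* not insert */, true /* wait */))
           {
               // with waiting enabled this branch is hit far less often, but the
               // single background thread can now block here (see the next comment)
           }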

          Mike Matrigali added a comment -

           Waiting on the lock or bumping the retry count are the only "easy" bug fixes I can
           think of. There are downsides to both.

           wait on lock:
           Currently we only have a single background thread to do these kinds of tasks;
           it is single-threaded through each task. So if you wait on a lock, then every
           other background task waits until that one can finish. Stuff like checkpoints
           and other space reclamation tasks will wait. The queue may grow and become
           unmanageable.

           bump retry:
           Could lead to CPU spinning with no actual guarantee of getting the resource,
           as the task could keep getting unlucky and miss its scheduling window when the
           resource is actually available.

           Bigger feature projects could resolve these issues.

           1) Enhance the background daemon scheduling utility to support more than one
           thread.
           o I think it is important that we don't just randomly grow the number
           of background threads, as we already have complaints about the one
           background thread per db. But it would be fine to allocate a thread
           short term and then destroy it.
           o Not sure exactly what is best, but in this case it would be nice to
           be able to REQUEUE the post-commit task to the daemon, telling it to add
           it to a WAIT-allowed queue. At that point maybe the daemon starts a new
           thread for each one, or maybe for each N, or something else. Maybe it is
           configurable.
           o There are already other problems for which this may be the solution:

           1) As the number of cores/CPUs grows, it becomes obvious that one background
           thread for N possible concurrent user threads, each of which can generate
           work, is not correct.
           2) Some tasks are even more critical than others and could benefit from
           better priority. Things like checkpoint may want their own thread rather
           than sharing with others.

           2) The problem with space being lost "forever" (which really means until an
           offline compress) stems in part from the row format of the overflow pieces.

          o we could change the overflow pieces to have back pointers to the main
          head page which would make it much easier to figure out a stranded piece
          during online compress.

           o We could write some sort of brute-force search which could come up with
           the stranded pieces and call reclaim on them. It is actually not that
           hard to write if one doesn't worry about memory. It would go
           something like this (there are probably many ways to optimize it):

           for (every page at raw store level)
           {
               if (main page)
               {
                   for (every row on page)
                   {
                       if (has an overflow piece)
                           add main row handle to main hash table, and overflow handle
                   }
               }

               if (overflow page)
               {
                   for (every row on page)
                   {
                       add main row handle, and next overflow handle
                   }
               }
           }

           for (every row in main hash table)
           {
               delete complete overflow chain from overflow hash table
           }

          I believe this leaves the disconnected chains in the overflow hash table,
          with some graph algorithm to determine the head of the chains.
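           A rough Java sketch of that two-hash-table bookkeeping is below. Row handles are represented as plain longs, and whatever raw-store API would actually feed the two tables is left out; only the orphan-detection logic is the point.

           import java.util.HashMap;
           import java.util.HashSet;
           import java.util.Map;
           import java.util.Set;

           // Hypothetical sketch of the brute-force orphan search described above.
           class OverflowChainScan
           {
               // head row handle -> first overflow handle it points to
               private final Map<Long, Long> mainTable = new HashMap<Long, Long>();
               // overflow handle -> next overflow handle (or null at end of chain)
               private final Map<Long, Long> overflowTable = new HashMap<Long, Long>();

               // called for each main-page row that has an overflow piece
               void recordMainRow(long headHandle, long firstOverflowHandle)
               {
                   mainTable.put(headHandle, firstOverflowHandle);
               }

               // called for each row found on an overflow page
               void recordOverflowPiece(long overflowHandle, Long nextOverflowHandle)
               {
                   overflowTable.put(overflowHandle, nextOverflowHandle);
               }

               // Walk every chain reachable from a head row and drop it from the
               // overflow table; whatever remains is a stranded (orphan) piece.
               Set<Long> findOrphans()
               {
                   for (Long next : mainTable.values())
                   {
                       while (next != null && overflowTable.containsKey(next))
                       {
                           next = overflowTable.remove(next);
                       }
                   }
                   return new HashSet<Long>(overflowTable.keySet());
               }
           }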

           The above works with an X lock on the whole table; I am not sure if row locking
           works. If you have to get an X lock, then an offline compress is not that much
           worse, which is why it never got implemented.

           3) This post-commit work is all still really at the first-implementation level,
           when most choices were whatever was simple and mostly worked. Not much has
           happened since then. Also, in the first implementation big CLOBs and BLOBs were
           not even implemented, so the downside of losing some space was small.

          As Kathey pointed out I think there are holes in a lot of the post commit
          stuff where a crash can lose them. It seems the only way to make sure we
          don't lose them is to somehow store them transactionally in the db, but at
          what cost? We could implement some sort of internal table, and then store
          rows with the post commit work.

           Personally I think 1 and 2 are better paths than 3. It would be interesting
           to know what other databases do. I am pretty sure Postgres, for instance, relies
           on scheduling a "scrubber" to do a lot of what we do in post commit. I would
           rather see us make changes that would allow online compress table to handle
           these overflow losses, and then enhance the post-commit daemon to better handle
           them asap without ever needing the compress table.

           Kathey Marsden added a comment - edited

           I was working with a user who found they could work around this issue by using better indexing. They had a simultaneous update and select that were accessing different rows, but they still saw this problem occur. Putting an index on the column used in the select avoided a table scan and thus avoided the issue. Hopefully this will work as a workaround for others until we can get this issue fixed.
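           As an illustration of that workaround, a sketch is below; the table and column names are hypothetical, and the point is simply that the concurrent select can use the index instead of scanning (and row-locking its way across) the table being updated:

           import java.sql.Connection;
           import java.sql.DriverManager;
           import java.sql.Statement;

           // Hypothetical illustration of the indexing workaround described above.
           // MYTABLE and STATUS are made-up names for the user's table and the
           // column their concurrent select filters on.
           public class IndexWorkaround
           {
               public static void main(String[] args) throws Exception
               {
                   Connection conn = DriverManager.getConnection("jdbc:derby:testdb");
                   Statement s = conn.createStatement();
                   s.executeUpdate("CREATE INDEX MYTABLE_STATUS_IDX ON MYTABLE(STATUS)");
                   s.close();
                   conn.close();
               }
           }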

          Knut Anders Hatlen added a comment -

          Triaged for 10.5.2.

          Mike Matrigali added a comment -

          Triaged for 10.9, no changes.


            People

            • Assignee: Unassigned
            • Reporter: Kathey Marsden
            • Votes: 1
            • Watchers: 1
