Apache Roller
  1. Apache Roller
  2. ROL-1761

tasks will not run more often than leaseTime minutes

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 4.0
    • Fix Version/s: 5.0
    • Labels:
      None
    • Environment:
      solaris 10, oracle 10g, glassfishv2
    • Database:
      oracle 10g
    • JDK Version:
      1.6
    • O/S Version:
      solaris 10

      Description

      We're finding that our tasks won't run more often than every leaseTime minutes.

      After a bit more digging, we found our JPAThreadManagerImpl.unregisterLease() method is not updating the database,
      so any lease that is successfully taken out are kept , blocking that task from running, until they expire.

      In this case , the JPA-generated SQL looks ok, but the roller logs show what looks like a transaction rolling back:

      DEBUG 2008-11-21 11:47:00,097 CommonsLogFactory$LogAdapter:trace - Executing query: [UPDATE TaskLock t SET t.timeLeased=?1 WHERE t.
      name=?2 AND t.clientId=?3] with parameters:

      {3=ScheduledEntriesTask, 2=0, 1=devel-roller01}

      DEBUG 2008-11-21 11:47:00,097 CommonsLogFactory$LogAdapter:trace - <t 6810846, conn 4959010> executing prepstmnt 20488940 UPDATE ro
      ller_tasklock t0 SET timeleased = ? WHERE (t0.name = ? AND t0.client = ?) [params=(int) 0, (String) ScheduledEntriesTask, (String)
      devel-roller01]
      DEBUG 2008-11-21 11:47:00,098 CommonsLogFactory$LogAdapter:trace - <t 6810846, conn 4959010> [0 ms] spent
      DEBUG 2008-11-21 11:47:00,098 RollerTaskWithLeasing:run - ScheduledEntriesTask: Lease released, task finished
      DEBUG 2008-11-21 11:47:00,109 CommonsLogFactory$LogAdapter:trace - <t 6810846, conn 4959010> [10 ms] rollback *******
      DEBUG 2008-11-21 11:47:00,109 CommonsLogFactory$LogAdapter:trace - <t 6810846, conn 4959010> [0 ms] close

      Is the line flagged '******' a transaction rolling back?
      If so, does anyone have any clue where I start looking to find out why JPA did that?
      I suppose I could work around this by setting interval == leaseTime, but this is a multi-server cluster
      and I've got enough headaches without adding race conditions into the mix

      1. ROL-1761.patch
        3 kB
        Anil Gangolli

        Activity

        Hide
        Glen Mazza added a comment -

        Closed issues related to Roller 5.0 release.

        Show
        Glen Mazza added a comment - Closed issues related to Roller 5.0 release.
        Hide
        Dick Davies added a comment -

        My workaround for the time being is to add this to roller-custom.properties:

        1. drop leasetime to 5 minutes to work around
        2. https://issues.apache.org/roller/browse/ROL-1762
          tasks.ScheduledEntriesTask.leaseTime=5
          tasks.ResetHitCountsTask.leaseTime=5
          tasks.TurnoverReferersTask.leaseTime=5
          tasks.PingQueueTask.leaseTime=5
          tasks.SyncWebsitesTask.leaseTime=5
          tasks.RefreshRollerPlanetTask.leaseTime=5
          tasks.TechnoratiRankingsTask.leaseTime=5

        it's not perfect , but I think it should mitigate the problem until 4.1 ships.

        Show
        Dick Davies added a comment - My workaround for the time being is to add this to roller-custom.properties: drop leasetime to 5 minutes to work around https://issues.apache.org/roller/browse/ROL-1762 tasks.ScheduledEntriesTask.leaseTime=5 tasks.ResetHitCountsTask.leaseTime=5 tasks.TurnoverReferersTask.leaseTime=5 tasks.PingQueueTask.leaseTime=5 tasks.SyncWebsitesTask.leaseTime=5 tasks.RefreshRollerPlanetTask.leaseTime=5 tasks.TechnoratiRankingsTask.leaseTime=5 it's not perfect , but I think it should mitigate the problem until 4.1 ships.
        Hide
        Anil Gangolli added a comment -

        This should be fixed in current trunk with the following change

        http://svn.apache.org/viewvc?view=rev&revision=724008

        The change should make 4.1-dev.

        The commit may expose ROL-1760 in unit testing on Oracle and Postgres

        Show
        Anil Gangolli added a comment - This should be fixed in current trunk with the following change http://svn.apache.org/viewvc?view=rev&revision=724008 The change should make 4.1-dev. The commit may expose ROL-1760 in unit testing on Oracle and Postgres
        Hide
        Anil Gangolli added a comment -


        Suggested patch for current trunk (based on revision 703175).

        Show
        Anil Gangolli added a comment - Suggested patch for current trunk (based on revision 703175).
        Hide
        Anil Gangolli added a comment -

        I examined this issue only on current trunk (4.1-dev).

        I found that the unit test TaskLockTest.testTaskLockCRUD() is doing its own flushing so it doesn't reveal this problem. Also, I noticed that with current semantics/implementation of JPAThreadManagerImpl, the unregisterLease() should be idemptotent, making the test somewhat flawed.

        After removing the test-level flushes, correcting the duplicate release test assertion and adding flushes at the JPAThreadManagerImpl level, the test is passing as I expect.

        I will be attaching a patch file for current trunk. Seeking review before checkin.

        Show
        Anil Gangolli added a comment - I examined this issue only on current trunk (4.1-dev). I found that the unit test TaskLockTest.testTaskLockCRUD() is doing its own flushing so it doesn't reveal this problem. Also, I noticed that with current semantics/implementation of JPAThreadManagerImpl, the unregisterLease() should be idemptotent, making the test somewhat flawed. After removing the test-level flushes, correcting the duplicate release test assertion and adding flushes at the JPAThreadManagerImpl level, the test is passing as I expect. I will be attaching a patch file for current trunk. Seeking review before checkin.
        Hide
        Ron Peterson added a comment -

        Based on Anil's comments, I tried making the following changes to the JPA thread manager implementation. I just copied these flush statements from their equivalent locations in the Hibernate thread manager code (HibernateThreadManagerImpl.java). After this change, the UPDATE roller_tasklock statements which were previously being rolled back are now committed (based on watching debug output, as well as watching changes to the database table itself).

        2226$ diff JPAThreadManagerImpl.java.original JPAThreadManagerImpl.java
        121a122,123
        > roller.flush();
        >
        163a166,168
        > // this may not be needed
        > roller.flush();

        Show
        Ron Peterson added a comment - Based on Anil's comments, I tried making the following changes to the JPA thread manager implementation. I just copied these flush statements from their equivalent locations in the Hibernate thread manager code (HibernateThreadManagerImpl.java). After this change, the UPDATE roller_tasklock statements which were previously being rolled back are now committed (based on watching debug output, as well as watching changes to the database table itself). 2226$ diff JPAThreadManagerImpl.java.original JPAThreadManagerImpl.java 121a122,123 > roller.flush(); > 163a166,168 > // this may not be needed > roller.flush();
        Hide
        Anil Gangolli added a comment -

        Adding some comments extracted from my e-mail responses on the user list. Corrected the spelling of Allen's name.

        I think the second comment is the root cause and the patch is basically to add the flush() in the registerLease and unregisterLease calls in the (somewhat oddly named) ThreadManager, which actually manages the TaskLock lease entities. Hoping for verification from Allen before proceeding. Watching this, but leaving unassigned.

        • We should look for a missing commit there. I think the connection pool does a rollback each time the connection is released back to the pool to safeguard against uncommitted work being committed by the next caller claiming the connection. My guess is we're seeing that.
        • Looking at RollerTaskWithLeasing.java it looks like what is happening is it is explicitly calling WebloggerFactory.getWeblogger().release() and this may be causing the connection release before the update gets committed.
        • I am not seeing any flush() on the persistence strategy in this code path at all, which (assuming READ_COMMITTED transaction isolation) suggests that there may also not be proper exclusion across multiple threads/hosts. A flush() would be needed on registering the lease as well as releasing the lease. I think in this case the manager (ThreadManager) actually should be doing this. Allen, if you're around can you verify ?
        Show
        Anil Gangolli added a comment - Adding some comments extracted from my e-mail responses on the user list. Corrected the spelling of Allen's name. I think the second comment is the root cause and the patch is basically to add the flush() in the registerLease and unregisterLease calls in the (somewhat oddly named) ThreadManager, which actually manages the TaskLock lease entities. Hoping for verification from Allen before proceeding. Watching this, but leaving unassigned. We should look for a missing commit there. I think the connection pool does a rollback each time the connection is released back to the pool to safeguard against uncommitted work being committed by the next caller claiming the connection. My guess is we're seeing that. Looking at RollerTaskWithLeasing.java it looks like what is happening is it is explicitly calling WebloggerFactory.getWeblogger().release() and this may be causing the connection release before the update gets committed. I am not seeing any flush() on the persistence strategy in this code path at all, which (assuming READ_COMMITTED transaction isolation) suggests that there may also not be proper exclusion across multiple threads/hosts. A flush() would be needed on registering the lease as well as releasing the lease. I think in this case the manager (ThreadManager) actually should be doing this. Allen, if you're around can you verify ?

          People

          • Assignee:
            Anil Gangolli
            Reporter:
            Dick Davies
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development