Uploaded image for project: 'Derby'
  1. Derby
  2. DERBY-4137

OOM issue using XA with timeouts

Agile BoardAttach filesAttach ScreenshotVotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Normal
    • Known fix, Workaround attached
    • Crash

    Description

      When using JTA for transaction control and a transaction timeout is set,
      EmbedXAResource ends up calling XATransactionState.scheduleTimeoutTask() which
      in turn registers a timeoutTask with java.util.Timer. In the normal case where
      the transaction finishes before the timeout, XATransactionState.xa_finalize()
      then calls timeoutTask.cancel(). So far this so good. The problem, however, is
      that java.util.TimerTask.cancel() does not actually remove the task from the
      timer queue, meaning that a strong reference to the timeoutTask is kept (and
      through that to XATransactionState, the EmbedConnection, etc). The reference
      is not removed until the time at which the timeout would have fired, which can
      be a long time. Under load this can quickly lead to an OOM situation.

      A simple fix is to call Timer.purge() every so often. While the javadocs talk
      about purge() being rarely needed and that it's not extremely cheap, I've
      found that calling it after every cancel() is the best approach, for several
      reasons: 1) the scenario here is that almost all tasks are cancelled, and
      hence this somewhat fits the Timer.purge() description of an "application that
      cancels a large number of tasks"; 2) there usually isn't a very large number
      of simultaneous transactions, and hence purge() is actually quite cheap; 3)
      this ensures the strong reference is immediately removed, allowing the GC to
      do a better job. Interestingly enough, I've had this exact same issue on a
      different type of db, and I had tested the purge() there and found it to be in
      the sub-microsecond range for 100 transactions (or similar - I don't recall
      the exact data), i.e. completely negligible.

      In short, my suggestion is to change xa_finalize as follows:

      synchronized void xa_finalize() {
      if (timeoutTask != null)

      { timeoutTask.cancel(); Monitor.getMonitor().getTimerFactory(). getCancellationTimer().purge(); }

      isFinished = true;
      }

      As a temporary workaround, applications can do this themselves, i.e.
      add something like the following whenever they close a Connection:

      import org.apache.derby.iapi.services.monitor.Monitor;
      Monitor.getMonitor().getTimerFactory().getCancellationTimer().purge();

      Attachments

        Issue Links

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            kristwaa Kristian Waagan
            ronald@innovation.ch Ronald Tschalaer
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment