OpenJPA
  1. OpenJPA
  2. OPENJPA-295

ArrayIndexOutofBoundsException when under load and within a managed Transaction

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.6
    • Fix Version/s: 1.0.1, 1.1.0
    • Component/s: jpa
    • Labels:
      None
    • Environment:
      openjpa running under WebSphere development builds, as well as Geronimo development builds

      Description

      Recent development builds of our WAS products as well as the Geronimo project are seeing exceptions when running under load. An example of the exception is below:

      Caused by:
      java.lang.ArrayIndexOutOfBoundsException
      at java.util.ArrayList.add(ArrayList.java:378)
      at org.apache.openjpa.kernel.AbstractBrokerFactory.syncWithManagedTransaction(AbstractBrokerFactory.java:684)
      ... 39 more

      This is the deepest trace I can get with the actual exception, but the wrappering exception shows this stack trace for geronimo:

      <1.0.0-SNAPSHOT-SNAPSHOT nonfatal general error> org.apache.openjpa.persistence.PersistenceException: null
      at org.apache.openjpa.kernel.AbstractBrokerFactory.syncWithManagedTransaction(AbstractBrokerFactory.java:690)
      at org.apache.openjpa.kernel.BrokerImpl.initialize(BrokerImpl.java:304)
      at org.apache.openjpa.kernel.AbstractBrokerFactory.newBroker(AbstractBrokerFactory.java:182)
      at org.apache.openjpa.kernel.DelegatingBrokerFactory.newBroker(DelegatingBrokerFactory.java:142)
      at org.apache.openjpa.persistence.EntityManagerFactoryImpl.createEntityManager(EntityManagerFactoryImpl.java:190)
      at org.apache.openjpa.persistence.EntityManagerFactoryImpl.createEntityManager(EntityManagerFactoryImpl.java:55)
      at org.apache.geronimo.persistence.CMPEntityManagerTxScoped.createEntityManager(CMPEntityManagerTxScoped.java:74)
      at org.apache.geronimo.persistence.CMPEntityManagerTxScoped.getEntityManager(CMPEntityManagerTxScoped.java:55)
      at org.apache.geronimo.persistence.CMPEntityManagerTxScoped.createNamedQuery(CMPEntityManagerTxScoped.java:259)
      at org.apache.geronimo.samples.daytrader.ejb3.TradeSLSBBean.getClosedOrders(TradeSLSBBean.java:335)

      This is happening in two separate products with two different JTA implementations, and also both of these products were working at one point.

      Any ideas?

      1. OPENJPA-295.diff.txt
        13 kB
        Michael Dick
      2. openjpa-295.patch
        11 kB
        Craig L Russell
      3. OPENJPA295.patch
        6 kB
        Marc Prud'hommeaux

        Issue Links

          Activity

          Hide
          Patrick Linskey added a comment -

          We should be creating a concurrent data structure instead of an ArrayList a couple of lines above where you're getting the exception. I can't create a patch easily right now as I've made significant uncommitted changes to that file.

          Show
          Patrick Linskey added a comment - We should be creating a concurrent data structure instead of an ArrayList a couple of lines above where you're getting the exception. I can't create a patch easily right now as I've made significant uncommitted changes to that file.
          Hide
          Marc Prud'hommeaux added a comment -

          Could the application somehow be using the same JTA transaction from different threads? The reason we don't synchronize at AbstractBrokerFactory:684 is that we don't expect to ever be able to have multiple threads accessing the list of brokers, since it is keyed off of the JTA transaction (which is supposed to be thread-bound).

          Show
          Marc Prud'hommeaux added a comment - Could the application somehow be using the same JTA transaction from different threads? The reason we don't synchronize at AbstractBrokerFactory:684 is that we don't expect to ever be able to have multiple threads accessing the list of brokers, since it is keyed off of the JTA transaction (which is supposed to be thread-bound).
          Hide
          Rob Wisniewski added a comment -

          It's a long shot that both of these products would be having the same problem.. we're trying to switch out jpa implementations and see if the problem goes away. earlier builds of the products built on 0.9.7 work fine while the 1.0.0 based builds fail.. trying to retrofit 0.9.7 into the newer builds. Also synchonizing the set locally here to see if that gets rid of the problem.

          Show
          Rob Wisniewski added a comment - It's a long shot that both of these products would be having the same problem.. we're trying to switch out jpa implementations and see if the problem goes away. earlier builds of the products built on 0.9.7 work fine while the 1.0.0 based builds fail.. trying to retrofit 0.9.7 into the newer builds. Also synchonizing the set locally here to see if that gets rid of the problem.
          Hide
          Michael Dick added a comment -

          Attaching a simple patch based on Patrick's comments. The patch uses a ConcurrentSet (arbitrarily chosen) instead of an ArrayList.

          I also sent a jar containing this change to Rob earlier today and it looks like it resolved the problem.

          I'm hesitating to commit the fix until we understand what else changed between 0.9.7 and now. A quick diff shows no changes to the AbstractBrokerFactory.syncWithManagedTransaction method and only a few in AbstractBrokerFactory as a whole.

          I believe the current builds are faster than 0.9.7 (at least for Rob's application) so it's possible that we're just hitting a timing window that we couldn't before.

          Show
          Michael Dick added a comment - Attaching a simple patch based on Patrick's comments. The patch uses a ConcurrentSet (arbitrarily chosen) instead of an ArrayList. I also sent a jar containing this change to Rob earlier today and it looks like it resolved the problem. I'm hesitating to commit the fix until we understand what else changed between 0.9.7 and now. A quick diff shows no changes to the AbstractBrokerFactory.syncWithManagedTransaction method and only a few in AbstractBrokerFactory as a whole. I believe the current builds are faster than 0.9.7 (at least for Rob's application) so it's possible that we're just hitting a timing window that we couldn't before.
          Hide
          Michael Dick added a comment -

          Getting back to Marc's comment. I think the difference in this case is that WebSphere Application Server (and I'm guessing Geronimo) use the TransactionSynchronizationRegistry.

          When we use the TSR to get a TransactionManager it returns a TransactionManagerFacade (a singleton), calling getTransaction() on the facade returns a singleton (itself) as well. If I haven't misread the code, we'll hit this problem fairly quickly with any Application Server that uses the TSR.

          I can think of two potential fixes (assuming I've identified the problem correctly) :

          1. Use a concurrent set (similar to the patch I attached earlier).

          2. The TSR API provides a getTransactionKey() method. The TransactionManagerFacade could be modified to return a new "TransactionFacade" Object when getTransaction is called. The TransactionFacade could use the transactionKey's hashcode (which should be unique to each transaction), and we'll avoid the collision.

          Does anyone see a problem with either approach (or something I've missed) ?

          Show
          Michael Dick added a comment - Getting back to Marc's comment. I think the difference in this case is that WebSphere Application Server (and I'm guessing Geronimo) use the TransactionSynchronizationRegistry. When we use the TSR to get a TransactionManager it returns a TransactionManagerFacade (a singleton), calling getTransaction() on the facade returns a singleton (itself) as well. If I haven't misread the code, we'll hit this problem fairly quickly with any Application Server that uses the TSR. I can think of two potential fixes (assuming I've identified the problem correctly) : 1. Use a concurrent set (similar to the patch I attached earlier). 2. The TSR API provides a getTransactionKey() method. The TransactionManagerFacade could be modified to return a new "TransactionFacade" Object when getTransaction is called. The TransactionFacade could use the transactionKey's hashcode (which should be unique to each transaction), and we'll avoid the collision. Does anyone see a problem with either approach (or something I've missed) ?
          Hide
          Marc Prud'hommeaux added a comment -

          I bet you're right. Good catch!

          Another solution might be to use the following patch: it changes RegistryManagedRuntime.java to use a ThreadLocal to hold its Transaction facades, rather than just having a single global one, which should mean that multiple threads never get the same Transaction facade. I would be interested in hearing if the patch works for you.

          Show
          Marc Prud'hommeaux added a comment - I bet you're right. Good catch! Another solution might be to use the following patch: it changes RegistryManagedRuntime.java to use a ThreadLocal to hold its Transaction facades, rather than just having a single global one, which should mean that multiple threads never get the same Transaction facade. I would be interested in hearing if the patch works for you.
          Hide
          Craig L Russell added a comment -

          Hi Marc,

          There should not be any issue using the identical instance of TSR for the registerSynchronization, get/setRollbackOnly, and getStatus methods. The issue is the assumption that the Transaction instance is different for each transaction.

          I think that Michael Dick's #2 solution is needed. The _transactional is already a concurrent map but all the keys are identical so the basic premise doesn't work.

          More basic, the AbstractBrokerFactory method findTransactionalBroker should have a different implementation for TSR, which has direct support for this functionality via the getResource and putResource methods. I think I'd rather delegate the "findTransactionalBroker" to the ManagedRuntime, which could do the current behavior in other than TSR environments but delegate to the TSR behavior in the app server.

          Back to the current problem how about adding Object getTransactionKey() to ManagedRuntime which in non-TSR implementations returns "this" and in TSR returns "this.getTransactionKey()".

          Show
          Craig L Russell added a comment - Hi Marc, There should not be any issue using the identical instance of TSR for the registerSynchronization, get/setRollbackOnly, and getStatus methods. The issue is the assumption that the Transaction instance is different for each transaction. I think that Michael Dick's #2 solution is needed. The _transactional is already a concurrent map but all the keys are identical so the basic premise doesn't work. More basic, the AbstractBrokerFactory method findTransactionalBroker should have a different implementation for TSR, which has direct support for this functionality via the getResource and putResource methods. I think I'd rather delegate the "findTransactionalBroker" to the ManagedRuntime, which could do the current behavior in other than TSR environments but delegate to the TSR behavior in the app server. Back to the current problem how about adding Object getTransactionKey() to ManagedRuntime which in non-TSR implementations returns "this" and in TSR returns "this.getTransactionKey()".
          Hide
          Marc Prud'hommeaux added a comment -

          My patch doesn't use a different TSR per thread, it just put a different instance of the javax.transaction.Transaction into the ThreadLocal, so that a different Transaction is used per thread (as per the JTA spec).

          That being said, there are plenty of other solutions to this problem. The advantage I see to my solution is that it doesn't introduce additional unnecessary synchronizatin into the findTransactionalBroker() method. However, even if a different solution is desired, my patch should probably be applied anyway, since having the same Transaction instance being used from multiple threads might break other assumptions elsewhere.

          Show
          Marc Prud'hommeaux added a comment - My patch doesn't use a different TSR per thread, it just put a different instance of the javax.transaction.Transaction into the ThreadLocal, so that a different Transaction is used per thread (as per the JTA spec). That being said, there are plenty of other solutions to this problem. The advantage I see to my solution is that it doesn't introduce additional unnecessary synchronizatin into the findTransactionalBroker() method. However, even if a different solution is desired, my patch should probably be applied anyway, since having the same Transaction instance being used from multiple threads might break other assumptions elsewhere.
          Hide
          Craig L Russell added a comment -

          The attached patch adds a new method getTransactionKey to ManagedRuntime and uses this key to manage the Map<Transaction, Broker> map.

          Show
          Craig L Russell added a comment - The attached patch adds a new method getTransactionKey to ManagedRuntime and uses this key to manage the Map<Transaction, Broker> map.
          Hide
          Michael Dick added a comment -

          I agree with Marc and Craig that some variation of solution 2 is the best option (I see Marc's patch as one of these variations).

          That being said I think there are cases where we will need to return a separate TransactionFacade instance if the transaction key is different.

          If the container has suspended that current transaction and started a new one (a bean method with TX_REQUIRES_NEW calls another bean method with TX_REQUIRES_NEW) we'll need a separate key to the _transactions collection. If they used the same key then we'd run into problems the first time an AfterCompletion event is fired.

          Between Marc and Craig's patches I think we're pretty close, I just haven't had a change to play with it much (yet) today.

          Show
          Michael Dick added a comment - I agree with Marc and Craig that some variation of solution 2 is the best option (I see Marc's patch as one of these variations). That being said I think there are cases where we will need to return a separate TransactionFacade instance if the transaction key is different. If the container has suspended that current transaction and started a new one (a bean method with TX_REQUIRES_NEW calls another bean method with TX_REQUIRES_NEW) we'll need a separate key to the _transactions collection. If they used the same key then we'd run into problems the first time an AfterCompletion event is fired. Between Marc and Craig's patches I think we're pretty close, I just haven't had a change to play with it much (yet) today.
          Hide
          Craig L Russell added a comment -

          > My patch doesn't use a different TSR per thread, it just put a different instance of the javax.transaction.Transaction into the ThreadLocal, so that a different Transaction is used per thread (as per the JTA spec).

          Well, we are only using the Transaction interface to do a small number of things, and while it may seem attractive to use a ThreadLocal for this, it's a bit bizarre because the same ThreadLocal is permanently assigned to the thread so the Transaction never changes. This is not per the JTA spec

          > That being said, there are plenty of other solutions to this problem. The advantage I see to my solution is that it doesn't introduce additional unnecessary synchronizatin into the findTransactionalBroker() method. However, even if a different solution is desired, my patch should probably be applied anyway, since having the same Transaction instance being used from multiple threads might break other assumptions elsewhere.

          I think in the case of TSR, there is a much more elegant solution to findTransactionalBroker, by using the getResource and setResource methods. These methods should be much more efficient than our own synchronized _transactional.get(key). Then the only use for _transactional is to make sure that there are no outstanding transactions in progress when we try to close the EntityManagerFactory.

          So I'd like to extend ManagedRuntime with a findTransactionalBroker method that would allow the RegistryManagedRuntime to be more efficient, and put the current AbstractBrokerFactory's implementation into AbstractManagedRuntime.

          Show
          Craig L Russell added a comment - > My patch doesn't use a different TSR per thread, it just put a different instance of the javax.transaction.Transaction into the ThreadLocal, so that a different Transaction is used per thread (as per the JTA spec). Well, we are only using the Transaction interface to do a small number of things, and while it may seem attractive to use a ThreadLocal for this, it's a bit bizarre because the same ThreadLocal is permanently assigned to the thread so the Transaction never changes. This is not per the JTA spec > That being said, there are plenty of other solutions to this problem. The advantage I see to my solution is that it doesn't introduce additional unnecessary synchronizatin into the findTransactionalBroker() method. However, even if a different solution is desired, my patch should probably be applied anyway, since having the same Transaction instance being used from multiple threads might break other assumptions elsewhere. I think in the case of TSR, there is a much more elegant solution to findTransactionalBroker, by using the getResource and setResource methods. These methods should be much more efficient than our own synchronized _transactional.get(key). Then the only use for _transactional is to make sure that there are no outstanding transactions in progress when we try to close the EntityManagerFactory. So I'd like to extend ManagedRuntime with a findTransactionalBroker method that would allow the RegistryManagedRuntime to be more efficient, and put the current AbstractBrokerFactory's implementation into AbstractManagedRuntime.
          Hide
          Patrick Linskey added a comment -

          > So I'd like to extend ManagedRuntime with a findTransactionalBroker method
          > that would allow the RegistryManagedRuntime to be more efficient, and put the
          > current AbstractBrokerFactory's implementation into AbstractManagedRuntime.

          +1

          Show
          Patrick Linskey added a comment - > So I'd like to extend ManagedRuntime with a findTransactionalBroker method > that would allow the RegistryManagedRuntime to be more efficient, and put the > current AbstractBrokerFactory's implementation into AbstractManagedRuntime. +1
          Hide
          Craig L Russell added a comment -

          Michael said:
          >That being said I think there are cases where we will need to return a separate TransactionFacade instance if the transaction key is different.

          TSR was designed so this is not needed. Getting the transaction key is supposed to be a trivial operation so there's no need to remember (cache) it.

          >If the container has suspended that current transaction and started a new one (a bean method with TX_REQUIRES_NEW calls another bean method with TX_REQUIRES_NEW) we'll need a separate key to the _transactions collection. If they used the same key then we'd run into problems the first time an AfterCompletion event is fired.

          As long as we never cache the transaction key, but look it up each time we need it, we're good. That is, any time you need to do anything with the transaction key, get it from the ManagedRuntime. It's guaranteed to give you the current transaction key.

          The usage of the cached key in RemoveTransactionSync is ok because it's called after the transaction which was registered has completed. And the broker you want to remove from _transactional has that key.

          I've attached a new patch that makes the internally cached transaction key an Object instead of a Transaction.

          Show
          Craig L Russell added a comment - Michael said: >That being said I think there are cases where we will need to return a separate TransactionFacade instance if the transaction key is different. TSR was designed so this is not needed. Getting the transaction key is supposed to be a trivial operation so there's no need to remember (cache) it. >If the container has suspended that current transaction and started a new one (a bean method with TX_REQUIRES_NEW calls another bean method with TX_REQUIRES_NEW) we'll need a separate key to the _transactions collection. If they used the same key then we'd run into problems the first time an AfterCompletion event is fired. As long as we never cache the transaction key, but look it up each time we need it, we're good. That is, any time you need to do anything with the transaction key, get it from the ManagedRuntime. It's guaranteed to give you the current transaction key. The usage of the cached key in RemoveTransactionSync is ok because it's called after the transaction which was registered has completed. And the broker you want to remove from _transactional has that key. I've attached a new patch that makes the internally cached transaction key an Object instead of a Transaction.
          Hide
          Michael Dick added a comment -

          I deleted the original bad patch.

          +1 to Craig's approach (now that I see what he meant). AbstractManagedRuntime seems to be missing from the patch. I'm guessing LocalManagedRuntime will also extend AbstractManagedRuntime.

          I'll run a sniff test with WebSphere Application Server in the morning.

          Show
          Michael Dick added a comment - I deleted the original bad patch. +1 to Craig's approach (now that I see what he meant). AbstractManagedRuntime seems to be missing from the patch. I'm guessing LocalManagedRuntime will also extend AbstractManagedRuntime. I'll run a sniff test with WebSphere Application Server in the morning.
          Hide
          Craig L Russell added a comment -

          Sorry, forgot to svn add the new abstract class before svn diff. The new patch includes the new class.

          I still have not put the findBroker method into the ManagedRuntime pending a bit more discussion.

          Show
          Craig L Russell added a comment - Sorry, forgot to svn add the new abstract class before svn diff. The new patch includes the new class. I still have not put the findBroker method into the ManagedRuntime pending a bit more discussion.
          Hide
          Michael Dick added a comment -

          The attached patch is very similar to Craig's last one. I changed the type of
          RegistryManagedRuntime._tm to TransactionManagerRegistryFacade, and made LocalManagedRuntime extend AbstractManagedRuntime.

          Of course there could be better solutions, but this worked for me. A very simple test with WebSphere Application Server proved that we aren't missing anything terribly obvious, but it'd be nice to get validation that this did resolve the issue (I've never reproduced the problem).

          Show
          Michael Dick added a comment - The attached patch is very similar to Craig's last one. I changed the type of RegistryManagedRuntime._tm to TransactionManagerRegistryFacade, and made LocalManagedRuntime extend AbstractManagedRuntime. Of course there could be better solutions, but this worked for me. A very simple test with WebSphere Application Server proved that we aren't missing anything terribly obvious, but it'd be nice to get validation that this did resolve the issue (I've never reproduced the problem).
          Hide
          Michael Dick added a comment -

          Previous patch had a bug in AutomaticManagedRuntime - it wasn't delegating to the real ManagedRuntime's getTransactionKey method.

          Show
          Michael Dick added a comment - Previous patch had a bug in AutomaticManagedRuntime - it wasn't delegating to the real ManagedRuntime's getTransactionKey method.
          Hide
          Michael Dick added a comment -

          I sent the OPENJPA-295.diff.txt patch to Rob and he confirmed that it does resolve the problem. I'll commit those changes but leave the JIRA open in case we want to discus the findBroker change here too.

          Show
          Michael Dick added a comment - I sent the OPENJPA-295 .diff.txt patch to Rob and he confirmed that it does resolve the problem. I'll commit those changes but leave the JIRA open in case we want to discus the findBroker change here too.
          Hide
          Craig L Russell added a comment -

          Looking at how to use the TSR map of object->object instead of the _transactional, we would need to make a change to the AbstractBrokerFactory method BrokerImpl findTransactionalBroker(String user, String pass). The change would delegate to the ManagedRuntime which might have a better way to look up the Broker in the context of the current transaction.

          The _transactional map is a map of TransactionKey to Broker. This is needed for a completely different purpose (keeping track of whether there are any Brokers with open transactions).

          The issue is separation of concerns. Currently the ManagedRuntime doesn't really know anything about Brokers or transaction maps. To implement the findTransactionalBroker entirely in ManagedRuntime would introduce a lot of broker-aware code. We could simply have ManagedRuntime know about a Map of Transaction to Object, and do the rest of the processing in AbstractBrokerFactory. This would separate functionality but not completely.

          With this separation, the ManagedRuntime would have a new method Object getByTransactionKey(Object brokerFactory, Map transactional) that returns the entry associated with the current transaction (which it knows how to get) in the Map parameter (Abstract implementation) or the entry associated with the TSR Map using the brokerFactory as the key. This is not completely abstract but pretty close.

          But if we do all that, we should probably look at delegating most of the syncWithManagedTransaction to the ManagedRuntime as well...

          Show
          Craig L Russell added a comment - Looking at how to use the TSR map of object->object instead of the _transactional, we would need to make a change to the AbstractBrokerFactory method BrokerImpl findTransactionalBroker(String user, String pass). The change would delegate to the ManagedRuntime which might have a better way to look up the Broker in the context of the current transaction. The _transactional map is a map of TransactionKey to Broker. This is needed for a completely different purpose (keeping track of whether there are any Brokers with open transactions). The issue is separation of concerns. Currently the ManagedRuntime doesn't really know anything about Brokers or transaction maps. To implement the findTransactionalBroker entirely in ManagedRuntime would introduce a lot of broker-aware code. We could simply have ManagedRuntime know about a Map of Transaction to Object, and do the rest of the processing in AbstractBrokerFactory. This would separate functionality but not completely. With this separation, the ManagedRuntime would have a new method Object getByTransactionKey(Object brokerFactory, Map transactional) that returns the entry associated with the current transaction (which it knows how to get) in the Map parameter (Abstract implementation) or the entry associated with the TSR Map using the brokerFactory as the key. This is not completely abstract but pretty close. But if we do all that, we should probably look at delegating most of the syncWithManagedTransaction to the ManagedRuntime as well...
          Hide
          Patrick Linskey added a comment -

          It looks like the main feature is resolved here, and the issue is open for follow-on work. I'm going to remove the 1.0.0 designation and downgrade the issue to "Major" from "Blocking".

          Show
          Patrick Linskey added a comment - It looks like the main feature is resolved here, and the issue is open for follow-on work. I'm going to remove the 1.0.0 designation and downgrade the issue to "Major" from "Blocking".
          Hide
          Craig L Russell added a comment -

          This issue is resolved. See OPENJPA-310 for more discussion on possible refactoring of the implementation.

          Show
          Craig L Russell added a comment - This issue is resolved. See OPENJPA-310 for more discussion on possible refactoring of the implementation.
          Hide
          Michael Dick added a comment -

          Already fixed in 1.0.1.

          Show
          Michael Dick added a comment - Already fixed in 1.0.1.

            People

            • Assignee:
              Michael Dick
              Reporter:
              Rob Wisniewski
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development