OpenJPA
  1. OpenJPA
  2. OPENJPA-453

Evicting embedded object nullifies statemanager

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Environment:
      Kodo 4.1.4, Ms sql server 2005, jTDS 1.2, jdk 1.6

      Description

      I am noticing the following behaviour: If evict() is called on an embedded
      object the statemanager is nullified which is in contrast to non-embedded
      objects. Subsequently, calling JDOHelper.getPersistenceManager() on the
      evicted embedded object returns null. Is this the correct behaviour?

      1. JIRA-453_trunk.patch
        5 kB
        Ravi P Palacherla
      2. OPENJPA-453_trunk_testcase.patch
        4 kB
        Ravi P Palacherla
      3. OpenJPABug453Embedded.zip
        8 kB
        David Ezzio
      4. openJPATestCase.zip
        4 kB
        Ravi P Palacherla
      5. TestCaseEvictEmbedded.zip
        2 kB
        Christiaan

        Activity

        Hide
        Christiaan added a comment -

        TestCase to reproduce described scenario

        Show
        Christiaan added a comment - TestCase to reproduce described scenario
        Hide
        Michael Dick added a comment -

        Per comments on the users mailing list this issue can be reproduced with 'vanilla' OpenJPA (no JDO / Kodo). Ravi Palacherla provided this information on the list :

        Hi Mike,

        I am able to replicate the nullpointer exception with openJPA

        Exception in thread "Thread-1" java.lang.NullPointerException
        at org.apache.openjpa.kernel.StateManagerImpl.replaceObjectField(StateManagerImpl.java:2090)
        at Book.pcReplaceField(Book.java)
        at org.apache.openjpa.kernel.StateManagerImpl.replaceField(StateManagerImpl.java:3056)
        at org.apache.openjpa.kernel.StateManagerImpl.clearFields(StateManagerImpl.java:2665)
        at org.apache.openjpa.kernel.PNonTransState.beforeWrite(PNonTransState.java:107)
        at org.apache.openjpa.kernel.PNonTransState.beforeWrite(PNonTransState.java:77)
        at org.apache.openjpa.kernel.StateManagerImpl.dirty(StateManagerImpl.java:1608)
        at org.apache.openjpa.kernel.StateManagerImpl.settingStringField(StateManagerImpl.java:1976)
        at Book.pcSettitle(Book.java)
        at Book.setTitle(Book.java:23)
        at Main$2.run(Main.java:60)
        at java.lang.Thread.run(Thread.java:595)

        You can download the test case from:
        http://n2.nabble.com/file/n2982705/embeddopenJPA.zip embeddopenJPA.zip

        If you verify I set the property "<property name="openjpa.Multithreaded"
        value="false"/>" even though I am spawning threads in my sample.

        When I correct this configuration by changing MultiThreaded property to
        "true" then my test case hangs.

        Please let me know if you need more information.

        Udi,
        Please validate my test case to see if it is in sync with your application.

        • Ravi.
        Show
        Michael Dick added a comment - Per comments on the users mailing list this issue can be reproduced with 'vanilla' OpenJPA (no JDO / Kodo). Ravi Palacherla provided this information on the list : Hi Mike, I am able to replicate the nullpointer exception with openJPA Exception in thread "Thread-1" java.lang.NullPointerException at org.apache.openjpa.kernel.StateManagerImpl.replaceObjectField(StateManagerImpl.java:2090) at Book.pcReplaceField(Book.java) at org.apache.openjpa.kernel.StateManagerImpl.replaceField(StateManagerImpl.java:3056) at org.apache.openjpa.kernel.StateManagerImpl.clearFields(StateManagerImpl.java:2665) at org.apache.openjpa.kernel.PNonTransState.beforeWrite(PNonTransState.java:107) at org.apache.openjpa.kernel.PNonTransState.beforeWrite(PNonTransState.java:77) at org.apache.openjpa.kernel.StateManagerImpl.dirty(StateManagerImpl.java:1608) at org.apache.openjpa.kernel.StateManagerImpl.settingStringField(StateManagerImpl.java:1976) at Book.pcSettitle(Book.java) at Book.setTitle(Book.java:23) at Main$2.run(Main.java:60) at java.lang.Thread.run(Thread.java:595) You can download the test case from: http://n2.nabble.com/file/n2982705/embeddopenJPA.zip embeddopenJPA.zip If you verify I set the property "<property name="openjpa.Multithreaded" value="false"/>" even though I am spawning threads in my sample. When I correct this configuration by changing MultiThreaded property to "true" then my test case hangs. Please let me know if you need more information. Udi, Please validate my test case to see if it is in sync with your application. Ravi.
        Hide
        Ravi P Palacherla added a comment -

        Attached is an openJPA test case for this issue.
        ReadMe.txt has steps to execute the testcase.

        Thanks,
        Ravi.

        Show
        Ravi P Palacherla added a comment - Attached is an openJPA test case for this issue. ReadMe.txt has steps to execute the testcase. Thanks, Ravi.
        Hide
        Craig L Russell added a comment -

        It's probably useful to review the APIs and what they are intended for.

        Evict is intended to clear the persistent values of an instance so that they can be cleanly garbage collected. So all the persistent values are reset to their Java default values. This reduces the number of references to other instances. But the internal reference to the persistence context remains in the evicted instance. If the application wants to refresh the instances, it can.

        Embedded instances have no identity so they cannot be used without their owners. When evicting owners, the fields that reference the embedded instances are cleared so the embedded instances are left floating. If the owner is subsequently refreshed, new embedded instances are created from datastore values and attached to the owner.

        Embedded instances behave mostly like transient instances with the small additional behavior that they track changes made to themselves while owned.

        The JDO spec doesn't discuss the behavior of embedded instances with regard to evict or isXXX methods. It doesn't discuss whether the instances themselves are persistent although they have no persistent identity.

        The attached test case doesn't demonstrate the NullPointerException, does it? If you can demonstrate NPE, we can fix that.

        There's a discussion on the jdo-dev@db.apache.org mailing list that might be of interest to folks here. The proposal is to treat embedded objects as transient for the purposes of life cycle interrogatives and APIs. So isPersistent(embedded_instance) would return false just like a transient instance.

        Please see http://markmail.org/search/?q=jdo%20evict%20embedded

        Show
        Craig L Russell added a comment - It's probably useful to review the APIs and what they are intended for. Evict is intended to clear the persistent values of an instance so that they can be cleanly garbage collected. So all the persistent values are reset to their Java default values. This reduces the number of references to other instances. But the internal reference to the persistence context remains in the evicted instance. If the application wants to refresh the instances, it can. Embedded instances have no identity so they cannot be used without their owners. When evicting owners, the fields that reference the embedded instances are cleared so the embedded instances are left floating. If the owner is subsequently refreshed, new embedded instances are created from datastore values and attached to the owner. Embedded instances behave mostly like transient instances with the small additional behavior that they track changes made to themselves while owned. The JDO spec doesn't discuss the behavior of embedded instances with regard to evict or isXXX methods. It doesn't discuss whether the instances themselves are persistent although they have no persistent identity. The attached test case doesn't demonstrate the NullPointerException, does it? If you can demonstrate NPE, we can fix that. There's a discussion on the jdo-dev@db.apache.org mailing list that might be of interest to folks here. The proposal is to treat embedded objects as transient for the purposes of life cycle interrogatives and APIs. So isPersistent(embedded_instance) would return false just like a transient instance. Please see http://markmail.org/search/?q=jdo%20evict%20embedded
        Hide
        Christiaan added a comment -

        I think two issues combined in this issue:
        1) Whether the sm should be nullified on evict or commit or not. This has some implications and is under discussion as Craig mentions for jdo.

        2) A NullpointerException is thrown in OpenJPA code. Whatever the decision for item 1 I don't think this nullpointer should be thrown. It occurs for the following scenario where one thread is accessing an embedded object (object is instantiated and the fields need to be loaded from the datastore and another thread is doing a commit and wants to nullify the statemanager of the embedded object:
        Thread1 (Accessing an embedded object)
        ==========
        getEmbeddedObject() => load fields from datastore:
        org.apache.openjpa.kernel.StateManagerImpl.replaceField(StateManagerImpl.java:2872)
        stop here

        Thread2
        =======
        perform a commit => sm is nullified
        now continue with Thread1

        So instantiating and loading the fields of the embedded should probably be one atomic action. (or fields should be left null if the statemanager is already nullified).

        Ravi, I think you have attached the wrong testcase. The link in your comment is right testcase.

        kind regards,
        Christiaan

        Show
        Christiaan added a comment - I think two issues combined in this issue: 1) Whether the sm should be nullified on evict or commit or not. This has some implications and is under discussion as Craig mentions for jdo. 2) A NullpointerException is thrown in OpenJPA code. Whatever the decision for item 1 I don't think this nullpointer should be thrown. It occurs for the following scenario where one thread is accessing an embedded object (object is instantiated and the fields need to be loaded from the datastore and another thread is doing a commit and wants to nullify the statemanager of the embedded object: Thread1 (Accessing an embedded object) ========== getEmbeddedObject() => load fields from datastore: org.apache.openjpa.kernel.StateManagerImpl.replaceField(StateManagerImpl.java:2872) stop here Thread2 ======= perform a commit => sm is nullified now continue with Thread1 So instantiating and loading the fields of the embedded should probably be one atomic action. (or fields should be left null if the statemanager is already nullified). Ravi, I think you have attached the wrong testcase. The link in your comment is right testcase. kind regards, Christiaan
        Hide
        Udi Ben-Amitai added a comment -

        Hey Michael,
        I tried what you suggested, but my multithreaded property was already set to 'true', and it keeps happening...

        Show
        Udi Ben-Amitai added a comment - Hey Michael, I tried what you suggested, but my multithreaded property was already set to 'true', and it keeps happening...
        Hide
        David Ezzio added a comment -

        With Ravi's permission, I have taken his test case and
        modified it somewhat and uploaded it here.

        My interest in the test case contained in
        OpenJPABug453Embedded.zip concerned its behavior with regard
        to the OpenJPA version that formed the basis for Kodo JDO
        4.1.4, for which I have a related test case. The specific
        version of OpenJPA of interest to me was rev547073, a
        1.0.0-Snapshot.

        My goal was to determine whether the behavior was the same for
        both Kodo JDO and the version of OpenJPA that Kodo uses. For
        the same configuration (namely lazy-fetch of the Page.name
        field, the clearing of fields at the end of a transaction, and
        optimistic transactions), the behavior is essentially the
        same. In both cases, a NPE in enhancement added code results
        from a race condition.

        The race condition arises from the repeated use of the
        following logic in enhancement added code (expressed as pseudo
        code here):

        [PC object class].pc[SomeMethod]
        {
        if (pcStateManager == null)
        ... do something by default with current values in
        this object
        else

        { ... use pcStateManager, likely resulting in an eventual lock, and possibly in a call back to this object's PersistenceCapable interface methods ... }

        }

        The race condition arises because the check of the value of
        pcStateManager is obtained and used twice in succession
        without first locking the state manager. Another thread using
        and having locked the state manager has the time to alter the
        state of this pc object, including nulling out its
        pcStateManager reference.

        As a result, a NPE can occur in the else clause above, or in
        the callback from the state manager to other enhancement added
        methods where the assumption is that pcStateManager reference
        is not null.

        The behavior for OpenJPA 2.0 and trunk is somewhat different,
        and Ravi is looking into this configuration.

        For the specific problem encountered by this test case in the
        configuration that I tested, the work around is to eagerly
        load the Page.name field. This is the default configuration
        for JPA (as well as JDO.)

        The test case represents a problematic use case. Before a fix
        should be attempted, I believe it is important to come up with
        one or more test cases that represent real-world use cases.

        The test case has the following design. It consists of three
        threads. The main thread sets up a reference to a FCO
        persistent Book object and its embedded (SCO) Page object. The
        book has a title field and the page has a name field. The
        domain is reasonable and appropriately simple for a test case.
        The main thread sets up two runnable threads. The transacting
        thread cycles transactions as quickly as possible continually
        changing the book's title and committing. This is a reasonable
        stress condition. The read-only thread uses the self-same book
        object to continually get the book's page and read the page's
        name. The configuration is for optimistic transactions, with
        the clearing of transactional objects' transactional state at
        the end each transaction (RetainState == false), and the
        allowing of non-transactional reads. Multithreading is set to
        true.

        The general use case and application design behind this test,
        if I understand it correctly, goes something like this. We
        have a set of domain objects that we want to be in memory most
        of the time. Our application mostly reads, but also writes.
        We'll have a variety of reader threads, and one writer thread.
        By using the above configuration, – so goes the thinking –
        we can have fast reads, a persistence context (cache) that
        heals itself, since anything unloaded at the end of a
        transaction, or not yet loaded, will be loaded during the next
        read, and a minimal of object creation and garbage collection.
        As a result, the app will run as fast as available CPU,
        memory, and the locking inherent in the persistence layer will
        permit.

        So goes the Sirens song, but there are dangers. One such
        danger relates to second class objects (SCOs) and the lack of
        locking by the application on the domain objects. Basically,
        any reference to an SCO that we obtain in step 1 of an
        application method may become unowned before we get to step 2.
        Once it becomes unowned, the persistence layer no longer
        supports its access to the values in the database.
        Consequently, we don't know whether the null we get from
        reading page.name results from the page not having a name or
        the page becoming unowned with an unloaded name. The random
        nature of the outcomes – even were OpenJPA working perfectly
        – means that the application must use some locking, such as a
        reader-writer lock, to ensure predictable outcomes. With such
        application locking, the race conditions in OpenJPA may become
        irrelevant.

        But a deeper danger lurks. In OpenJPA (at the rev that I
        examined), the StateManagerImpl's lock delegates to the
        BrokerImpl lock. As a consequence, although there is a
        separate state manager associated with each persistent object
        (FCO) in the persistence context, there is only one lock for
        all them to share. Consequently, once a state manager obtains
        a lock, all threads are locked out of any other state managers
        (for the same persistence context.) Consequently, the hoped
        for scalability of the read-only threads cannot be achieved
        with the present locking architecture.

        For this reason, the work to fix the race condition is wasted
        unless a better locking architecture can be implemented that
        will permit the hoped for scaling of the use case. Even with
        this work done, the application design described above must be
        improved with a read-write locking strategy to prevent race
        conditions in the application and to insure that memory
        barriers are crossed. None of this work is easy, and it has
        been argued that there are easier ways to accomplish the
        requirements of the application.

        In my opinion, it would be worthwhile for us to decide whether
        we support a multithreaded domain model, and if so, what the
        benefits would be, and what the application programming model
        should be. Clarity on this issue will benefit our users.

        As I mentioned earlier, my results are based on an old verson
        of OpenJPA. More recent versions appears to show different
        (and bad) behavior. Ravi is looking into this configuration
        and may have something to say on the subject.

        Show
        David Ezzio added a comment - With Ravi's permission, I have taken his test case and modified it somewhat and uploaded it here. My interest in the test case contained in OpenJPABug453Embedded.zip concerned its behavior with regard to the OpenJPA version that formed the basis for Kodo JDO 4.1.4, for which I have a related test case. The specific version of OpenJPA of interest to me was rev547073, a 1.0.0-Snapshot. My goal was to determine whether the behavior was the same for both Kodo JDO and the version of OpenJPA that Kodo uses. For the same configuration (namely lazy-fetch of the Page.name field, the clearing of fields at the end of a transaction, and optimistic transactions), the behavior is essentially the same. In both cases, a NPE in enhancement added code results from a race condition. The race condition arises from the repeated use of the following logic in enhancement added code (expressed as pseudo code here): [PC object class] .pc [SomeMethod] { if (pcStateManager == null) ... do something by default with current values in this object else { ... use pcStateManager, likely resulting in an eventual lock, and possibly in a call back to this object's PersistenceCapable interface methods ... } } The race condition arises because the check of the value of pcStateManager is obtained and used twice in succession without first locking the state manager. Another thread using and having locked the state manager has the time to alter the state of this pc object, including nulling out its pcStateManager reference. As a result, a NPE can occur in the else clause above, or in the callback from the state manager to other enhancement added methods where the assumption is that pcStateManager reference is not null. The behavior for OpenJPA 2.0 and trunk is somewhat different, and Ravi is looking into this configuration. For the specific problem encountered by this test case in the configuration that I tested, the work around is to eagerly load the Page.name field. This is the default configuration for JPA (as well as JDO.) The test case represents a problematic use case. Before a fix should be attempted, I believe it is important to come up with one or more test cases that represent real-world use cases. The test case has the following design. It consists of three threads. The main thread sets up a reference to a FCO persistent Book object and its embedded (SCO) Page object. The book has a title field and the page has a name field. The domain is reasonable and appropriately simple for a test case. The main thread sets up two runnable threads. The transacting thread cycles transactions as quickly as possible continually changing the book's title and committing. This is a reasonable stress condition. The read-only thread uses the self-same book object to continually get the book's page and read the page's name. The configuration is for optimistic transactions, with the clearing of transactional objects' transactional state at the end each transaction (RetainState == false), and the allowing of non-transactional reads. Multithreading is set to true. The general use case and application design behind this test, if I understand it correctly, goes something like this. We have a set of domain objects that we want to be in memory most of the time. Our application mostly reads, but also writes. We'll have a variety of reader threads, and one writer thread. By using the above configuration, – so goes the thinking – we can have fast reads, a persistence context (cache) that heals itself, since anything unloaded at the end of a transaction, or not yet loaded, will be loaded during the next read, and a minimal of object creation and garbage collection. As a result, the app will run as fast as available CPU, memory, and the locking inherent in the persistence layer will permit. So goes the Sirens song, but there are dangers. One such danger relates to second class objects (SCOs) and the lack of locking by the application on the domain objects. Basically, any reference to an SCO that we obtain in step 1 of an application method may become unowned before we get to step 2. Once it becomes unowned, the persistence layer no longer supports its access to the values in the database. Consequently, we don't know whether the null we get from reading page.name results from the page not having a name or the page becoming unowned with an unloaded name. The random nature of the outcomes – even were OpenJPA working perfectly – means that the application must use some locking, such as a reader-writer lock, to ensure predictable outcomes. With such application locking, the race conditions in OpenJPA may become irrelevant. But a deeper danger lurks. In OpenJPA (at the rev that I examined), the StateManagerImpl's lock delegates to the BrokerImpl lock. As a consequence, although there is a separate state manager associated with each persistent object (FCO) in the persistence context, there is only one lock for all them to share. Consequently, once a state manager obtains a lock, all threads are locked out of any other state managers (for the same persistence context.) Consequently, the hoped for scalability of the read-only threads cannot be achieved with the present locking architecture. For this reason, the work to fix the race condition is wasted unless a better locking architecture can be implemented that will permit the hoped for scaling of the use case. Even with this work done, the application design described above must be improved with a read-write locking strategy to prevent race conditions in the application and to insure that memory barriers are crossed. None of this work is easy, and it has been argued that there are easier ways to accomplish the requirements of the application. In my opinion, it would be worthwhile for us to decide whether we support a multithreaded domain model, and if so, what the benefits would be, and what the application programming model should be. Clarity on this issue will benefit our users. As I mentioned earlier, my results are based on an old verson of OpenJPA. More recent versions appears to show different (and bad) behavior. Ravi is looking into this configuration and may have something to say on the subject.
        Hide
        Ravi P Palacherla added a comment -

        Craig,

        Can you please give me permission so that I can be assigned issues.

        Thanks,
        Ravi.

        Show
        Ravi P Palacherla added a comment - Craig, Can you please give me permission so that I can be assigned issues. Thanks, Ravi.
        Hide
        Craig L Russell added a comment -

        Ravi, I've given you contributor status on OpenJPA JIRA. Let me know if that doesn't work for you.

        > The race condition arises from the repeated use of the following logic in enhancement added code (expressed as pseudo code here):

        > [PC object class].pc[SomeMethod]
        > {
        > if (pcStateManager == null)
        > ... do something by default with current values in
        > this object
        > else
        >

        { > ... use pcStateManager, likely resulting in an > eventual lock, and possibly in a call back to > this object's PersistenceCapable interface > methods > ... > }


        > }

        >The race condition arises because the check of the value of pcStateManager is obtained and used twice in succession without first locking the state manager. Another thread using and having locked the state manager has the time to alter the state of this pc object, including nulling out its pcStateManager reference.

        The use of the multithreaded flag was intended to add a bit of overhead to the StateManager in exchange for thread-safe behavior. It's understandable why a "context" StateManager lock was used: it prevents many cases of deadlock for multithreaded access.

        But the footprint of sm->pc seems to be small. Would it be possible to use a simple semaphore before calling the pc methods? For example,

        void provideField(PersistenceCapable pc, FieldManager store, int field) {
        if (multithreaded) {
        FieldManager beforeFM = _fm;
        _fm = store;
        synchronized(this)

        { pc.pcProvideField(field); }

        // Retaining original FM because of the possibility of reentrant calls
        _fm = beforeFM;
        } else

        { FieldManager beforeFM = _fm; _fm = store; pc.pcProvideField(field); // Retaining original FM because of the possibility of reentrant calls _fm = beforeFM; }

        }

        Show
        Craig L Russell added a comment - Ravi, I've given you contributor status on OpenJPA JIRA. Let me know if that doesn't work for you. > The race condition arises from the repeated use of the following logic in enhancement added code (expressed as pseudo code here): > [PC object class] .pc [SomeMethod] > { > if (pcStateManager == null) > ... do something by default with current values in > this object > else > { > ... use pcStateManager, likely resulting in an > eventual lock, and possibly in a call back to > this object's PersistenceCapable interface > methods > ... > } > } >The race condition arises because the check of the value of pcStateManager is obtained and used twice in succession without first locking the state manager. Another thread using and having locked the state manager has the time to alter the state of this pc object, including nulling out its pcStateManager reference. The use of the multithreaded flag was intended to add a bit of overhead to the StateManager in exchange for thread-safe behavior. It's understandable why a "context" StateManager lock was used: it prevents many cases of deadlock for multithreaded access. But the footprint of sm->pc seems to be small. Would it be possible to use a simple semaphore before calling the pc methods? For example, void provideField(PersistenceCapable pc, FieldManager store, int field) { if (multithreaded) { FieldManager beforeFM = _fm; _fm = store; synchronized(this) { pc.pcProvideField(field); } // Retaining original FM because of the possibility of reentrant calls _fm = beforeFM; } else { FieldManager beforeFM = _fm; _fm = store; pc.pcProvideField(field); // Retaining original FM because of the possibility of reentrant calls _fm = beforeFM; } }
        Hide
        David Ezzio added a comment - - edited

        Use of synchronized(this) is problematic. We end up with two locks. Some
        threads will come to own the pc lock and then seek to own the sm lock. Other
        threads will come to own the sm lock and then seek to own the pc lock.

        In the OpenJPA that I examined, it creates only one use of "synchronized" in the
        enhanced code, on the pcReplaceStateManager method. Addition of synchronization
        to the Page.getName() method (application code) leads to immediate deadlock in
        the test case. With proper locking of the sm prior to its use by enhancement
        added code, there is no need for a synchronized on the pcReplaceStateManager method
        because only the state manager will call that method and the state manager will
        be locked by that point.

        In my view, to make it safe to multithread persistence contexts and to achieve
        the high scaling that such a use case is normally striving for, we need the
        following changes. Put on your hard hat for these renovations!

        1. an atomic BitSet (which doesn't exist yet) to track which fields are
        loaded.

        2. freeze optimistic versus pessimistic transactions to the creation of the
        persistence context. This is to allow each persistent object to know whether
        it will be participating in pessimistic or optimistic transactions.

        3. no use of unlocked state managers. In other words, something like this:

        protected StateManager pcLockStateManager()
        {
        StateManager sm = pcStateManager;
        if (sm == null)
        return null;
        sm.lock();

        // before we got the lock, did some other thread change the pcStateManager?
        if (sm != pcStateManager)

        { sm.unlock(); ... repeat from the top }

        // returned locked state manager for this pc
        return sm;
        }

        pcSomeMethodThatCannotAssumeExistingStateManager(...)
        {
        StateManager sm = pcLockStateMangaer()
        if (sm == null)

        { ... use local values }

        else
        {
        try

        { ... use state manager }

        finally

        { sm.unlock() }

        }
        }

        4. All pcGetXXX methods check the AtomicBitSet to determine whether they can
        use an already loaded value or need to use a locked state manager

        private static final String pcGetXXX(UserClass obj)
        {
        int i = pcInheritedFieldCount + ?;

        // are we in an opt transaction and is the field loaded?
        if (pcAtomicBitSet != null && pcAtomicBitSet.get)

        { return obj.XXX; }

        StateManager sm = pcLockStateManager()
        if (sm == null)
        { return obj.XXX; }

        try

        { sm.accessingField(i); return obj.XXX; }

        finally

        { sm.unlock() }

        }

        5. Only a locked state manager can initiate changes to the pcAtomicBitSet
        field. If it is non-null, the context will only use opt transactions,
        otherwise, it will only use pessimistic transactions. The choice is
        invariant for the lifetime of the persistence context.

        6. Enhancement no longer makes the pcReplaceStateManager method
        synchronized, so that we avoid deadlocking with a possible application lock.

        In this fashion, we still have only one lock that is context wide, but for
        read-only in a mostly loaded persistence context with optimistic transactions,
        we get fast multithreaded reads because the context lock is used infrequently.
        The potential for scaling is huge, when we consider multi-CPU, huge memory
        hardware platforms. Not only do we by-pass a lot of code when using the state
        manager, but we by-pass the context wide lock that now effectively serializes
        all reading threads.

        The only fly in the ointment is that the application code when performing a fast
        read does not cross a memory barrier in the enhancement added code to access the
        XXX field. Consequently, the application may be reading the value from a
        register or from main memory. To handle this, and other issues such as orphaned
        SCOs, the application code needs to acquire a reader/writer lock before using the
        shared persistence context.

        Show
        David Ezzio added a comment - - edited Use of synchronized(this) is problematic. We end up with two locks. Some threads will come to own the pc lock and then seek to own the sm lock. Other threads will come to own the sm lock and then seek to own the pc lock. In the OpenJPA that I examined, it creates only one use of "synchronized" in the enhanced code, on the pcReplaceStateManager method. Addition of synchronization to the Page.getName() method (application code) leads to immediate deadlock in the test case. With proper locking of the sm prior to its use by enhancement added code, there is no need for a synchronized on the pcReplaceStateManager method because only the state manager will call that method and the state manager will be locked by that point. In my view, to make it safe to multithread persistence contexts and to achieve the high scaling that such a use case is normally striving for, we need the following changes. Put on your hard hat for these renovations! 1. an atomic BitSet (which doesn't exist yet) to track which fields are loaded. 2. freeze optimistic versus pessimistic transactions to the creation of the persistence context. This is to allow each persistent object to know whether it will be participating in pessimistic or optimistic transactions. 3. no use of unlocked state managers. In other words, something like this: protected StateManager pcLockStateManager() { StateManager sm = pcStateManager; if (sm == null) return null; sm.lock(); // before we got the lock, did some other thread change the pcStateManager? if (sm != pcStateManager) { sm.unlock(); ... repeat from the top } // returned locked state manager for this pc return sm; } pcSomeMethodThatCannotAssumeExistingStateManager(...) { StateManager sm = pcLockStateMangaer() if (sm == null) { ... use local values } else { try { ... use state manager } finally { sm.unlock() } } } 4. All pcGetXXX methods check the AtomicBitSet to determine whether they can use an already loaded value or need to use a locked state manager private static final String pcGetXXX(UserClass obj) { int i = pcInheritedFieldCount + ?; // are we in an opt transaction and is the field loaded? if (pcAtomicBitSet != null && pcAtomicBitSet.get ) { return obj.XXX; } StateManager sm = pcLockStateManager() if (sm == null) { return obj.XXX; } try { sm.accessingField(i); return obj.XXX; } finally { sm.unlock() } } 5. Only a locked state manager can initiate changes to the pcAtomicBitSet field. If it is non-null, the context will only use opt transactions, otherwise, it will only use pessimistic transactions. The choice is invariant for the lifetime of the persistence context. 6. Enhancement no longer makes the pcReplaceStateManager method synchronized, so that we avoid deadlocking with a possible application lock. In this fashion, we still have only one lock that is context wide, but for read-only in a mostly loaded persistence context with optimistic transactions, we get fast multithreaded reads because the context lock is used infrequently. The potential for scaling is huge, when we consider multi-CPU, huge memory hardware platforms. Not only do we by-pass a lot of code when using the state manager, but we by-pass the context wide lock that now effectively serializes all reading threads. The only fly in the ointment is that the application code when performing a fast read does not cross a memory barrier in the enhancement added code to access the XXX field. Consequently, the application may be reading the value from a register or from main memory. To handle this, and other issues such as orphaned SCOs, the application code needs to acquire a reader/writer lock before using the shared persistence context.
        Hide
        Christiaan added a comment -

        David,
        I definitely like your proposal on making this more scalable and high-performant. I indeed use the same scenario of multiple threads reading and one writing and also discovered that there is hardly any benefit for this due to the current locking mechanism.

        One other note, you mention that the application should do some locking for the embedded objects. However, to avoid deadlock problems, the application needs to have access to the same read/write locks as OpenJPA uses.

        Show
        Christiaan added a comment - David, I definitely like your proposal on making this more scalable and high-performant. I indeed use the same scenario of multiple threads reading and one writing and also discovered that there is hardly any benefit for this due to the current locking mechanism. One other note, you mention that the application should do some locking for the embedded objects. However, to avoid deadlock problems, the application needs to have access to the same read/write locks as OpenJPA uses.
        Hide
        Ravi P Palacherla added a comment -

        Started looking into the issue on openJPA trunk.

        The cause of the deadlock issue on trunk is as follows:

        Thread0 takes a reentrant lock inside BrokerImpl and waits to acquire reentrant lock inside statemanagerImpl.
        Thread1 takes a reentrant lock inside StatemanagerImpl and waits to acquire reentrant lock inside BrokerImpl.

        When I add debug to lock of both BrokerImpl and StateManagerImpl, here is what I see:

        [java] Thread-0 Locked java.util.concurrent.locks.ReentrantLock@11410e5[Locked by thread Thread-0]
        inside SMImpl org.apache.openjpa.kernel.StateManagerImpl@61373f
        [java] Thread-0 trying to lock java.util.concurrent.locks.ReentrantLock@402af3[Locked by thread Thread-1]
        inside brokerImpl org.apache.openjpa.kernel.FinalizingBrokerImpl@b3319f

        [java] Thread-1 Locked java.util.concurrent.locks.ReentrantLock@402af3[Locked by thread Thread-1]
        inside brokerImpl org.apache.openjpa.kernel.FinalizingBrokerImpl@b3319f
        [java] Thread-1 trying to lock java.util.concurrent.locks.ReentrantLock@11410e5[Locked by thread Thread-0]
        inside SMImpl org.apache.openjpa.kernel.StateManagerImpl@61373f

        I will continue my research and update the post as soon as I have a resolution plan in place.

        Show
        Ravi P Palacherla added a comment - Started looking into the issue on openJPA trunk. The cause of the deadlock issue on trunk is as follows: Thread0 takes a reentrant lock inside BrokerImpl and waits to acquire reentrant lock inside statemanagerImpl. Thread1 takes a reentrant lock inside StatemanagerImpl and waits to acquire reentrant lock inside BrokerImpl. When I add debug to lock of both BrokerImpl and StateManagerImpl, here is what I see: [java] Thread-0 Locked java.util.concurrent.locks.ReentrantLock@11410e5 [Locked by thread Thread-0] inside SMImpl org.apache.openjpa.kernel.StateManagerImpl@61373f [java] Thread-0 trying to lock java.util.concurrent.locks.ReentrantLock@402af3 [Locked by thread Thread-1] inside brokerImpl org.apache.openjpa.kernel.FinalizingBrokerImpl@b3319f [java] Thread-1 Locked java.util.concurrent.locks.ReentrantLock@402af3 [Locked by thread Thread-1] inside brokerImpl org.apache.openjpa.kernel.FinalizingBrokerImpl@b3319f [java] Thread-1 trying to lock java.util.concurrent.locks.ReentrantLock@11410e5 [Locked by thread Thread-0] inside SMImpl org.apache.openjpa.kernel.StateManagerImpl@61373f I will continue my research and update the post as soon as I have a resolution plan in place.
        Hide
        Ravi P Palacherla added a comment -

        Also the issue can be reproduced with a simple testcase.

        Let's say you have an entity Book and it has a String title.

        In your test code persist the entity book.
        Let a reader thread do book.getTitle() in a while loop
        and
        a writer thread that will do a transactional book.setTitle() in a while loop.

        Same Book obj is used throughout the sample.

        In the above setup the test code will hang because of the issue explained in my previous comment.
        Will try to create a JIRA test case for the above explanation and will attach it to this issue.

        Regards,
        Ravi.

        Show
        Ravi P Palacherla added a comment - Also the issue can be reproduced with a simple testcase. Let's say you have an entity Book and it has a String title. In your test code persist the entity book. Let a reader thread do book.getTitle() in a while loop and a writer thread that will do a transactional book.setTitle() in a while loop. Same Book obj is used throughout the sample. In the above setup the test code will hang because of the issue explained in my previous comment. Will try to create a JIRA test case for the above explanation and will attach it to this issue. Regards, Ravi.
        Hide
        Ravi P Palacherla added a comment -

        Attached patch file on trunk has a test case that replicates deadlock issue on trunk.

        The issue is introduced as part of OPENJPA-825 ( r727297).
        A separate lock was included for statemanagerImpl as part of the above JIRA and that is causing the deadlock.

        Show
        Ravi P Palacherla added a comment - Attached patch file on trunk has a test case that replicates deadlock issue on trunk. The issue is introduced as part of OPENJPA-825 ( r727297). A separate lock was included for statemanagerImpl as part of the above JIRA and that is causing the deadlock.
        Hide
        Ravi P Palacherla added a comment -

        Attached patch fixes the deadlock issue on trunk.

        It also contains a small change in TestBooleanValue ( includes CLEAR_TABLES in setup).
        This is not relevant to this problem but it will help pass the test cases.
        If a separate JIRA is needed for it, I can open one.

        Show
        Ravi P Palacherla added a comment - Attached patch fixes the deadlock issue on trunk. It also contains a small change in TestBooleanValue ( includes CLEAR_TABLES in setup). This is not relevant to this problem but it will help pass the test cases. If a separate JIRA is needed for it, I can open one.
        Hide
        Ravi P Palacherla added a comment -

        Hi,

        Can someone please help commit these changes.

        Once these changes are committed , I will try to replicate the nullpointer on trunk.

        Thanks,
        Ravi.

        Show
        Ravi P Palacherla added a comment - Hi, Can someone please help commit these changes. Once these changes are committed , I will try to replicate the nullpointer on trunk. Thanks, Ravi.

          People

          • Assignee:
            Ravi P Palacherla
            Reporter:
            Christiaan
          • Votes:
            2 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:

              Development