With Ravi's permission, I have taken his test case, modified
it somewhat, and uploaded it here.
My interest in the test case contained in
OpenJPABug453Embedded.zip concerned its behavior with regard
to the OpenJPA version that formed the basis for Kodo JDO
4.1.4, for which I have a related test case. The specific
version of OpenJPA of interest to me was rev547073, a
My goal was to determine whether the behavior was the same for
both Kodo JDO and the version of OpenJPA that Kodo uses. For
the same configuration (namely lazy-fetch of the Page.name
field, the clearing of fields at the end of a transaction, and
optimistic transactions), the behavior is essentially the
same. In both cases, an NPE in enhancement-added code results
from a race condition.
The race condition arises from the repeated use of the
following logic in enhancement-added code (expressed as
pseudocode):

    [PC object class].pc[SomeMethod]
        if (pcStateManager == null)
            ... do something by default with current values in
        else
            ... use pcStateManager, likely resulting in an
            eventual lock, and possibly in a call back to
            this object's PersistenceCapable interface

The race condition arises because the value of pcStateManager
is read and used twice in succession without first locking the
state manager. Another thread that is using, and has locked,
the state manager has time between the two reads to alter the
state of this pc object, including nulling out its
pcStateManager reference.
As a result, an NPE can occur in the else clause above, or in
the callback from the state manager to other enhancement-added
methods, which assume that the pcStateManager reference is not
null.
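
To make the check-then-act problem concrete, here is a minimal
sketch in plain Java. It is not the code that the enhancer
generates. The Page class, the StateManager interface, and the
betweenCheckAndUse hook are all hypothetical stand-ins; the
hook simulates, deterministically, another thread nulling out
the reference between the check and the use.

```java
class CheckThenActDemo {
    /** Hypothetical stand-in for a state manager; not an OpenJPA type. */
    interface StateManager {
        String fetchName();
    }

    /** Hypothetical stand-in for an enhanced persistence-capable class. */
    static class Page {
        volatile StateManager sm;                 // plays the role of pcStateManager
        String name = "cached";
        // Hook that runs between the check and the use, simulating
        // what another thread might do in that window.
        Runnable betweenCheckAndUse = () -> { };

        // Racy pattern: the field is read once for the check and
        // again for the use.
        String racyGetName() {
            if (sm == null) {
                return name;
            } else {
                betweenCheckAndUse.run();
                return sm.fetchName();            // NPE if sm was nulled meanwhile
            }
        }

        // One possible mitigation: read the field once into a local
        // and use only the local afterwards.
        String snapshotGetName() {
            StateManager local = sm;
            if (local == null) {
                return name;
            }
            betweenCheckAndUse.run();
            return local.fetchName();
        }
    }
}
```

Reading the field once only removes the NPE on the field
access itself. It does nothing for callbacks that re-read the
field, and it does not make the result meaningful if the
object has been detached from its state manager in the
meantime.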
The behavior for OpenJPA 2.0 and trunk is somewhat different,
and Ravi is looking into this configuration.
For the specific problem encountered by this test case in the
configuration that I tested, the workaround is to eagerly
load the Page.name field. This is the default configuration
for JPA (as well as JDO).
The test case represents a problematic use case. Before a fix
is attempted, I believe it is important to come up with one or
more test cases that represent real-world use cases.
The test case has the following design. It consists of three
threads. The main thread sets up a reference to an FCO
persistent Book object and its embedded (SCO) Page object. The
book has a title field and the page has a name field. The
domain is reasonable and appropriately simple for a test case.
The main thread sets up two runnable threads. The transacting
thread cycles transactions as quickly as possible continually
changing the book's title and committing. This is a reasonable
stress condition. The read-only thread uses the self-same book
object to continually get the book's page and read the page's
name. The configuration is for optimistic transactions, with
the clearing of transactional objects' transactional state at
the end of each transaction (RetainState == false), and the
allowing of non-transactional reads. Multithreading is set to
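
For readers who want to reproduce the setup on OpenJPA, the
settings named above might be collected as follows. This is a
sketch under assumptions: the property names are the OpenJPA
spellings as I understand them and should be checked against
the documentation for the version in use, the Kodo JDO
equivalents are named differently, and the TestCaseConfig
class is hypothetical. The multithreading setting is left out
because its value is not given above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TestCaseConfig {
    /** Collects the settings described in the text as configuration properties. */
    static Map<String, String> properties() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("openjpa.Optimistic", "true");            // optimistic transactions
        props.put("openjpa.RetainState", "false");          // clear state at end of transaction
        props.put("openjpa.NontransactionalRead", "true");  // allow reads outside a transaction
        // The multithreading property is deliberately omitted here;
        // its value is not stated in the description above.
        return props;
    }
}
```

A map like this could be passed as the second argument to
Persistence.createEntityManagerFactory(unitName, props). The
lazy fetch of Page.name is a mapping setting rather than a
configuration property, so it is not shown.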
The general use case and application design behind this test,
if I understand it correctly, goes something like this. We
have a set of domain objects that we want to be in memory most
of the time. Our application mostly reads, but also writes.
We'll have a variety of reader threads, and one writer thread.
By using the above configuration, so the thinking goes, we can
have fast reads; a persistence context (cache) that heals
itself, since anything unloaded at the end of a transaction,
or not yet loaded, will be loaded during the next read; and a
minimum of object creation and garbage collection.
As a result, the app will run as fast as available CPU,
memory, and the locking inherent in the persistence layer will
So goes the Sirens' song, but there are dangers. One such
danger relates to second class objects (SCOs) and the lack of
locking by the application on the domain objects. Basically,
any reference to an SCO that we obtain in step 1 of an
application method may become unowned before we get to step 2.
Once it becomes unowned, the persistence layer no longer
supports its access to the values in the database.
Consequently, we don't know whether the null we get from
reading page.name results from the page not having a name or
the page becoming unowned with an unloaded name. The random
nature of the outcomes – even were OpenJPA working perfectly
– means that the application must use some locking, such as a
reader-writer lock, to ensure predictable outcomes. With such
application locking, the race conditions in OpenJPA may become
But a deeper danger lurks. In OpenJPA (at the rev that I
examined), the StateManagerImpl's lock delegates to the
BrokerImpl lock. As a consequence, although there is a
separate state manager associated with each persistent object
(FCO) in the persistence context, there is only one lock for
all of them to share. Once a thread obtains that lock through
one state manager, all other threads are locked out of every
other state manager in the same persistence context. The
hoped-for scalability of the read-only threads therefore
cannot be achieved with the present locking architecture.
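
The effect of the delegation can be sketched in a few lines of
plain Java. The Broker and StateManager classes below are
hypothetical simplifications, not the BrokerImpl or
StateManagerImpl source. They only model the one property that
matters here: every state manager's lock is the broker's lock.

```java
import java.util.concurrent.locks.ReentrantLock;

class SharedLockDemo {
    /** Hypothetical broker that owns the only lock. */
    static class Broker {
        private final ReentrantLock lock = new ReentrantLock();
        void lock() { lock.lock(); }
        void unlock() { lock.unlock(); }
        boolean tryLock() { return lock.tryLock(); }
    }

    /** Hypothetical state manager whose lock delegates to its broker. */
    static class StateManager {
        private final Broker broker;
        StateManager(Broker broker) { this.broker = broker; }
        void lock() { broker.lock(); }
        void unlock() { broker.unlock(); }
        boolean tryLock() { return broker.tryLock(); }
    }

    /**
     * Returns true if a second thread can lock 'other' while the
     * calling thread holds the lock of 'held'.
     */
    static boolean otherThreadCanLock(StateManager held, StateManager other) {
        boolean[] acquired = new boolean[1];
        held.lock();
        try {
            Thread t = new Thread(() -> {
                if (other.tryLock()) {
                    acquired[0] = true;
                    other.unlock();
                }
            });
            t.start();
            t.join();   // join makes the write to acquired[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            held.unlock();
        }
        return acquired[0];
    }
}
```

With two state managers created from the same Broker,
otherThreadCanLock returns false: a reader working on one
object blocks a reader working on a different object. With
state managers from different brokers it returns true.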
For this reason, the work to fix the race condition is wasted
unless a better locking architecture can be implemented that
will permit the hoped-for scaling of the use case. Even with
this work done, the application design described above must be
improved with a read-write locking strategy to prevent race
conditions in the application and to ensure that memory
barriers are crossed. None of this work is easy, and it has
been argued that there are easier ways to accomplish the
requirements of the application.
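
As an illustration of the kind of application-level read-write
locking meant here, the following is a minimal sketch using
java.util.concurrent. The GuardedBook wrapper and its method
names are hypothetical, and plain fields stand in for the
persistent Book and Page; in a real application the writer
would begin and commit its transaction inside the write lock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

class GuardedBook {
    /** Plain stand-in for the embedded (SCO) page. */
    static class Page {
        String name;
        Page(String name) { this.name = name; }
    }

    private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
    private String title;
    private final Page page;

    GuardedBook(String title, String pageName) {
        this.title = title;
        this.page = new Page(pageName);
    }

    // Readers share the read lock, so they do not block each other,
    // and the page cannot change owner while they hold it.
    String readPageName() {
        rw.readLock().lock();
        try {
            return page.name;
        } finally {
            rw.readLock().unlock();
        }
    }

    String readTitle() {
        rw.readLock().lock();
        try {
            return title;
        } finally {
            rw.readLock().unlock();
        }
    }

    // The writer takes the exclusive lock for the whole change; the
    // transaction begin and commit would go inside this block.
    void changeTitle(String newTitle) {
        rw.writeLock().lock();
        try {
            title = newTitle;
        } finally {
            rw.writeLock().unlock();
        }
    }
}
```

Acquiring and releasing these locks also gives the memory
visibility guarantees referred to above: whatever the writer
did before releasing the write lock is visible to a reader
that subsequently acquires the read lock.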
In my opinion, it would be worthwhile for us to decide whether
we support a multithreaded domain model, and if so, what the
benefits would be, and what the application programming model
should be. Clarity on this issue will benefit our users.
As I mentioned earlier, my results are based on an old version
of OpenJPA. More recent versions appear to show different
(and also bad) behavior. Ravi is looking into this
configuration and may have something to say on the subject.