Issue Details (XML | Word | Printable)

Key: JCR-935
Type: Bug Bug
Status: Open Open
Priority: Major Major
Assignee: Unassigned
Reporter: Pablo Rios
Votes: 0
Watchers: 1
Operations

If you were logged in you would be able to see more operations.
Jackrabbit Content Repository

ConcurrentModificationException during logout (cont'd)

Created: 21/May/07 06:56 PM   Updated: 18/Dec/07 09:54 AM
Return to search
Component/s: jackrabbit-core
Affects Version/s: 1.3
Fix Version/s: None

Time Tracking:
Not Specified

Issue Links:
Dependants
 


 Description  « Hide
I "seldom" get a CME running ConcurrentReadWriteTest.testReadWrite test.

Following are different stack traces of two runs:
 
Exception in thread "Thread-9" java.util.ConcurrentModificationException
 at org.apache.commons.collections.map.AbstractReferenceMap$ReferenceEntrySetIterator.checkMod(AbstractReferenceMap.java:761)
 at org.apache.commons.collections.map.AbstractReferenceMap$ReferenceEntrySetIterator.hasNext(AbstractReferenceMap.java:735)
 at org.apache.commons.collections.map.AbstractReferenceMap$ReferenceValues.toArray(AbstractReferenceMap.java:543)
 at java.util.Collections$UnmodifiableCollection.toArray(Unknown Source)
 at java.util.Collections$UnmodifiableCollection.toArray(Unknown Source)
 at org.apache.jackrabbit.core.state.LocalItemStateManager.dispose(LocalItemStateManager.java:341)
 at org.apache.jackrabbit.core.state.SessionItemStateManager.dispose(SessionItemStateManager.java:316)
 at org.apache.jackrabbit.core.SessionImpl.logout(SessionImpl.java:1269)
 at org.apache.jackrabbit.core.XASessionImpl.logout(XASessionImpl.java:379)
 at org.apache.jackrabbit.core.AbstractConcurrencyTest$Executor.run(AbstractConcurrencyTest.java:114)
 at java.lang.Thread.run(Unknown Source)


Exception in thread "Thread-9" java.util.ConcurrentModificationException
at org.apache.commons.collections.map.AbstractReferenceMap$ReferenceEntrySetIterator.checkMod(AbstractReferenceMap.java:761)
at org.apache.commons.collections.map.AbstractReferenceMap$ReferenceEntrySetIterator.nextEntry(AbstractReferenceMap.java:770)
at org.apache.commons.collections.map.AbstractReferenceMap$ReferenceValuesIterator.next(AbstractReferenceMap.java:829)
at org.apache.commons.collections.map.AbstractReferenceMap$ReferenceValues.toArray(AbstractReferenceMap.java:544)
at java.util.Collections$UnmodifiableCollection.toArray(Unknown Source)
at java.util.Collections$UnmodifiableCollection.toArray(Unknown Source)
at org.apache.jackrabbit.core.state.LocalItemStateManager.dispose(LocalItemStateManager.java:341)
at org.apache.jackrabbit.core.state.SessionItemStateManager.dispose(SessionItemStateManager.java:316)
at org.apache.jackrabbit.core.SessionImpl.logout(SessionImpl.java:1269)
at org.apache.jackrabbit.core.XASessionImpl.logout(XASessionImpl.java:379)
at org.apache.jackrabbit.core.AbstractConcurrencyTest$Executor.run(AbstractConcurrencyTest.java:114)
at java.lang.Thread.run(Unknown Source)

My working copy is revision 538918 with both JCR-314 patches applied, using FineGrainedISMLocking strategy.
I run this test manually from within Eclipse 3.2.1 / JRE 1.5.0_10-b03.

The workstation were I run this test has an Intel Dual-Core Xeon 5130 2 GHz. To able to compare this CPU with another, with the default params of this test (NUM_NODES = 5, NUM_THREADS = 5, RUN_NUM_SECONDS = 20), the number of operations are ~ #writes performed: 300-400, #reads performed: 4700000 - 4900000.

Regards,
Pablo


 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Stefan Guggisberg added a comment - 01/Jun/07 10:37 AM
are you seeing those CME's only when using the FineGrainedISMLocking strategy?

FineGrainedISMLocking is still experimental and not yet used in the trunk. see marcel's
comment where he also mentions potential issues with concurrent access to item state
caches: https://issues.apache.org/jira/browse/JCR-314#action_12498226


Pablo Rios added a comment - 04/Jun/07 03:27 PM
Stefan, I did not observe those CMEs using the DefaultISMLocking strategy.

I also had got an NPE (see below) using the Fine FineGrainedISMLocking strategy.

java.lang.NullPointerException
at org.apache.jackrabbit.core.state.ItemState.pull(ItemState.java:163)
at org.apache.jackrabbit.core.state.LocalItemStateManager.stateModified(LocalItemStateManager.java:420)
at org.apache.jackrabbit.core.state.StateChangeDispatcher.notifyStateModified(StateChangeDispatcher.java:85)
at org.apache.jackrabbit.core.state.SharedItemStateManager.stateModified(SharedItemStateManager.java:388)
at org.apache.jackrabbit.core.state.ItemState.notifyStateUpdated(ItemState.java:248)
at org.apache.jackrabbit.core.state.ChangeLog.persisted(ChangeLog.java:271)
at org.apache.jackrabbit.core.state.SharedItemStateManager$Update.end(SharedItemStateManager.java:703)
at org.apache.jackrabbit.core.state.SharedItemStateManager.update(SharedItemStateManager.java:857)
at org.apache.jackrabbit.core.state.LocalItemStateManager.update(LocalItemStateManager.java:326)
at org.apache.jackrabbit.core.state.XAItemStateManager.update(XAItemStateManager.java:313)
at org.apache.jackrabbit.core.state.LocalItemStateManager.update(LocalItemStateManager.java:302)
at org.apache.jackrabbit.core.state.SessionItemStateManager.update(SessionItemStateManager.java:306)
at org.apache.jackrabbit.core.ItemImpl.save(ItemImpl.java:1214)
at org.apache.jackrabbit.core.ConcurrentReadWriteTest.testReadWrite(ConcurrentReadWriteTest.java:112)
...


Stefan Guggisberg added a comment - 04/Jun/07 03:52 PM
pablo, thanks for the feedback.

as marcel mentionend on the list, FineGrainedISMLocking represents
work in progress (with known issues wrt cache access) and therefore
not ready to be used:
http://www.nabble.com/Re%3A-concurrent-writes-%28JCR-314%29-p10946007.html

not sure how we should proceed with this issue...

Marcel Reutegger added a comment - 06/Jun/07 11:58 AM
> not sure how we should proceed with this issue...

I think this is something the group needs to decide. Do we agree that the proposed changes (basically the new ISMLocking class) are the way to go?

If yes, then we will also have to tackle the concurrency issues that FineGrainedISMLocking uncovers. I think the NullPointerException is also possible with the DefaultISMLocking, it just happens very rarely.

Esteban Franqueiro added a comment - 03/Sep/07 07:54 PM - edited
Regarding the NPE that Pablo mentioned, it looks like a race condition, because the code first checks for != null, and then you get an NPE. The problem seems to disappear by synchronizing the pull(), connect(), reconnect(), disconnect(), isConnected(), isStale() and hasOverlayedState() methods of the class ItemState. According to the results of the test Pablo mentioned, it seems that sync'ing these methods doesn't add too much serialization. Still, we're wondering what do you all think about this, because although it works in the test, we don't know if there's a better (or may be correct) way of solving this issue.
Regards,

Esteban

Esteban Franqueiro added a comment - 17/Sep/07 06:55 PM
We also sync'ed push() and getOverlayedState() in ItemState class.
Regards,

Stefan Guggisberg added a comment - 19/Sep/07 01:58 PM
hi esteban,

thanks for sharing your results. as mentioned earlier, FineGrainedISMLocking represents experimental code. it's work in progress and hasn't (yet?) been 'officially' endorsed. therefore i don't consider this issue a jackrabbit bug.

if we agree to follow the approach of FineGrainedISMLocking we'll need to carefully investigate the new concurrency issues (such as the NPE mentioned earlier). the current 'synchronized' statements have been very carefully placed and are the result of exhaustive debugging sessions hunting down concurrency issues with DefaultISMLocking. synchronizing on the various ItemState methods as suggested bears IMO a high risk of causing dead locks.

cheers
stefan

Esteban Franqueiro added a comment - 19/Sep/07 07:33 PM
Hi Stefan.
Although I fully agree with you in that our workaround can cause deadlocks, this didn't happen in our testing. All tests (jcr-tests, core tests, MT tests and application level tests) pass.
Regarding the status of this issue, maybe for version 2 we can do something better, but in the meantime we think these changes are a step forward. We will work more on this issue as our knowledge of the core grows and allows us to dig deeper. But for the time being we report our findings :)
Regards,

Esteban

Stefan Guggisberg added a comment - 20/Sep/07 08:10 AM
> Esteban Franqueiro commented on JCR-935:
> ----------------------------------------
> ...
> We will work more on this issue as our knowledge of the core grows and allows us to dig deeper. But for the time being we report our findings :)

excellent, and very much appreciated :)


Marcel Reutegger added a comment - 18/Dec/07 09:54 AM
This issue is related to JCR-1271.

The cause is the same: LocalItemStateManager#cache is not properly synchronized. While one thread reads from the cache another thread notifying ItemState changes will modify the cache.