Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Discovery Impl 1.0.2
- None
Description
There is a race condition between two instances in a cluster (e.g. Oak or CRX): instance 1 is writing a job with a binary property while instance 2 is reading the same job (likely triggered by discovery sending it a TOPOLOGY_CHANGED event). It looks like instance 2 reads the job before instance 1 has finished writing it, or at least before the binary has been completely written. This results in the following exception:
04.03.2014 06:55:39.667 WARN [Apache Sling Job Background Loader] org.apache.sling.event.impl.jobs.JobManagerImpl Unable to read job from /var/eventing/jobs/assigned/e4337f8f-47d2-41df-b3ab-0d40b1b2acd4/slingevent:eventadmin/2014/3/3/8/45/cq.wcm.msm.job.pageEvent_9718d7db-85b4-4930-a2ba-11a80d772970_172
java.lang.Exception: Unable to deserialize property 'pageEvent'
at org.apache.sling.event.impl.support.ResourceHelper.cloneValueMap(ResourceHelper.java:213)
at org.apache.sling.event.impl.jobs.JobManagerImpl.readJob(JobManagerImpl.java:538)
at org.apache.sling.event.impl.jobs.BackgroundLoader.loadJobInTheBackground(BackgroundLoader.java:318)
at org.apache.sling.event.impl.jobs.BackgroundLoader.loadJobsInTheBackground(BackgroundLoader.java:294)
at org.apache.sling.event.impl.jobs.BackgroundLoader.run(BackgroundLoader.java:203)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException: null
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2280)
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2749)
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:779)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:279)
at org.apache.sling.event.impl.support.ResourceHelper.cloneValueMap(ResourceHelper.java:208)
... 5 common frames omitted
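The trace ends in ObjectInputStream.readStreamHeader, which is what happens when the binary is still empty at the time it is read. The following is a minimal standalone sketch of that symptom, not code from Sling: the class PartialBinaryRead and its methods serialize and isReadable are hypothetical names, and an empty byte array stands in (as an assumption) for a binary property that the reading instance cannot yet see completely.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Hypothetical demo class; not part of org.apache.sling.event.
public class PartialBinaryRead {

    /** Serializes a value into bytes, as a writer would before storing a binary property. */
    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    /** Returns true if the bytes deserialize, false if the stream ends early. */
    static boolean isReadable(byte[] data) throws IOException, ClassNotFoundException {
        // The ObjectInputStream constructor already reads the stream header,
        // so an empty stream fails here, before readObject() is reached.
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            in.readObject();
            return true;
        } catch (EOFException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] complete = serialize("pageEvent payload");
        byte[] notYetVisible = new byte[0]; // assumption: reader sees no bytes yet

        System.out.println(isReadable(complete));      // prints true
        System.out.println(isReadable(notYetVisible)); // prints false
    }
}
```

In this sketch the EOFException is treated as "not readable yet" rather than as a corrupt job; whether a reader should retry later in that case is a design decision this ticket does not settle.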
Issue Links
- is blocked by
  - SLING-4638 isCurrent() not always correctly set (Closed)
  - SLING-5030 replace isolated mode with (larger) TOPOLOGY_CHANGING phase (Closed)
- is duplicated by
  - SLING-4828 JobManagerImpl job persisting doesn't check the created resource (Resolved)
- is related to
  - SLING-4627 TOPOLOGY_CHANGED in an eventually consistent repository (Closed)
- relates to
  - SLING-3434 Make intra-cluster discovery-heartbeats independent from machine clock differences (Closed)
- requires
  - SLING-4640 Possibility of duplicate leaders w/discovery.impl on eventually consistent repo (Closed)