[CASSANDRA-5737] CassandraDaemon - recent unsafe memory access operation in compiled Java code - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Normal
Resolution: Invalid
Fix Version/s: None
Component/s: None
Labels: None
Environment:

Amazon EC2, XLarge instance.
Ubuntu 12.04.2 LTS
RAID 0 disks, with ext4

Severity: Normal

Description

I'm using Cassandra 1.2.6 on Ubuntu AWS m1.xlarge instances with the Datastax Community package, and have tried Java versions jdk1.7.0_25 and jre1.6.0_45.
I have also tested with and without libjna-java (i.e. the JNA jar).

However, something has triggered a bug in the CassandraDaemon:

ERROR [COMMIT-LOG-ALLOCATOR] 2013-07-05 15:00:51,663 CassandraDaemon.java (line 192) Exception in thread Thread[COMMIT-LOG-ALLOCATOR,5,main]
java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code
    at org.apache.cassandra.db.commitlog.CommitLogSegment.<init>(CommitLogSegment.java:126)
    at org.apache.cassandra.db.commitlog.CommitLogSegment.freshSegment(CommitLogSegment.java:81)
    at org.apache.cassandra.db.commitlog.CommitLogAllocator.createFreshSegment(CommitLogAllocator.java:250)
    at org.apache.cassandra.db.commitlog.CommitLogAllocator.access$500(CommitLogAllocator.java:48)
    at org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:104)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at java.lang.Thread.run(Unknown Source)

This brought down two nodes of a three-node cluster, which was using QUORUM writes with 3 replicas.
Restarting an affected node reproduces the error, so the system is in a 'stable' unstable state, which is probably a good place for troubleshooting.

Presumably something a client wrote triggered this situation; the remaining third node was to be the final replication point for that write, and is thus still up.
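For context on why losing two of three nodes matters here, the following is a minimal sketch of the quorum arithmetic, assuming the usual definition of a quorum as floor(RF / 2) + 1. The class and method names are hypothetical and are not part of Cassandra's API.

```java
public class QuorumSketch {
    // Quorum size for a given replication factor: floor(rf / 2) + 1.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // A QUORUM write can only succeed if at least a quorum of replicas is alive.
    static boolean quorumWritePossible(int replicationFactor, int liveReplicas) {
        return liveReplicas >= quorum(replicationFactor);
    }

    public static void main(String[] args) {
        int replicationFactor = 3; // "3 replicas" from the report
        int liveReplicas = 1;      // two of the three nodes are down

        System.out.println("quorum = " + quorum(replicationFactor));
        System.out.println("QUORUM write possible = "
                + quorumWritePossible(replicationFactor, liveReplicas));
    }
}
```

With 3 replicas the quorum is 2, so with a single surviving node QUORUM writes cannot be satisfied until at least one of the failed nodes is back.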

I subsequently discovered that only a reboot of the machine allows that node to come back up.
I raised a Java bug with Oracle after finding a Java dump file indicating a SIGBUS:
http://bugs.sun.com/view_bug.do?bug_id=9004953

At this point, I suspect that a Linux kernel bug is being triggered.
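As an illustration of the failure mode, here is a minimal sketch of how a write through a memory-mapped file can surface as java.lang.InternalError. It assumes that the failing CommitLogSegment constructor maps the segment file into memory and writes to it; the class and method names below are hypothetical and this is not Cassandra's code. On a healthy filesystem the write simply succeeds; if the OS cannot back the mapped page (which it reports to the process as SIGBUS), HotSpot turns the fault into an InternalError like the one in the log above.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

public class MappedSegmentSketch {
    // Maps `size` bytes of `file` read-write and writes a 4-byte marker at offset 0.
    // Returns true on success, false if the JVM reported a fault on the mapped access.
    static boolean writeMarker(Path file, int size) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.setLength(size);
            MappedByteBuffer buffer =
                    raf.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, size);
            try {
                buffer.putInt(0, 0); // touches the mapped page
                buffer.force();      // asks the OS to flush the page to disk
                return true;
            } catch (InternalError e) {
                // The fault may be reported slightly after the offending access,
                // hence "recent unsafe memory access operation" in the message.
                return false;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("segment-sketch", ".log");
        try {
            System.out.println("marker written: " + writeMarker(file, 4096));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Because the fault originates below the JVM, in the kernel's handling of the mapped file, catching the error in Java does not repair the underlying storage or filesystem problem.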

Issue Links

relates to

CASSANDRA-17223 Node can be left in a bad state after invalid memory access exception

People

Assignee: Unassigned

Reporter: Glyn Davies

Votes: 0

Watchers: 5

Dates

Created: 09/Jul/13 07:24

Updated: 21/Dec/21 02:19

Resolved: 09/Jul/13 14:51