CASSANDRA-6255

Exception count not incremented on OutOfMemoryError (HSHA)


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Low
    • Resolution: Fixed
    • Fix Version/s: 1.2.14
    • Component/s: None
    • Labels: None
    • Environment: Oracle java version "1.7.0_15"

      rpc_server_type: hsha

    • Severity: Low

    Description

      One of our nodes stopped listening on 9160 (netstat -l showed nothing and telnet reported connection refused). nodetool status showed no hosts down, and on the offending node nodetool info gave the following:

      nodetool info
      Token            : (invoke with -T/--tokens to see all 256 tokens)
      ID               : (removed)
      Gossip active    : true
      Thrift active    : true
      Native Transport active: false
      Load             : 2.05 TB
      Generation No    : 1382536528
      Uptime (seconds) : 432970
      Heap Memory (MB) : 8098.05 / 14131.25
      Data Center      : DC1
      Rack             : RAC2
      Exceptions       : 0
      Key Cache        : size 536854996 (bytes), capacity 536870912 (bytes), 41383646 hits, 1710831591 requests, 0.024 recent hit rate, 0 save period in seconds
      Row Cache        : size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
      

      After looking at the Cassandra log, I saw a bunch of the following:

      ERROR [Selector-Thread-16] 2013-10-27 17:36:00,370 CustomTHsHaServer.java (line 187) Uncaught Exception: 
      java.lang.OutOfMemoryError: unable to create new native thread
              at java.lang.Thread.start0(Native Method)
              at java.lang.Thread.start(Thread.java:691)
              at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949)
              at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1371)
              at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.execute(DebuggableThreadPoolExecutor.java:145)
              at org.apache.cassandra.thrift.CustomTHsHaServer.requestInvoke(CustomTHsHaServer.java:337)
              at org.apache.cassandra.thrift.CustomTHsHaServer$SelectorThread.handleRead(CustomTHsHaServer.java:281)
              at org.apache.cassandra.thrift.CustomTHsHaServer$SelectorThread.select(CustomTHsHaServer.java:224)
              at org.apache.cassandra.thrift.CustomTHsHaServer$SelectorThread.run(CustomTHsHaServer.java:182)
      ERROR [Selector-Thread-7] 2013-10-27 17:36:00,370 CustomTHsHaServer.java (line 187) Uncaught Exception: 
      java.lang.OutOfMemoryError: unable to create new native thread
              at java.lang.Thread.start0(Native Method)
              at java.lang.Thread.start(Thread.java:691)
              at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949)
              at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1371)
              at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.execute(DebuggableThreadPoolExecutor.java:145)
              at org.apache.cassandra.thrift.CustomTHsHaServer.requestInvoke(CustomTHsHaServer.java:337)
              at org.apache.cassandra.thrift.CustomTHsHaServer$SelectorThread.handleRead(CustomTHsHaServer.java:281)
              at org.apache.cassandra.thrift.CustomTHsHaServer$SelectorThread.select(CustomTHsHaServer.java:224)
              at org.apache.cassandra.thrift.CustomTHsHaServer$SelectorThread.run(CustomTHsHaServer.java:182)
      

      There wasn't anything else overtly suspicious in the logs except for the occasional

      ERROR [Selector-Thread-0] 2013-10-27 17:35:58,662 TNonblockingServer.java (line 468) Read an invalid frame size of 0. Are you using TFramedTransport on the client side?
      

      but that comes up periodically; I have looked into it before and it has never seemed to have any serious impact.

      This ticket is not about why an OutOfMemoryError occurred - that is bad, but I don't think I have enough information to reproduce it or speculate on a cause. This ticket is about the fact that an OutOfMemoryError occurred yet nodetool info was still reporting Thrift active : true and Exceptions : 0.
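      For what it's worth, the OutOfMemoryError trace above is logged as an "Uncaught Exception" from inside CustomTHsHaServer itself, which suggests the selector thread catches the Throwable and only logs it; if so, the JVM's default uncaught-exception handler (which, as far as I can tell, is what increments the count nodetool reports) never sees the error. A minimal sketch of the idea - placeholder names, not the actual Cassandra code:

      import java.util.concurrent.atomic.AtomicLong;

      // Illustrative sketch only: a selector loop that catches Throwable and merely
      // logs it hides the error from the default uncaught-exception handler, so it
      // has to bump the exception counter itself. Note that OutOfMemoryError extends
      // Error, not Exception, so "catch (Exception e)" would not even see it.
      public class SelectorLoopSketch
      {
          // Stand-in for whatever counter backs the "Exceptions" line in nodetool info.
          static final AtomicLong exceptions = new AtomicLong();

          volatile boolean stopped = false;

          void selectOnce() throws Exception
          {
              // placeholder for the real select/dispatch work
          }

          public void run()
          {
              while (!stopped)
              {
                  try
                  {
                      selectOnce();
                  }
                  catch (Throwable t)
                  {
                      exceptions.incrementAndGet(); // count Errors too, not just Exceptions
                      System.err.println("Uncaught Exception: " + t);
                  }
              }
          }
      }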

      Our monitoring systems and investigation processes are both starting to rely on the exception count. The fact that it was not accurate here is disconcerting.
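      For reference, this is roughly how an external check can poll the same value nodetool info prints, over JMX. The host and port are placeholders, and the MBean/attribute names (StorageService / ExceptionCount) are my reading of what NodeProbe uses on 1.2, so verify them against your version:

      import javax.management.MBeanServerConnection;
      import javax.management.ObjectName;
      import javax.management.remote.JMXConnector;
      import javax.management.remote.JMXConnectorFactory;
      import javax.management.remote.JMXServiceURL;

      // Minimal monitoring sketch: read the exception count over JMX.
      // The bean and attribute names are assumptions based on how nodetool appears
      // to obtain the value on 1.2; adjust them for your version.
      public class ExceptionCountCheck
      {
          public static void main(String[] args) throws Exception
          {
              String host = args.length > 0 ? args[0] : "localhost";
              int port = args.length > 1 ? Integer.parseInt(args[1]) : 7199;

              JMXServiceURL url = new JMXServiceURL(
                      "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
              try (JMXConnector connector = JMXConnectorFactory.connect(url))
              {
                  MBeanServerConnection mbs = connector.getMBeanServerConnection();
                  ObjectName ss = new ObjectName("org.apache.cassandra.db:type=StorageService");
                  Object count = mbs.getAttribute(ss, "ExceptionCount");
                  System.out.println("Exceptions: " + count);
              }
          }
      }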

      Attachments

        Activity


          People

            Assignee: Mikhail Stepura (mishail)
            Reporter: Dan Hendry (dhendry)
            Authors: Mikhail Stepura
            Reviewers: Brandon Williams
            Votes: 1
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved:
