Details
- Bug
- Status: Resolved
- Urgent
- Resolution: Fixed
- None
- Critical
Description
I have a 6-node production cluster spread across 3 racks.
Each node has:
- 4 GB of commitlogs in 343 files
- 275 GB of data in 504 files
On Saturday, one node in each rack crashed with "too many open files" (it seems to be the corresponding node in each rack).
lsof -n -p $PID reports 66899 open files, against a maximum of 65826.
Of these, 64527 are open directories (2371 unique).
Part of the list:
java 19076 root 2140r DIR 8,17 143360 4386718705 /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
java 19076 root 2141r DIR 8,17 143360 4386718705 /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
java 19076 root 2142r DIR 8,17 143360 4386718705 /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
java 19076 root 2143r DIR 8,17 143360 4386718705 /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
java 19076 root 2144r DIR 8,17 143360 4386718705 /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
java 19076 root 2145r DIR 8,17 143360 4386718705 /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
java 19076 root 2146r DIR 8,17 143360 4386718705 /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
java 19076 root 2147r DIR 8,17 143360 4386718705 /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
java 19076 root 2148r DIR 8,17 143360 4386718705 /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
java 19076 root 2149r DIR 8,17 143360 4386718705 /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
java 19076 root 2150r DIR 8,17 143360 4386718705 /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
java 19076 root 2151r DIR 8,17 143360 4386718705 /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
java 19076 root 2152r DIR 8,17 143360 4386718705 /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
java 19076 root 2153r DIR 8,17 143360 4386718705 /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
java 19076 root 2154r DIR 8,17 143360 4386718705 /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
java 19076 root 2155r DIR 8,17 143360 4386718705 /opt/stage2/pod-cassandra-aci-cassandra/rootfs/data/keyspaces/email_logs_query/emails-2d4abd00e9ea11e591199d740e07bd95
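Counts like the lsof figures above can also be taken by walking /proc/<pid>/fd, where every entry is a symlink to whatever the descriptor refers to. The following is a minimal Java sketch, assuming Linux procfs and permission to read the target process's descriptor table (same user or root); the class and method names (OpenDirCount, countOpenDirectories) are hypothetical, and the limit check uses the HotSpot/OpenJDK-specific com.sun.management.UnixOperatingSystemMXBean, which only reports on the JVM running the sketch. Unlike lsof, which also lists entries such as memory-mapped files and the working directory, /proc/<pid>/fd holds only real descriptors, so its total can be lower than the lsof figure.

```java
import com.sun.management.UnixOperatingSystemMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

public class OpenDirCount {
    // Totals for one process: all descriptors, those pointing at directories,
    // and how many distinct directories those are.
    static final class Counts {
        final int total;
        final int directories;
        final int uniqueDirectories;

        Counts(int total, int directories, int uniqueDirectories) {
            this.total = total;
            this.directories = directories;
            this.uniqueDirectories = uniqueDirectories;
        }
    }

    // Walks /proc/<pid>/fd (Linux only, hypothetical helper). The listing's own
    // descriptor is included in the totals.
    static Counts countOpenDirectories(String pid) throws IOException {
        int total = 0;
        int directories = 0;
        Set<Path> unique = new HashSet<>();
        Path fdDir = Paths.get("/proc", pid, "fd");
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(fdDir)) {
            for (Path fd : fds) {
                total++;
                Path target;
                try {
                    target = Files.readSymbolicLink(fd);
                } catch (IOException e) {
                    continue; // descriptor was closed while we were listing
                }
                // Sockets and pipes resolve to names like "socket:[123]", not directories.
                if (Files.isDirectory(target)) {
                    directories++;
                    unique.add(target);
                }
            }
        }
        return new Counts(total, directories, unique.size());
    }

    public static void main(String[] args) throws IOException {
        String pid = args.length > 0 ? args[0] : "self";
        Counts c = countOpenDirectories(pid);
        System.out.printf("fds=%d dirs=%d unique=%d%n",
                c.total, c.directories, c.uniqueDirectories);

        // Open vs. maximum descriptors for the current JVM only (HotSpot/OpenJDK on Unix).
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            System.out.printf("open=%d max=%d%n",
                    unix.getOpenFileDescriptorCount(), unix.getMaxFileDescriptorCount());
        }
    }
}
```

Run it as java OpenDirCount 19076 (passing the PID of the Cassandra process) to get the total, directory, and unique-directory counts in one pass.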
The 3 other nodes crashed 4 hours later.
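The lsof list above shows the same table directory held open under consecutive descriptors, which is the usual signature of directory handles that are opened and never closed. The linked CASSANDRA-13133 concerns unclosed file descriptors when querying the SnapshotsSize metric; its code is not part of this report, so the following is only a generic, hypothetical Java illustration of that class of leak, assuming the directory is listed with java.nio.file.Files.list. That method returns a stream holding an open directory, which is released only when the stream is closed. The class and method names are made up for the example.

```java
import com.sun.management.UnixOperatingSystemMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class DirHandleLeak {
    // LEAKY (hypothetical): Files.list opens the directory, but the returned
    // stream is never closed, so the directory handle is not released here.
    static long countEntriesLeaky(Path dir) throws IOException {
        return Files.list(dir).count();
    }

    // FIXED: try-with-resources closes the stream, and with it the directory
    // handle, even if counting throws.
    static long countEntries(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.count();
        }
    }

    // Open descriptor count of this JVM, or -1 if the HotSpot-specific bean is unavailable.
    static long openFds() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        return (os instanceof UnixOperatingSystemMXBean)
                ? ((UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount()
                : -1;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");

        long before = openFds();
        for (int i = 0; i < 100; i++) countEntries(dir);
        System.out.println("fd delta after closed listings: " + (openFds() - before));

        before = openFds();
        for (int i = 0; i < 100; i++) countEntriesLeaky(dir);
        System.out.println("fd delta after leaky listings:  " + (openFds() - before));
    }
}
```

Both methods return the same count; the difference is only whether the directory handle is given back. Repeating the leaky variant on a periodic path (a metric poll, for example) is one way a process could accumulate thousands of handles on a small set of directories.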
Attachments
Issue Links
- duplicates
  - CASSANDRA-11906 Unstable JVM due too many files when anticompacting big LCS tables (Resolved)
- relates to
  - CASSANDRA-13133 Unclosed file descriptors when querying SnapshotsSize metric (Resolved)