Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 2.0.3
    • Component/s: None
    • Labels:
      None
    • Environment:

      Description

      Looks like C* is leaking file descriptors when doing lots of CAS operations.

      $ sudo cat /proc/15455/limits
      Limit                     Soft Limit           Hard Limit           Units    
      Max cpu time              unlimited            unlimited            seconds  
      Max file size             unlimited            unlimited            bytes    
      Max data size             unlimited            unlimited            bytes    
      Max stack size            10485760             unlimited            bytes    
      Max core file size        0                    0                    bytes    
      Max resident set          unlimited            unlimited            bytes    
      Max processes             1024                 unlimited            processes
      Max open files            4096                 4096                 files    
      Max locked memory         unlimited            unlimited            bytes    
      Max address space         unlimited            unlimited            bytes    
      Max file locks            unlimited            unlimited            locks    
      Max pending signals       14633                14633                signals  
      Max msgqueue size         819200               819200               bytes    
      Max nice priority         0                    0                   
      Max realtime priority     0                    0                   
      Max realtime timeout      unlimited            unlimited            us 
      

      Looks like the problem is not in the limits.

      Before load test:

      cassandra-test0 ~]$ lsof -n | grep java | wc -l
      166
      
      cassandra-test1 ~]$ lsof -n | grep java | wc -l
      164
      
      cassandra-test2 ~]$ lsof -n | grep java | wc -l
      180
      

      After load test:

      cassandra-test0 ~]$ lsof -n | grep java | wc -l
      967
      
      cassandra-test1 ~]$ lsof -n | grep java | wc -l
      1766
      
      cassandra-test2 ~]$ lsof -n | grep java | wc -l
      2578
      

      Most of the open files have names like:

      java      16890 cassandra 1636r      REG             202,17  88724987     655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
      java      16890 cassandra 1637r      REG             202,17 161158485     655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
      java      16890 cassandra 1638r      REG             202,17  88724987     655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
      java      16890 cassandra 1639r      REG             202,17 161158485     655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
      java      16890 cassandra 1640r      REG             202,17  88724987     655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
      java      16890 cassandra 1641r      REG             202,17 161158485     655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
      java      16890 cassandra 1642r      REG             202,17  88724987     655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
      java      16890 cassandra 1643r      REG             202,17 161158485     655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
      java      16890 cassandra 1644r      REG             202,17  88724987     655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
      java      16890 cassandra 1645r      REG             202,17 161158485     655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
      java      16890 cassandra 1646r      REG             202,17  88724987     655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
      java      16890 cassandra 1647r      REG             202,17 161158485     655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
      java      16890 cassandra 1648r      REG             202,17  88724987     655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
      java      16890 cassandra 1649r      REG             202,17 161158485     655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
      java      16890 cassandra 1650r      REG             202,17  88724987     655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
      java      16890 cassandra 1651r      REG             202,17 161158485     655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
      java      16890 cassandra 1652r      REG             202,17  88724987     655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
      java      16890 cassandra 1653r      REG             202,17 161158485     655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
      java      16890 cassandra 1654r      REG             202,17  88724987     655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
      java      16890 cassandra 1655r      REG             202,17 161158485     655420 /var/lib/cassandra/data/system/paxos/system-paxos-jb-255-Data.db
      java      16890 cassandra 1656r      REG             202,17  88724987     655520 /var/lib/cassandra/data/system/paxos/system-paxos-jb-644-Data.db
      

      Also, when that happens it's not always possible to shut down the server process via SIGTERM. We have to use SIGKILL.

      P.S. See the mailing list thread for more context: https://www.mail-archive.com/user@cassandra.apache.org/msg33035.html

      1. 6275.txt
        1 kB
        Jonathan Ellis
      2. c_file-descriptors_strace.tbz
        6.11 MB
        Michael Shuler
      3. position_hints.tgz
        6.03 MB
        Duncan Sands
      4. leak.log
        10 kB
        Gianluca Borello
      5. slog.gz
        33 kB
        Duncan Sands
      6. cassandra_jstack.txt
        38 kB
        Mikhail Mazurskiy

        Issue Links

          Activity

          ash2k Mikhail Mazurskiy added a comment -

          Attaching a thread dump of the process when it is not possible to shut it down cleanly.

          brandon.williams Brandon Williams added a comment -

          This does look rather damning, and I know something is up here because we can't run 2.0 with ccm on debug, so maybe this is it.

          baldrick Duncan Sands added a comment -

          Part of system.log showing lots of memtable flush messages.

          baldrick Duncan Sands added a comment -

          We had to downgrade from 2.0.2 to 1.2.11 due to C* leaking vast numbers of file descriptors (same issue in 2.0.1), approximately 50 a second. We didn't have any trouble shutting down servers, and we also were not using Paxos, so it might be a different issue. As in this report, the same sstable files had been opened again and again (and again), only they were from one of our own column families, not from a system column family. This table was being polled repeatedly (once per second per program) by about 20 programs using the native protocol; the polling was doing a range slice. I haven't been able to reproduce the file descriptor leak in our test environment.

          One thing I noticed is that the system logs were chock-a-block with memtable flushing messages, many many more than we would get with 1.2.11 (I've attached a snippet).

          mishail Mikhail Stepura added a comment -

          Also, when that happens it's not always possible to shutdown server process via SIGTERM. Have to use SIGKILL.

          As far as I understand, here is what is happening:

          • SIGTERM handler waits for StorageServiceShutdownHook
          • StorageServiceShutdownHook waits (up to 3600 sec == 1hr) for mutationStage threads to complete.
          • "MutationStage:2718" thread performs ColumnFamilyStore.forceBlockingFlush initiated by TruncateVerbHandler.doVerb and waits for "MemtablePostFlusher:1"
          • "MemtablePostFlusher:1" is waiting on CountDownLatch.await (in WrappedRunnable returned from ColumnFamilyStore.switchMemtable). It will wait until the latch is counted down to zero.

          There is also another call to ColumnFamilyStore.forceBlockingFlush from "OptionalTasks:1": BatchlogManager.cleanup().
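
          (A minimal, self-contained sketch of the blocking pattern described above, using hypothetical thread names: a shutdown hook waits on a worker that is itself blocked on a latch nobody counts down, so only the timeout - or SIGKILL - ends the wait. This is an illustration, not Cassandra code.)

          import java.util.concurrent.CountDownLatch;
          import java.util.concurrent.TimeUnit;

          public class ShutdownHangSketch
          {
              public static void main(String[] args) throws Exception
              {
                  // Stands in for the latch awaited by "MemtablePostFlusher:1"; nothing ever counts it down.
                  final CountDownLatch postFlushLatch = new CountDownLatch(1);

                  // Stands in for the "MutationStage" thread stuck in forceBlockingFlush.
                  Thread mutationWorker = new Thread(new Runnable()
                  {
                      public void run()
                      {
                          try
                          {
                              postFlushLatch.await();
                          }
                          catch (InterruptedException e)
                          {
                              Thread.currentThread().interrupt();
                          }
                      }
                  }, "MutationStage-sketch");
                  mutationWorker.start();

                  // Stands in for StorageServiceShutdownHook; the real hook waits up to 3600 seconds.
                  mutationWorker.join(TimeUnit.SECONDS.toMillis(5));
                  System.out.println("worker still alive after waiting: " + mutationWorker.isAlive());
              }
          }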

          jbellis Jonathan Ellis added a comment -

          Hmm. It's possible that we shouldn't be running Truncate on the Mutation stage. But, I don't think Duncan or Mikhail mentioned running truncate so there is probably something else wrong as well.

          mishail Mikhail Stepura added a comment -

          Jonathan Ellis, there is still a TruncateVerbHandler.doVerb in Mikhail's cassandra_jstack.txt.

          brandon.williams Brandon Williams added a comment -

          Duncan Sands are you using the native protocol?

          baldrick Duncan Sands added a comment -

          Yes, I'm using the native protocol. No use of truncate on my part.

          jbellis Jonathan Ellis added a comment -

          Let's create a new ticket for the truncate hang then.

          ash2k Mikhail Mazurskiy added a comment -

          My load test uses TRUNCATE before it starts. I will check if that leak happens without it.

          ash2k Mikhail Mazurskiy added a comment -

          I tried without TRUNCATE - no leaking (the test TRUNCATEs 3 tables before it starts).

          Before test:

          [root@cassandra-test0 ~]$ lsof -n | grep java | wc -l
          169
          
          [root@cassandra-test1 ~]$ lsof -n | grep java | wc -l
          167
          
          [root@cassandra-test2 ~]# lsof -n | grep java | wc -l
          173
          

          After test:

          [root@cassandra-test0 ~]$ lsof -n | grep java | wc -l
          172
          
          [root@cassandra-test1 ~]$ lsof -n | grep java | wc -l
          172
          
          [root@cassandra-test2 ~]# lsof -n | grep java | wc -l
          183
          
          baldrick Duncan Sands added a comment -

          In my case I didn't use truncate; however, I did do one exotic operation: I used ALTER TABLE to drop a no-longer-needed column a few days before I noticed this issue.

          Before downgrading I made a copy of the entire contents of /var/lib/cassandra/, so I could try recreating the 2.0.2 cluster and the problem in some virtual machines using these.

          baldrick Duncan Sands added a comment -

          I also changed sstable_compression from SnappyCompressor to LZ4Compressor.

          gianlucaborello Gianluca Borello added a comment - - edited

          We are experiencing a similar issue in 2.0.2.

          It started happening after we set a TTL for all our columns in a very limited datastore (just a few GBs).

          We can easily see the fd count rapidly increase to 100000+, and the majority of fds are (from lsof):

          java      13168       cassandra  267r      REG                9,0   273129  671089723 /raid0/cassandra/data/draios/process_counters_by_exe/draios-process_counters_by_exe-jb-231-Data.db (deleted)
          java      13168       cassandra  268r      REG                9,0   273129  671089723 /raid0/cassandra/data/draios/process_counters_by_exe/draios-process_counters_by_exe-jb-231-Data.db (deleted)
          java      13168       cassandra  269r      REG                9,0   273129  671089723 /raid0/cassandra/data/draios/process_counters_by_exe/draios-process_counters_by_exe-jb-231-Data.db (deleted)
          java      13168       cassandra  270r      REG                9,0   273129  671089723 /raid0/cassandra/data/draios/process_counters_by_exe/draios-process_counters_by_exe-jb-231-Data.db (deleted)
          java      13168       cassandra  271r      REG                9,0   273129  671089723 /raid0/cassandra/data/draios/process_counters_by_exe/draios-process_counters_by_exe-jb-231-Data.db (deleted)
          java      13168       cassandra  272r      REG                9,0   273129  671089723 /raid0/cassandra/data/draios/process_counters_by_exe/draios-process_counters_by_exe-jb-231-Data.db (deleted)
          java      13168       cassandra  273r      REG                9,0   273129  671089723 /raid0/cassandra/data/draios/process_counters_by_exe/draios-process_counters_by_exe-jb-231-Data.db (deleted)
          

          I'm attaching the log of the exception (leak.log). You can see the exceptions, and then Cassandra eventually shuts down. We had to temporarily downgrade to 1.2.11.

          jbellis Jonathan Ellis added a comment -

          Mikhail Mazurskiy or others, can you verify if this is also a problem in 1.2.11?

          gianlucaborello Gianluca Borello added a comment -

          Jonathan Ellis, FWIW 1.2.11 is working fine for us (I have one week of uptime so far since the downgrade).

          jbellis Jonathan Ellis added a comment -

          Can you have a look, Marcus Eriksson?

          baldrick Duncan Sands added a comment -

          All our leaked fd's were for a large table that uses TTL. We also don't have any problems with 1.2.11 (which we had to downgrade to, just like Gianluca).

          ash2k Mikhail Mazurskiy added a comment -

          Jonathan Ellis my test workload uses LWT, so it cannot be run on 1.2.x. The written columns themselves do not use TTL, but AFAIK the Paxos table uses TTL. And in my case I see a lot of open Paxos-related files. So, taking into account what others said above, it looks like the problem is somehow connected to TTL.

          krummas Marcus Eriksson added a comment -

          I've been looking at this a bit and can't really reproduce it. I suspected CASSANDRA-5228, but that seems to work (or rather, I found a bug, but it's unrelated to this one: CASSANDRA-6337).

          slog.gz looks a bit like what Michael Kjellman reported in CASSANDRA-5241 (looping flushing of system tables), but that was resolved a long time ago.

          Does anyone have a way to reproduce? Mikhail Mazurskiy, would it be possible to post your load test?

          rcoli Robert Coli added a comment -

          A brief note to mention that when durable_writes are disabled, handling of clean shutdown (via SIGTERM/StorageServiceShutdownHook) has additional blocking while waiting for drain. If one is investigating and/or modifying the behavior of the shutdown hook, they should be aware that there are two different cases to test. See CASSANDRA-2958.

          ash2k Mikhail Mazurskiy added a comment -

          Marcus Eriksson sorry, I cannot post that code. The scenario is something like this:

          TRUNCATE table1;
          TRUNCATE table2;
          TRUNCATE table3;
          loop {
              SELECT FROM table1 WHERE key='xxx' (quorum);
              INSERT INTO table2 (quorum);
              INSERT INTO table3 IF NOT EXISTS; (should sometimes fail)
              UPDATE table1 WHERE  key = 'someid' IF column='zzz'; (should sometimes fail)
          }
          

          As I said, if I remove the TRUNCATEs it does not leak.
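
          (An illustrative sketch of such a workload, assuming the DataStax Java driver and hypothetical keyspace, table and column names - not the actual test code, which was not posted:)

          import com.datastax.driver.core.Cluster;
          import com.datastax.driver.core.ConsistencyLevel;
          import com.datastax.driver.core.Session;
          import com.datastax.driver.core.SimpleStatement;

          public class LwtTruncateWorkloadSketch
          {
              public static void main(String[] args)
              {
                  Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                  Session session = cluster.connect("test_ks"); // hypothetical keyspace

                  // TRUNCATE up front - removing these three statements made the leak disappear
                  session.execute("TRUNCATE table1");
                  session.execute("TRUNCATE table2");
                  session.execute("TRUNCATE table3");

                  for (int i = 0; i < 100000; i++)
                  {
                      SimpleStatement select = new SimpleStatement("SELECT * FROM table1 WHERE key = 'xxx'");
                      select.setConsistencyLevel(ConsistencyLevel.QUORUM);
                      session.execute(select);

                      SimpleStatement insert = new SimpleStatement("INSERT INTO table2 (key, value) VALUES ('k" + i + "', 'v')");
                      insert.setConsistencyLevel(ConsistencyLevel.QUORUM);
                      session.execute(insert);

                      // Conditional (LWT/Paxos) writes; the conditions are expected to fail some of the time
                      session.execute("INSERT INTO table3 (key, value) VALUES ('fixed', 'v') IF NOT EXISTS");
                      session.execute("UPDATE table1 SET column1 = 'yyy' WHERE key = 'someid' IF column1 = 'zzz'");
                  }

                  cluster.close();
              }
          }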

          baldrick Duncan Sands added a comment -

          OK, here is how you can reproduce.

          1) Create this keyspace:

          CREATE KEYSPACE all_production WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

          2) Create a table as follows:

          use all_production;
          CREATE TABLE position_hints (shard int, date text, when timeuuid, sequence bigint, syd int, broker uuid, engine uuid, confirmed bigint, open_buy bigint, open_sell bigint, PRIMARY KEY ((shard, date), when)) with clustering order by (when desc);

          3) Stop Cassandra. Untar the attached file in /var/lib/cassandra/data/all_production/ to populate the position_hints table.

          4) Start Cassandra.

          5) Prepare a large number of queries as follows:

          for (( i = 0 ; i < 1000000 ; i = i + 1 )) ; do echo "select * from position_hints where shard=1 and date='2013-10-30' and when>ba719c52-4182-11e3-a471-003048feded4 limit 1;" ; done > /tmp/queries

          6) In cqlsh:

          use all_production;
          source '/tmp/queries';

          7) Enjoy watching the number of fd's used by Cassandra go up and up.

          baldrick Duncan Sands added a comment -

          Untar this to populate the position_hints table.

          jbellis Jonathan Ellis added a comment -

          Can you reproduce the above on 2.0.2 or 2.0.2 HEAD, Michael Shuler?

          mshuler Michael Shuler added a comment -

          Reproduced in 2.0.2.
          On m1.medium, running C*, the open files were about 910. My query is still running, and I'm at >3500 open files.
          (I'll work on cassandra-2.0 branch HEAD, next)

          mshuler Michael Shuler added a comment -

          Same results on cassandra-2.0 HEAD.

          mshuler Michael Shuler added a comment - - edited

          c_file-descriptors_strace.tbz is a strace of the C* java process and children while running the query to about 27k open files. This was on the cassandra-2.0 branch in git.

          krummas Marcus Eriksson added a comment -

          OK, this is what I have so far: I can also reproduce on an m1.medium in EC2 running Ubuntu 13.10, which has a 3.11.x kernel.

          I cannot reproduce on my laptop (Debian squeeze) or my server (RHEL 6), both of which run kernel 2.6.x (jdk7u45 on all).

          It happens with trivial tables/data as well, so it seems unrelated to TTL or truncate etc.

          Just starting Cassandra up shows ~50 open FDs for the same Data.db file.

          baldrick Duncan Sands added a comment -

          I originally saw the issue on Ubuntu 10.04 (kernel 2.6.32) and reproduced it on Ubuntu 13.10 (kernel 3.11.0).

          pieterc Pieter Callewaert added a comment -

          I also have the problem on Ubuntu 12.04 (Linux de-cass00 3.8.0-30-generic #44~precise1-Ubuntu SMP Fri Aug 23 18:32:41 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux)

          Posted something on the mailing list because I was not sure if it was a bug... (Here you can find more info; in some cases I had a deleted file open more than 50k times.)
          http://www.mail-archive.com/user@cassandra.apache.org/msg32999.html

          The temporary fix was to raise the nofile limit to 1 million...

          mshuler Michael Shuler added a comment -

          My tests were on Ubuntu precise, same kernel as above, with JVM version 1.7_25.

          jre J. Ryan Earl added a comment - - edited

          We recently ran into this issue after upgrading to OpsCenter-4.0.0; it is quite easy to reproduce:

          1. Install Cassandra-2.0.2
          2. Install OpsCenter-4.0.0 on above cluster.

          I upgraded OpsCenter on Friday, and by Sunday I had reached 1 million open file handles. I had to kill -9 the Cassandra processes as they wouldn't respond to sockets; the DSC20 restart scripts reported successfully killing the processes but in fact did not.

          [root@cassandra2 ~]# lsof -u cassandra|wc -l
          175416
          [root@cassandra2 ~]# lsof -u cassandra|grep -c OpsCenter
          174474
          

          Most of the handles show as "deleted"

          [root@cassandra2 ~]# lsof -u cassandra|grep -c deleted
          174449
          
          graham sanderson graham sanderson added a comment - - edited

          Note also, that most if not all of the deleted files are of the form

          java    14018 cassandra  586r   REG               8,33   8792499       1251 /data/1/cassandra/OpsCenter/rollups60/OpsCenter-rollups60-jb-4656-Data.db (deleted)
          java    14018 cassandra  587r   REG               8,33  27303760       1254 /data/1/cassandra/OpsCenter/rollups60/OpsCenter-rollups60-jb-4655-Data.db (deleted)
          java    14018 cassandra  588r   REG               8,33   8792499       1251 /data/1/cassandra/OpsCenter/rollups60/OpsCenter-rollups60-jb-4656-Data.db (deleted)
          java    14018 cassandra  589r   REG               8,33  27303760       1254 /data/1/cassandra/OpsCenter/rollups60/OpsCenter-rollups60-jb-4655-Data.db (deleted)
          java    14018 cassandra  590r   REG               8,33  10507214        936 /data/1/cassandra/OpsCenter/rollups60/OpsCenter-rollups60-jb-4657-Data.db (deleted)
          

          We have 7 data disks per node (I don't know if this contributes to the problem), and the number of such (open but) deleted files is very unbalanced, with 93% on two of the 7 disks (on this particular node). The distribution of live data file size for OpsCenter/rollups60 is a little uneven, with the same data mounts that have more deleted files having more actual live data, but the deleted file counts per mount point vary by several orders of magnitude whereas the data size itself does not.

          jbellis Jonathan Ellis added a comment -

          Michael Shuler Can you reproduce in 1.2? How about 2.0.0?

          If it's in the latter but not the former, it's going to be a bitch to bisect, but I don't have any better ideas.

          capncrunch4me Capn Crunch added a comment -

          Environment: CentOS 6.4 ~ kernel 3.10 (elrepo) ~ Cassandra 2.0.2 ~ OpsCenter 4.0.

          Note that OpsCenter greatly exacerbates the issue, as the roll-up CFs cause open files to grow at an incredible pace. Turning OpsCenter off causes the rate to slow and/or stop. After stopping OpsCenter, the OpsCenter open files on all C* nodes are never released, although the count does not continue to grow.

          There are tens of thousands of these open:

          /data/cassandra/OpsCenter/rollups60/OpsCenter-rollups60-jb-15491-Data.db (deleted)
          /data/cassandra/OpsCenter/rollups60/OpsCenter-rollups60-jb-15491-Data.db (deleted)
          /data/cassandra/OpsCenter/rollups60/OpsCenter-rollups60-jb-15491-Data.db (deleted)
          /data/cassandra/OpsCenter/rollups60/OpsCenter-rollups60-jb-15491-Data.db (deleted)

          Restarting Cassandra and keeping OpsCenter off keeps file descriptors within a comfortable range.

          graham sanderson graham sanderson added a comment - - edited

          Yes, I believe we can mitigate the problem in the OpsCenter case; however, it is a good test bed since it makes the problem easy to spot. Note it seems to be worse under high read/write activity on tracked keyspaces/CFs, which makes sense.

          Note I was poking (somewhat blindly) through the (2.0.2) code (partly out of interest) looking for what might be leaking these file handles, and I also took a heap dump. I discovered what turned out to be CASSANDRA-6358, which leaks FileDescriptors, though their refCounts all seemed to be 0. In any case there weren't enough FileDescriptors in total (in the heap dump) to account for the problem. They were also for mem-mapped files (the ifile in SSTableReader), and none of the leaked deleted file handles were mem-mapped (since they were compressed data files).

          That said, CASSANDRA-6358 was pinning the SSTableReaders in memory (since the Runnable was an anonymous inner class), so someone with more knowledge of the code might have a better idea of whether this might be a problem (other than the memory leak).

          I don't have an environment yet where I can easily build and install code changes, though we could downgrade our system test environment to 2.0.0 to see if we can reproduce the problem there - unsure if we can downgrade to 1.2.X easily given our current testing.

          Note that while I was looking at the code I came across CASSANDRA-5555... What caught my eye was the interaction between FileCacheService and RAR.deallocate, more specifically the fact that this change added a concurrent structure inside another separate concurrent structure, and it seemed like there might be a case where a RAR was recycled into a concurrent queue that was already removed and drained, in which case it would get GCed without close, presumably causing a file handle leak on the native side. However, I couldn't come up with any significantly convincing interactions that would cause this to happen without some very, very unlucky things happening (and my knowledge of the Google cache implementation was even more limited!), so this is unlikely to be the cause of this issue (especially if the issue doesn't happen in the 1.2.7+ branch), because I think nearly all deleted data files are being leaked, and finally because there is no particular correlation with TTL.

          mishail Mikhail Stepura added a comment -

          What caught my eye was the interaction between FileCacheService and RAR.deallocate, but more specifically related to the fact that this change, added a concurrent structure inside another separate concurrent structure, and it seemed like there might be a case where a RAR was recycled into a concurrent queue that was already removed and drained, in which case it would get GCed without close, presumably causing a file handle leak on the native side

          I might be wrong, but what you are saying correlates with observations from CASSANDRA-6283. Andreas Schnitzerling experimented and

          found out that a finalizer fixes the problem. So after GC the files will be deleted (not optimal, but working fine). It has now run 2 days continuously without problems. Possible fix/test: I wrote the following finalizer at the end of class org.apache.cassandra.io.util.RandomAccessReader: { deallocate(); super.finalize(); } }
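
          (For readability, a minimal self-contained sketch of that finalizer-as-safety-net pattern. This is a diagnostic workaround rather than a proper fix, and the class below is purely illustrative, not Cassandra code.)

          import java.io.Closeable;
          import java.io.IOException;
          import java.io.RandomAccessFile;

          // Illustrative only: a reader whose finalizer releases the descriptor if close() was missed.
          public class FinalizerSafetyNetSketch implements Closeable
          {
              private final RandomAccessFile file;

              public FinalizerSafetyNetSketch(String path) throws IOException
              {
                  this.file = new RandomAccessFile(path, "r");
              }

              public void close() throws IOException
              {
                  file.close();
              }

              @Override
              protected void finalize() throws Throwable
              {
                  try
                  {
                      close(); // safety net, analogous to calling deallocate() in RandomAccessReader
                  }
                  finally
                  {
                      super.finalize();
                  }
              }
          }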

          mishail Mikhail Stepura added a comment -

          I wonder what would happen with file_cache_size_in_mb: 0. RAR should be explicitly deallocated then.

          graham sanderson graham sanderson added a comment -

          Trying that now (one node with that setting)

          jre J. Ryan Earl added a comment - - edited

          Mikhail Stepura We (Graham Sanderson and I work together) added 'file_cache_size_in_mb: 0' to cassandra.yaml on one of the nodes, and restarted that node plus another with the default (unspecified) file_cache_size_in_mb setting to run an A/B test. Both nodes still leak file handles, however, the node with the default setting leaks much faster (about 3-4x the leak rate).

          CASSANDRA-6283 appears to be an exact duplicate of this problem, Windows and Linux JVMs appear to exhibit the exact same file handle leak behavior.

          graham sanderson graham sanderson added a comment - - edited

          Note that this would tend to imply that I was wrong (at least about the particular code path), and the change in leak rate may be attributable to less throughput without the file cache. Note the leak rate does seem quite related to how hard we are hitting the server as mentioned before, so a threading bug elsewhere might be the cause.

          Note that nominally buffer in RAR should be volatile, but then any code path through close where buffer's latest value is stale would end up calling deallocate anyway (at least in the case that file_cache_size_in_mb is off; I didn't think through the other case).

          So the finalizer fix - which we can try to build here to test out (unless someone has it pre-built) - seems to imply that some code path is simply failing to call close() under load conditions.

          graham sanderson graham sanderson added a comment -

          We were able to confirm the finalizer fix stopped the leak

          graham sanderson graham sanderson added a comment -

          Note: I believe the problem is caused by CASSANDRA-5514 (2.0 beta 1).

          I don't have a patch because I don't know what the exact fix should be, for the reasons below.

          Here is CollationController.java starting at line 264: (as of current 2.0 branch and 2.0.2)

                      // Check for row tombstone in the skipped sstables
                      if (skippedSSTables != null)
                      {
                          for (SSTableReader sstable : skippedSSTables)
                          {
                              if (sstable.getMaxTimestamp() <= minTimestamp)
                                  continue;
          
                              sstable.incrementReadCount();
                              OnDiskAtomIterator iter = filter.getSSTableColumnIterator(sstable);
                              if (iter.getColumnFamily() == null)
                                  continue;
          
                              ColumnFamily cf = iter.getColumnFamily();
                              // we are only interested in row-level tombstones here, and only if markedForDeleteAt is larger than minTimestamp
                              if (cf.deletionInfo().getTopLevelDeletion().markedForDeleteAt > minTimestamp)
                              {
                                  includedDueToTombstones++;
                                  iterators.add(iter);
                                  returnCF.delete(cf.deletionInfo().getTopLevelDeletion());
                                  sstablesIterated++;
                              }
                          }
                      }
          

          Note that if the last "if" test does not succeed, then "iter" is neither closed nor added to the "iterators" list to be closed in the finally section at the end - it would have been easy for me to always add it to the "iterators" list, except that "iterators" is referenced lower in the function:

                      if (iterators.isEmpty())
                          return null;
          
                      Tracing.trace("Merging data from memtables and {} sstables", sstablesIterated);
                      filter.collateOnDiskAtom(returnCF, iterators, gcBefore);
          

          Being new to the code, I cannot say whether it should be in "iterators" at that point, or should just have been closed (quietly) above.

          graham sanderson graham sanderson added a comment -

          Also note the stack trace for all leaked files we saw - someone can perhaps use this to help figure out what this actually affects (i.e. some of the iter's RARs may have been owned by someone else, in which case AOK).

          ERROR [Finalizer] 2013-11-20 03:43:42,129 RandomAccessReader.java (line 399) LEAK finalizer had to clean up
          java.lang.Exception: RAR for /data/5/cassandra/OpsCenter/rollups60/OpsCenter-rollups60-jb-6882-Data.db allocated
          	at org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:66)
          	at org.apache.cassandra.io.compress.CompressedRandomAccessReader.<init>(CompressedRandomAccessReader.java:76)
          	at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:43)
          	at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile.createReader(CompressedPoolingSegmentedFile.java:48)
          	at org.apache.cassandra.io.util.PoolingSegmentedFile.getSegment(PoolingSegmentedFile.java:39)
          	at org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:1182)
          	at org.apache.cassandra.db.columniterator.IndexedSliceReader.setToRowStart(IndexedSliceReader.java:108)
          	at org.apache.cassandra.db.columniterator.IndexedSliceReader.<init>(IndexedSliceReader.java:84)
          	at org.apache.cassandra.db.columniterator.SSTableSliceIterator.createReader(SSTableSliceIterator.java:65)
          	at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:42)
          	at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:167)
          	at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:62)
          	at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:273)
          	at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:53)
          	at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1467)
          	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1286)
          	at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:332)
          	at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:65)
          	at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:47)
          	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
          	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
          	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
          	at java.lang.Thread.run(Thread.java:744)
          
          jbellis Jonathan Ellis added a comment -

          Nailed it. (It should be closed.) Patch attached.
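
          (A sketch of the idea applied to the tombstone-check loop quoted above - illustrative only, not necessarily the committed patch; it assumes FileUtils.closeQuietly can accept the iterator. The point is that "iter" gets closed whenever it is not handed off to the "iterators" list:)

              ColumnFamily cf = iter.getColumnFamily();
              // only interested in row-level tombstones, and only if markedForDeleteAt is larger than minTimestamp
              if (cf.deletionInfo().getTopLevelDeletion().markedForDeleteAt > minTimestamp)
              {
                  includedDueToTombstones++;
                  iterators.add(iter);              // will be closed by the existing finally block
                  returnCF.delete(cf.deletionInfo().getTopLevelDeletion());
                  sstablesIterated++;
              }
              else
              {
                  FileUtils.closeQuietly(iter);     // close instead of leaking the underlying file handle
                  // (the earlier "continue" after the null getColumnFamily() check may need the same treatment)
              }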

          jre J. Ryan Earl added a comment -

          I accidentally hit the "Testing" button, and don't see a way to revert. I've built cassandra-2.0.2 with just this patch applied, and we are testing it now, but I didn't mean to change the status.

          jbellis Jonathan Ellis added a comment -

          (Reject takes it back to Open.)

          mshuler Michael Shuler added a comment -

          I will test this out this morning!

          jre J. Ryan Earl added a comment -

          So we've been running this all night and have written a few hundred GB of data with some products we're developing, all the while OpsCenter 4.0.0 was doing TTL'd rollups and whatnot. The deleted file count remained at 1 the entire time, never increasing, and the total file count remained below 1000.

          FYI, the single remaining (deleted) entry looks like some temporary file randomly generated at Cassandra startup that gets deleted but not closed for the process's duration; example:

          java    1925 cassandra   44u   REG              253,4      4096         13 /tmp/ffi441Hpl (deleted)
          
          brandon.williams Brandon Williams added a comment -

          That's probably JNA, or snappy.

          slebresne Sylvain Lebresne added a comment -

          That patch lgtm, +1.

          mshuler Michael Shuler added a comment -

          Patch works for me, too

          jbellis Jonathan Ellis added a comment -

          Committed. Thanks, everyone!


            People

            • Assignee:
              graham.sanderson graham sanderson
              Reporter:
              ash2k Mikhail Mazurskiy
              Reviewer:
              Jonathan Ellis
            • Votes:
              6 Vote for this issue
              Watchers:
              18 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development