Uploaded image for project: 'Apache Cassandra'
  1. Apache Cassandra
  2. CASSANDRA-1495

Server doesn't seem to close SSTables correctly, and ends up with too many file descriptors open

    XMLWordPrintableJSON

Details

    • Normal

    Description

      Cassandra server accumulates many open file descriptors as the time progresses. Obviously when it hits whatever limit the system allows, the service stops accepting messages.

      The exact trace is:
      WARN 05:41:47,929 Transport error occurred during acceptance of message.
      org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files
      at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:124)
      at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:35)
      at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
      at org.apache.cassandra.thrift.CustomTThreadPoolServer.serve(CustomTThreadPoolServer.java:98)
      at org.apache.cassandra.thrift.CassandraDaemon.start(CassandraDaemon.java:210)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:177)
      Caused by: java.net.SocketException: Too many open files
      at java.net.PlainSocketImpl.socketAccept(Native Method)
      at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390)
      at java.net.ServerSocket.implAccept(ServerSocket.java:453)
      at java.net.ServerSocket.accept(ServerSocket.java:421)
      at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:119)
      ... 9 more

      I've increased the ulimit to 8K from the standard 1K ... but still gets hit. So it seems that basically there are many fd's that never get closed.
      lsof shows that there are many fd's hanging on to SSTables...albeit it's only a small subset of unique files. In my case there were only about 4 distinct SSTable files...but kept opened hundreds of time each.
      All of these files seem to be "Data" files....no Filter or Index ones (as far as I remember)

      In case it matters: DiskAccessMode 'auto' determined to be standard, indexAccessMode is standard

      The service doesn't have a high rate of writes at the moment...I have 3 nodes, 1 being receiving mostly the thrift entry point. I see the problem being more acute on the 2 nodes that instead of being contacted by the clients, they just get the writes from the coordinator...(I have RF=2).

      Attachments

        Activity

          People

            brandon.williams Brandon Williams
            blanquer Josep M. Blanquer
            Brandon Williams
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: