Cassandra / CASSANDRA-13663

Cassandra 3.10 crashes without dump


Details

    • Low

    Description

      Hello. My company runs a 5-node Cassandra cluster. For the last few weeks, we have had a sporadic issue where one of the servers crashes without creating a dump file and without any error messages in the logs. If one restarts the service (which we have by now scripted to happen automatically), the server resumes work with no complaint.
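      An automatic restart of the kind mentioned above could look roughly like the following. This is an illustrative sketch only, not the reporter's actual script. It assumes Python 3.7+, a systemd unit named `cassandra` (as is usual on Ubuntu 16.04 with the packaged install), and a 30-second polling interval; the names `is_active`, `check_once` and `watch` are hypothetical.

```python
import subprocess
import time
from datetime import datetime, timezone

SERVICE = "cassandra"   # assumed systemd unit name
POLL_SECONDS = 30       # assumed polling interval


def is_active(state: str) -> bool:
    """Interpret the text printed by `systemctl is-active`."""
    return state.strip() == "active"


def check_once(run=subprocess.run, service=SERVICE) -> bool:
    """Restart the service if it is not active.

    Returns True if a restart was issued, False otherwise.
    """
    result = run(["systemctl", "is-active", service],
                 capture_output=True, text=True)
    if is_active(result.stdout):
        return False
    # Record the time so the crash can be matched against system.log
    # and the volume metrics afterwards.
    stamp = datetime.now(timezone.utc).isoformat()
    print(f"{stamp} {service} is not active, restarting")
    run(["systemctl", "restart", service], capture_output=True, text=True)
    return True


def watch(poll_seconds=POLL_SECONDS):
    """Poll forever; intended to be run as root, e.g. from a small service."""
    while True:
        check_once()
        time.sleep(poll_seconds)
```

      Calling `watch()` starts the loop. Printing a timestamp on every restart gives a record of crash times even when Cassandra itself logs nothing.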

      Log files from the time of the last crash are attached, though again they do not record any crash happening.

      Regarding our setup, we are running these servers on Amazon AWS, with 3 volumes per server: one for the system, one for data and one for the commitlog. When a crash happens, we can observe a sudden spike of read activity on the commitlog volume. All of these volumes have ample free space. In particular, the system volume has more than enough free space for a dump to be written.

      The servers run Ubuntu 16.04, and Cassandra 3.10 is installed from the apt package.

      It is worth noting that these crashes happen more often when nodetool is running either a repair job or a backup job, but this is by no means always the case. As for frequency, we have had about 1-2 crashes per week for the last month.

      Attachments

        1. 2017-07-04 10_48_34-CloudWatch Management Console.png
          30 kB
          Matthias Otto
        2. cassandra debug.log
          5 kB
          Matthias Otto
        3. cassandra system.log
          5 kB
          Matthias Otto
        4. RamUsageExamle1.png
          21 kB
          Matthias Otto
        5. RamUsageExample2.png
          19 kB
          Matthias Otto

          People

            Assignee: Unassigned
            Reporter: Matthias Otto (MattOtt)
