Cassandra / CASSANDRA-2845

Cassandra uses 100% system CPU on Ubuntu Natty (11.04)

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Not a Problem
    • Fix Version/s: 0.8.2
    • Component/s: Core
    • Labels:
      None
    • Environment:

      Default install of Ubuntu 11.04

      Description

      Step 1. Boot up a brand new, default Ubuntu 11.04 Server install
      Step 2. Install Cassandra from the Apache APT repository (deb http://www.apache.org/dist/cassandra/debian 08x main)
      Step 3. apt-get install cassandra; as soon as cassandra starts, it will freeze the machine

      What's happening is that as soon as cassandra starts up it immediately sucks up 100% of CPU and starves the machine. This effectively bricks the box until you boot into single user mode and disable the cassandra init.d script.

      Under htop, the CPU usage shows up as "system" cpu, not user.

      The machine I'm testing this on is a Quad-Core Sandy Bridge w/ 16GB of Memory, so it's not a system resource issue. I've also tested this on completely different hardware (Dual 64-Bit Xeons & AMD X4) and it has the same effect.

      Ubuntu 10.10 does not exhibit the same issue. I have only tested 0.8 and 0.8.1.

      root@cassandra01:/# java -version
      java version "1.6.0_22"
      OpenJDK Runtime Environment (IcedTea6 1.10.2) (6b22-1.10.2-0ubuntu1~11.04.1)
      OpenJDK 64-Bit Server VM (build 20.0-b11, mixed mode)

      root@cassandra:/# uname -a
      Linux cassandra01 2.6.38-8-generic #42-Ubuntu SMP Mon Apr 11 03:31:24 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

      /proc/cpuinfo
      Intel(R) Xeon(R) CPU E31270 @ 3.40GHz

      /proc/meminfo
      MemTotal: 16459776 kB
      MemFree: 14190708 kB

        Activity

        tyler cheung added a comment (edited)

        hmm... when i start it w/ sudo /usr/sbin/cassandra, instead of the /etc/init.d/cassandra script, the memory usage is fine...

        Tasks: 156 total, 1 running, 154 sleeping, 0 stopped, 1 zombie
        Cpu(s): 0.3%us, 0.2%sy, 0.0%ni, 99.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
        Mem: 3934168k total, 2511668k used, 1422500k free, 176076k buffers
        Swap: 4071420k total, 25572k used, 4045848k free, 1086980k cached

        PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        24815 root 20 0 1527m 69m 9.9m S 1 1.8 0:00.99 java

        Am I missing something very basic? My limited understanding is that when cassandra is run under the jsvc wrapper, memory usage seems to go up, at least according to top.

        I guess this is perhaps a separate issue (memory usage under jsvc) from the CPU usage issue being reported here.

        tyler cheung added a comment (edited)

        OK - so what happened was I manually put in libjna 3.4.0 files (copied the tarball and redid the symlinks). The /etc/init.d script didn't work but the sudo /usr/sbin/cassandra did, and things seem to work a lot better.

        They just moved my office and I have a new core i5 w/ 4 gb RAM. I just redid the cassandra install w/ vanilla 11.10 (libjna 3.2.7), openjre6, jsvc1.0.6.

        There was a brief hiccup but it's running. Top shows jsvc running cassandra consuming 35% memory as follows:

        Cpu(s): 0.2%us, 0.2%sy, 0.0%ni, 99.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
        Mem: 3934168k total, 3805184k used, 128984k free, 175792k buffers
        Swap: 4071420k total, 26532k used, 4044888k free, 1083252k cached

        PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        20452 root 20 0 166m 13m 8640 S 1 0.4 1:37.09 Xorg
        21906 tylerc 20 0 453m 16m 10m S 1 0.4 0:02.43 xfce4-terminal
        21337 tylerc 20 0 172m 12m 8704 S 0 0.3 0:10.59 xfwm4
        21433 tylerc 20 0 698m 105m 38m S 0 2.7 2:59.83 chrome
        23067 tylerc 20 0 864m 59m 16m S 0 1.6 0:01.94 chrome
        23735 cassandr 20 0 1518m 1.3g 18m S 0 35.1 0:01.90 jsvc
        ...

        I'm going to try and get the newer libjna again and see if this helps memory usage. Kernel is 3.0.0.12 generic.

        Steve Corona added a comment -

        The kernel upgrade is what fixed it for us.

        paul cannon added a comment -

        tyler- would it be possible to try with a 2.6.38-10 kernel, and/or try by running cassandra directly instead of with the initscript (which uses jsvc)? Also, which JVM are you using?

        tyler cheung added a comment -

        I'm not sure if things have changed or maybe the machine I'm running this on is old, but it's happening for me. Kernel is 3.0.0.16.19, Ubuntu 11.10. This is on a late-model Pentium 4 w/ ~1 GB memory. libjna is at 3.2.7 (whatever the Ubuntu distro has). I'm trying to figure out how to update that to 3.4.0, but not quite sure how to do that yet.

        Jackson Chung added a comment -

        fwiw,

        I was able to avoid this (hang) by using plain java (Sun's) instead of jsvc. (JNA was enabled in both cases; when installing from the deb package and starting manually, I do have to symlink it manually.)

        Once I switch to jsvc, all hell breaks loose.

        The kernel was the older one on EC2:

        Linux domU-12-31-39-00-2C-42 2.6.38-8-virtual #42-Ubuntu SMP Mon Apr 11 04:06:34 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

        I was also able to get the same hang on a 2.6.35 kernel on Rackspace a couple of days ago (I've already killed the VM).

        dmesg shows timeout/OOM, crazy stuff

        On my own local machine, with a separately installed kernel:

        Distributor ID: Ubuntu
        Description: Ubuntu 10.10
        Release: 10.10
        Codename: maverick
        Linux faranth 2.6.39-02063903-generic #201107091121 SMP Sat Jul 9 11:25:36 UTC 2011 x86_64 GNU/Linux

        I don't have the hang problem (using jsvc/jna/package)

        paul cannon added a comment -

        Oh ok, thanks.

        Steve Corona added a comment (edited)

        FWIW, I am using real hardware and not an EC2 instance, i.e. the problem is not limited to EC2; it affects all Ubuntu 2.6.38-8 kernels.

        paul cannon added a comment -

        Ok, documented things at http://wiki.apache.org/cassandra/FAQ#ubuntu_ec2_hangs . Closing this for now.

        Steve Corona added a comment -

        brilliant- upgrading from 2.6.38-8 to 2.6.38-10 solves the issue for me as well. so strange

        paul cannon added a comment -

        To clarify my earlier comment about different EC2 kernel images: I thought those were the actual linux kernel images I was getting, but they were actually just different pv-grub builds. They were all still chaining to the kernel from inside the Ubuntu image.

        paul cannon added a comment -

        It seems worth just a tiny bit of extra testing if we can identify the problem a little better, because the potential benefit from JNA is so large, and Ubuntu is, by and large, a very handy platform to use with ec2 (not least because of the UEC AMIs).

        I've found now that upgrading the kernel to 2.6.38-10 (in my case, the package linux-image-2.6.38-10-virtual) seems to make the problem go away completely. Steve, can you see if that helps you as well? If so, we can close this and just recommend avoiding Ubuntu 2.6.38-8 kernel builds.
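The recommendation above could be turned into a quick pre-install check. This is only a sketch: `is_affected_kernel` is a hypothetical helper, and the single version it treats as affected (2.6.38-8) is taken from the comments in this thread, not from any authoritative list.

```shell
#!/bin/sh
# Sketch: flag the Ubuntu kernel build this thread reports as problematic.
# is_affected_kernel is a hypothetical helper; 2.6.38-8 is the only build
# treated as affected, based solely on the comments in this ticket.
is_affected_kernel() {
    case "$1" in
        2.6.38-8|2.6.38-8-*) return 0 ;;   # e.g. 2.6.38-8-generic, 2.6.38-8-virtual
        *)                   return 1 ;;
    esac
}

if is_affected_kernel "$(uname -r)"; then
    echo "kernel $(uname -r): reported to hang with Cassandra + JNA; consider 2.6.38-10"
else
    echo "kernel $(uname -r): not the 2.6.38-8 build discussed here"
fi
```

The pattern matches the release string printed by `uname -r`, so both the `-generic` and `-virtual` flavours mentioned in this ticket are covered.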

        Jonathan Ellis added a comment -

        should we report a bug, mark jna as don't-install-by-default, recommend avoiding natty, and move on?

        paul cannon added a comment -

        Ok, I believe I've reproduced this. I don't get 100% cpu usage, but I do see strange stalls and kernel errors in dmesg about tasks being hung for too long, or in some cases, the machine becomes completely unresponsive. Simple things like "ps aux" hang on reading certain /proc/$pid/cmdline entries, effects like that.

        What I've found so far:

        • Can reproduce on Ubuntu 11.04 (natty), haven't been able to reproduce on Ubuntu 10.10 (maverick)
        • Can reproduce with several different libjna-java/jna.jar builds. Cannot reproduce without any JNA available.
        • Cannot reproduce when the CLibrary.tryMlockall() call to mlockall() is commented out.
        • Can still reproduce when memlock resource-limit is severely restricted (i.e. set to 1 kb in /etc/security/limits.d/cassandra.conf)
        • Can reproduce with a couple different EC2 kernel images. Haven't found a kernel yet that makes it unreproducible.

        At this point I'm suspecting some sort of funkiness in the natty glibc. Will be testing more.
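Since the findings above point at mlockall(), it may help to see whether a process actually holds locked memory and what the memlock limit is. A minimal sketch, assuming a Linux /proc filesystem: `locked_kb` is a hypothetical helper that reads the `VmLck` field from a `/proc/<pid>/status` file.

```shell
#!/bin/sh
# Sketch: report how much memory a process has locked (VmLck in /proc/PID/status).
# locked_kb is a hypothetical helper; it takes the path of a status file so it
# can be pointed at /proc/<pid>/status for any process you may inspect.
locked_kb() {
    awk '/^VmLck:/ { print $2 }' "$1"
}

# Soft limit on locked memory for this shell, as reported by the shell's ulimit.
echo "memlock limit: $(ulimit -l 2>/dev/null)"

# Locked memory of the current shell, if the status file is readable (Linux only).
if [ -r "/proc/$$/status" ]; then
    echo "VmLck for pid $$: $(locked_kb "/proc/$$/status") kB"
fi
```

To look at a running Cassandra instead, pass `/proc/<pid>/status` for the java or jsvc process; a non-zero `VmLck` would suggest the mlockall() call took effect.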

        Oh, and as an aside- aptitude also doesn't force you to install recommends; just use aptitude install --without-recommends.

        Steve Corona added a comment -

        Okay, so as it turns out the original problem is different than I thought. My dpkg solution was just skirting around the real issue (since dpkg doesn't force you to install all of the recommended dependencies).

        It's libjna-java (3.2.4-2ubuntu2) that's really causing the issue. The cassandra apt repository is pulling it in as a dependency and, for whatever reason, it sucks up all of the CPU when it runs with cassandra. I don't know if it's a matter of libjna being broken in 11.04 or just that it doesn't play nice with Cassandra.

        FWIW, CASSANDRA-2803 mentions deb packages & libjna; not sure what role that plays in this.

        Here is my current workaround:

        mkdir -p /usr/sbin/
        cat <<EOF > /usr/sbin/policy-rc.d
        #!/bin/sh
        exit 101
        EOF
        chmod 755 /usr/sbin/policy-rc.d

        apt-get install cassandra
        apt-get remove libjna-java
        service cassandra start
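The workaround above leaves policy-rc.d in place afterwards, which keeps blocking package-triggered service starts. A sketch of the same idea with an explicit cleanup step follows; the function names and the `POLICY_FILE` override are assumptions made for illustration, not part of the original workaround.

```shell
#!/bin/sh
# Sketch: write a policy-rc.d that denies service starts, and remove it again.
# POLICY_FILE defaults to the real location; the override exists only so the
# functions can be tried without touching the system (an assumption of this sketch).
POLICY_FILE="${POLICY_FILE:-/usr/sbin/policy-rc.d}"

block_service_starts() {
    # Exit status 101 tells invoke-rc.d that the action is forbidden by policy.
    printf '#!/bin/sh\nexit 101\n' > "$POLICY_FILE" && chmod 755 "$POLICY_FILE"
}

allow_service_starts() {
    rm -f "$POLICY_FILE"
}

# Intended use, as root:
#   block_service_starts
#   apt-get install cassandra
#   apt-get remove libjna-java
#   allow_service_starts
#   service cassandra start
```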

        Jonathan Ellis added a comment -

        /baffled

        Steve Corona added a comment -

        I actually figured this out- it's more of a cassandra packaging issue than an issue with the actual code.

        I extracted the cassandra-0.8.1.deb file and diff'ed all of the files with apache-cassandra-0.8.1-bin.tar.gz. I noticed that apache-cassandra-0.8.1.jar was off by a few bytes. I extracted the jar and determined that the deb file was using a different version of the following classes:

        cli/CliLexer.class
        cli/CliParser.class
        cql/CqlLexer.class
        cql/CqlParser.class

        I repackaged the .deb using apache-cassandra-0.8.1.jar from the bin.tar.gz (will post instructions below) and it installed on Ubuntu 11.04 without a hitch. I'm not sure if the .jar/.class files used to package the deb were corrupted or just are a different/incomplete/broken version.
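The comparison described above can be sketched roughly as follows, assuming both jars have already been extracted into separate directories. `differing_files` is a hypothetical helper and the paths in the example are placeholders.

```shell
#!/bin/sh
# Sketch: list files that differ between two extracted trees, for example the
# unpacked jar from the .deb and the unpacked jar from the tarball.
# differing_files is a hypothetical helper; both arguments are directories.
differing_files() {
    # diff -rq prints one line per differing or missing file and exits non-zero
    # when there are differences, so the status is deliberately ignored.
    diff -rq "$1" "$2" || true
}

# Example (paths are placeholders):
#   differing_files /tmp/jar-from-deb /tmp/jar-from-tarball
```

Files that are byte-identical in both trees produce no output, so only the mismatching classes show up.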

        Poor man's .deb repackaging until it's officially fixed:

        cd /tmp
        mkdir work && cd work
        wget http://www.fightrice.com/mirrors/apache/cassandra/0.8.1/apache-cassandra-0.8.1-bin.tar.gz
        tar -zxvf apache-cassandra-0.8.1-bin.tar.gz

        mkdir deb && cd deb
        wget http://www.apache.org/dist/cassandra/debian/pool/main/c/cassandra/cassandra_0.8.1_all.deb

        # need binutils to get the ar utility
        sudo apt-get install binutils

        ar vx cassandra_0.8.1_all.deb
        tar -zxvf data.tar.gz
        rm data.tar.gz
        cd ./usr/share/cassandra

        mv /tmp/work/apache-cassandra-0.8.1/lib/apache-cassandra-0.8.1.jar .
        cd /tmp/work/deb
        tar -czvf data.tar.gz etc/ usr/ var/

        rm cassandra_0.8.1_all.deb
        ar rc cassandra_0.8.1_all.deb debian-binary control.tar.gz data.tar.gz

        sudo apt-get install openjdk-6-jdk
        sudo dpkg -i cassandra_0.8.1_all.deb

        Alternatively, you can use policy-rc.d to prevent cassandra.deb's post-init script from running on install and replace the messed up .jar after it has been installed. Instructions here: http://lifeonubuntu.com/how-to-prevent-server-daemons-from-starting-during-apt-get-install/


          People

          • Assignee:
            paul cannon
            Reporter:
            Steve Corona
          • Votes:
            0
            Watchers:
            4
