Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 0.6.3
    • Component/s: Core
    • Labels:
      None

      Description

      I suggested this in a ML thread but it seems that nobody actually tried it. I think it's worth following up on:

      You could try setting the compaction thread to a lower priority. You could add a thread priority to NamedThreadPool, and pass that up from CompactionExecutor constructor. According to http://www.javamex.com/tutorials/threads/priority_what.shtml you have to run as root and add a JVM option to get this to work.
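
      A minimal sketch of the idea above, assuming a thread factory that accepts a priority. PriorityThreadFactory and newCompactionExecutor are hypothetical names used for illustration only; they are not the actual NamedThreadPool or CompactionExecutor code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a named thread factory that also sets a priority.
final class PriorityThreadFactory implements ThreadFactory {
    private final String name;
    private final int priority;
    private final AtomicInteger counter = new AtomicInteger(1);

    PriorityThreadFactory(String name, int priority) {
        // Java thread priorities run from MIN_PRIORITY (1) to MAX_PRIORITY (10).
        if (priority < Thread.MIN_PRIORITY || priority > Thread.MAX_PRIORITY) {
            throw new IllegalArgumentException("priority out of range: " + priority);
        }
        this.name = name;
        this.priority = priority;
    }

    @Override
    public Thread newThread(Runnable task) {
        Thread t = new Thread(task, name + ":" + counter.getAndIncrement());
        // Whether this maps to an OS-level priority depends on the JVM options in use.
        t.setPriority(priority);
        return t;
    }

    // Hypothetical helper: a single-threaded executor whose worker runs at the given priority.
    static ExecutorService newCompactionExecutor(int priority) {
        return Executors.newSingleThreadExecutor(
                new PriorityThreadFactory("COMPACTION-POOL", priority));
    }
}
```

      The priority is passed in by the caller, so the executor constructor can take it from configuration and hand it down to the factory.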

      In particular, Brandon saw stress.py read latencies spike to 100ms during [anti]compaction on a 2 core machine. I'd like to see if this can mitigate that.

      1. 1181.txt
        3 kB
        Jonathan Ellis
      2. 1181-trunk.txt
        4 kB
        Brandon Williams
      3. CompactionManager.java
        24 kB
        Edward Capriolo
      4. stats.txt
        4 kB
        Edward Capriolo

        Activity

        Edward Capriolo added a comment -

        How about using some information from tpstats to make this adaptive? See the sample code.

        Jonathan Ellis added a comment -

        using thread priority makes that unnecessary. that is why it's a better solution.

        Edward Capriolo added a comment -

        I think ThreadPriority will help; however, CPU priority is not directly linked to IO scheduling. I believe that with the statistics Cassandra keeps we could accomplish something adaptive. Also, if we want a solution that is not 100% pure Java, we can read /proc/diskstats and use actual disk utilization to be adaptive.

        All these methods I suggested fall apart if non-compaction traffic is thrashing the disk, but if that were the case the node was not healthy in the first place.
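
        A rough sketch of the /proc/diskstats idea, assuming the Linux layout in which the tenth statistic after the device name is the number of milliseconds spent doing I/O. DiskStats and its methods are hypothetical names; this is not the attached sample code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper that estimates disk utilization from /proc/diskstats samples.
final class DiskStats {
    // Returns the "milliseconds spent doing I/O" counter for the device, or -1 if absent.
    static long ioMillis(List<String> lines, String device) {
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            // f[0]=major, f[1]=minor, f[2]=device name, f[3..]=statistics; f[12] is the tenth one.
            if (f.length >= 13 && f[2].equals(device)) {
                return Long.parseLong(f[12]);
            }
        }
        return -1L;
    }

    // Fraction of the sampling interval the device was busy, clamped to [0, 1].
    static double utilization(long beforeMs, long afterMs, long elapsedMs) {
        if (elapsedMs <= 0) {
            return 0.0;
        }
        double u = (double) (afterMs - beforeMs) / elapsedMs;
        return Math.max(0.0, Math.min(1.0, u));
    }

    // Reads the live file; only meaningful on Linux.
    static List<String> readProcDiskstats() throws IOException {
        return Files.readAllLines(Paths.get("/proc/diskstats"));
    }
}
```

        Sampling ioMillis twice and passing both values to utilization gives a busy fraction that a compaction loop could consult before deciding whether to back off.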

        Jonathan Ellis added a comment (edited) -

        to enable, add to cassandra.in.sh:

        -XX:+UseThreadPriorities \
        -XX:ThreadPriorityPolicy=42 \
        -Dcassandra.compaction.priority=1 \

        (TPP=42 is a workaround to allow us to lower thread priority without root privileges, explained at http://tech.stolsvik.com/2010/01/linux-java-thread-priorities-workaround.html)
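
        For illustration, a value passed with -Dcassandra.compaction.priority can be read on the Java side as a system property. This is a sketch only: the class name is hypothetical, and falling back to NORM_PRIORITY is an assumption chosen so that behavior is unchanged when the option is absent.

```java
// Hypothetical reader for the -Dcassandra.compaction.priority option.
final class CompactionPriorityProperty {
    static final String KEY = "cassandra.compaction.priority";

    // Falls back to NORM_PRIORITY (5) when the property is unset or not a valid integer.
    static int read() {
        return Integer.getInteger(KEY, Thread.NORM_PRIORITY);
    }
}
```

        Integer.getInteger returns the supplied default if the property is missing or cannot be parsed, so a mistyped value does not stop startup.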

        Brandon Williams added a comment (edited) -

        There is some weirdness here. With the JVM options added, I see the compaction thread at niceness -5 as I would expect, however many other threads (like Concurrent Mark Sweep) are running between -3 and -5. If I remove the -XX:ThreadPriorityPolicy=42 option, even though I'm running as root, all threads run at a normal priority.

        Jonathan Ellis added a comment -

        it's expected that everything runs at normal priority w/o TPP, even as root.

        normally, you would set TPP=1 to allow thread priorities to be set in the jvm. the reason 42 is a trick is that the jvm will not let you set it to 1 if you are not running as root, but the only reason you have to be root as far as linux is concerned is to raise thread priority. lowering it is always ok. so setting it to 42 fools the "is tpp 0" check on setPriority, without having the jvm's excessively stringent sanity-checking hose you.

        source diving at http://stackoverflow.com/questions/1662185/do-linux-jvms-actually-implement-thread-priorities

        Stu Hood added a comment -

        I'm -0 on changing the compaction thread priority, as opposed to using a solution like ecapriolo has suggested. This solution might even backfire and cause worse read performance if compaction loses enough priority that you end up with tons of sstables waiting to be compacted.

        While 'sleeping' compaction threads manually is akin to building our own scheduler, we have so much more information than the OS scheduler does that it will probably be worthwhile. The ideal situation would be that while writes are incoming, compaction is running almost constantly, so that it is fully amortized. That is, there should only be one possible cascaded compaction at a time, and it should finish immediately before the next compaction becomes possible.

        Jonathan Ellis added a comment -

        To reiterate, this patch leaves the default behavior unchanged, but allows decreasing CM priority as described above.

        if you don't have enough capacity to both compact and handle read/write load, then you're screwed. writing a manual scheduler that may or may not do slightly better in the short run is not going to change that.

        what we want to do is slow compaction down so that instead of short violent bursts of CM activity you spread it out over a longer period of time.

        Brandon Williams added a comment -

        +1

        Jonathan Ellis added a comment -

        committed

        Edward Capriolo added a comment -

        Just wanted to check back in. I enabled the thread priorities and initiated a compaction.
        Both my latency and tpstats looked on par with other nodes in the cluster that were not compacting at the time. This looks great. I included some system statistics to show off. Thanks!

        Jonathan Ellis added a comment -

        Thanks for trying it out!

        Brandon Williams added a comment -

        Reopened for trunk. We need to add these JVM options to cassandra-env.sh and expose the compaction thread priority in the yaml config.

        Jonathan Ellis added a comment -

        let's make priority=1 the default, and add the jvm options to cassandra-env.sh

        should we reject setting compaction to higher priority than NORMAL? no idea what will happen if you do higher-than-normal on a JVM as non-root, since we've explicitly defeated its protection against doing that

        Brandon Williams added a comment -

        Updated to make MIN_PRIORITY the default, and constrain values to be less than or equal to NORM_PRIORITY. Setting it any higher is silly anyway.
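
        The constraint described here could look roughly like the following. This is a sketch, not the committed patch: CompactionPriorityConfig and validate are hypothetical names, and throwing IllegalArgumentException is an assumed way of rejecting bad values.

```java
// Hypothetical validation of a configured compaction thread priority.
final class CompactionPriorityConfig {
    // Default used when nothing is configured.
    static final int DEFAULT = Thread.MIN_PRIORITY;

    // Accepts values from MIN_PRIORITY (1) up to NORM_PRIORITY (5); null means "not configured".
    static int validate(Integer configured) {
        if (configured == null) {
            return DEFAULT;
        }
        if (configured < Thread.MIN_PRIORITY || configured > Thread.NORM_PRIORITY) {
            throw new IllegalArgumentException(
                    "compaction thread priority must be between " + Thread.MIN_PRIORITY
                    + " and " + Thread.NORM_PRIORITY + ", got " + configured);
        }
        return configured;
    }
}
```

        Rejecting anything above NORM_PRIORITY avoids ever asking the JVM to raise a thread's priority, which is the case the TPP=42 workaround leaves unprotected.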

        Jonathan Ellis added a comment -

        +1

        Brandon Williams added a comment -

        Committed.


          People

          • Assignee:
            Brandon Williams
            Reporter:
            Jonathan Ellis
            Reviewer:
            Jonathan Ellis
          • Votes:
            0
            Watchers:
            3
