Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 0.6.3
    • Component/s: Core
    • Labels:
      None

      Description

      I suggested this in a ML thread but it seems that nobody actually tried it. I think it's worth following up on:

      You could try setting the compaction thread to a lower priority. You could add a thread priority to NamedThreadPool, and pass that up from CompactionExecutor constructor. According to http://www.javamex.com/tutorials/threads/priority_what.shtml you have to run as root and add a JVM option to get this to work.
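A minimal Java sketch of the idea described above, assuming a thread factory that stamps a fixed priority on every thread it creates. `PriorityThreadFactory` is a hypothetical name used for illustration; it is not the actual NamedThreadPool or CompactionExecutor code from the attached patches.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration: a named thread factory that applies a fixed priority.
final class PriorityThreadFactory implements ThreadFactory {
    private final String prefix;
    private final int priority;
    private final AtomicInteger counter = new AtomicInteger(1);

    PriorityThreadFactory(String prefix, int priority) {
        if (priority < Thread.MIN_PRIORITY || priority > Thread.MAX_PRIORITY) {
            throw new IllegalArgumentException("priority out of range: " + priority);
        }
        this.prefix = prefix;
        this.priority = priority;
    }

    @Override
    public Thread newThread(Runnable task) {
        // Name threads "<prefix>-<n>" so they are easy to spot in thread dumps.
        Thread thread = new Thread(task, prefix + "-" + counter.getAndIncrement());
        thread.setDaemon(true);
        // On Linux this only reaches the OS scheduler if the JVM is started
        // with thread priorities enabled.
        thread.setPriority(priority);
        return thread;
    }
}
```

An executor for compaction could then be built with something like `Executors.newSingleThreadExecutor(new PriorityThreadFactory("compaction", Thread.MIN_PRIORITY))`.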

      In particular, Brandon saw stress.py read latencies spike to 100ms during [anti]compaction on a 2 core machine. I'd like to see if this can mitigate that.

      1. CompactionManager.java
        24 kB
        Edward Capriolo
      2. 1181.txt
        3 kB
        Jonathan Ellis
      3. stats.txt
        4 kB
        Edward Capriolo
      4. 1181-trunk.txt
        4 kB
        Brandon Williams

        Activity

        Transition                   Time In Source Status   Execution Times   Last Executer      Last Execution Date
        Open -> Patch Available      14d 15h 52m             1                 Jonathan Ellis     25/Jun/10 17:11
        Patch Available -> Resolved  5h 30m                  1                 Jonathan Ellis     25/Jun/10 22:41
        Resolved -> Reopened         57d 2h 31m              1                 Brandon Williams   22/Aug/10 01:13
        Reopened -> Resolved         1d 22h 47m              1                 Brandon Williams   24/Aug/10 00:00
        Gavin made changes -
        Workflow patch-available, re-open possible [ 12752301 ] reopen-resolved, no closed status, patch-avail, testing [ 12758237 ]
        Gavin made changes -
        Workflow no-reopen-closed, patch-avail [ 12513104 ] patch-available, re-open possible [ 12752301 ]
        Brandon Williams made changes -
        Resolution Fixed [ 1 ]
        Status Reopened [ 4 ] Resolved [ 5 ]
        Reviewer jbellis
        Brandon Williams added a comment -

        Committed.

        Jonathan Ellis added a comment -

        +1

        Brandon Williams made changes -
        Attachment 1181-trunk.txt [ 12452875 ]
        Brandon Williams added a comment -

        Updated to make MIN_PRIORITY the default, and constrain values to be less than or equal to NORM_PRIORITY. Setting it any higher is silly anyway.
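The rule described in this comment could be sketched in Java roughly as follows. This is an illustration under assumptions, not the committed patch: `CompactionPriority` and `resolve` are hypothetical names, and rejecting out-of-range values with an exception (rather than clamping them) is a guess at the intended behavior.

```java
// Hypothetical illustration of "default to MIN_PRIORITY, allow nothing above NORM_PRIORITY".
final class CompactionPriority {
    private CompactionPriority() {}

    // rawValue is the configured setting as text, or null when nothing is configured.
    static int resolve(String rawValue) {
        if (rawValue == null) {
            return Thread.MIN_PRIORITY; // default: lowest priority
        }
        int value = Integer.parseInt(rawValue.trim());
        if (value < Thread.MIN_PRIORITY || value > Thread.NORM_PRIORITY) {
            // Assumption: invalid settings are rejected rather than silently clamped.
            throw new IllegalArgumentException(
                "compaction priority must be between " + Thread.MIN_PRIORITY
                + " and " + Thread.NORM_PRIORITY + ", got " + value);
        }
        return value;
    }
}
```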

        Brandon Williams made changes -
        Attachment 1181-trunk.txt [ 12452865 ]
        Jonathan Ellis added a comment -

        let's make priority=1 the default, and add the jvm options to cassandra-env.sh

        should we reject setting compaction to higher priority than NORMAL? no idea what will happen if you do higher-than-normal on a JVM as non-root, since we've explicitly defeated its protection against doing that

        Brandon Williams made changes -
        Attachment 1181-trunk.txt [ 12452865 ]
        Brandon Williams made changes -
        Resolution Fixed [ 1 ]
        Status Resolved [ 5 ] Reopened [ 4 ]
        Assignee Jonathan Ellis [ jbellis ] Brandon Williams [ brandon.williams ]
        Brandon Williams added a comment -

        Reopened for trunk. We need to add these JVM options to cassandra-env.sh and expose the compaction thread priority in the yaml config.

        Jonathan Ellis added a comment -

        Thanks for trying it out!

        Edward Capriolo made changes -
        Attachment stats.txt [ 12448908 ]
        Edward Capriolo added a comment -

        Just wanted to check back in. I enabled the thread priorities and initiated a compaction.
        Both my latency and tpstats looked on par with other nodes in the cluster that were not compacting at the time. This looks great. I included some system statistics to show off. Thanks!

        Jonathan Ellis made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        Jonathan Ellis added a comment -

        committed

        Brandon Williams added a comment -

        +1

        Jonathan Ellis added a comment -

        To reiterate, this patch leaves the default behavior unchanged, but allows decreasing CM priority as described above.

        if you don't have enough capacity to both compact and handle read/write load, then you're screwed. writing a manual scheduler that may or may not do slightly better in the short run is not going to change that.

        what we want to do is slow compaction down so that instead of short violent bursts of CM activity you spread it out over a longer period of time.

        Stu Hood added a comment -

        I'm -0 on changing the compaction thread priority, as opposed to using a solution like ecapriolo has suggested. This solution might even backfire and cause worse read performance if compaction loses enough priority that you end up with tons of sstables waiting to be compacted.

        While 'sleeping' compaction threads manually is akin to building our own scheduler, we have so much more information than the OS scheduler does that it will probably be worthwhile. The ideal situation would be that while writes are incoming, compaction is running almost constantly, so that it is fully amortized. That is, there should only be one possible cascaded compaction at a time, and it should finish immediately before the next compaction becomes possible.

        Jonathan Ellis added a comment -

        it's expected that everything runs at normal priority w/o TPP, even as root.

        normally, you would set TPP=1 to allow thread priorities to be set in the jvm. the reason 42 is a trick is that the jvm will not let you set it to 1 if you are not running as root, but the only reason you have to be root as far as linux is concerned is to raise thread priority. lowering it is always ok. so setting it to 42 fools the "is tpp 0" check on setPriority, without having the jvm's excessively stringent sanity-checking hose you.

        source diving at http://stackoverflow.com/questions/1662185/do-linux-jvms-actually-implement-thread-priorities

        Brandon Williams added a comment - - edited

        There is some weirdness here. With the JVM options added, I see the compaction thread at niceness -5 as I would expect, however many other threads (like Concurrent Mark Sweep) are running between -3 and -5. If I remove the -XX:ThreadPriorityPolicy=42 option, even though I'm running as root, all threads run at a normal priority.

        Jonathan Ellis made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Jonathan Ellis made changes -
        Fix Version/s 0.6.3 [ 12315056 ]
        Fix Version/s 0.7 [ 12314533 ]
        Component/s Core [ 12312978 ]
        Jonathan Ellis made changes -
        Attachment 1181.txt [ 12448066 ]
        Jonathan Ellis added a comment - - edited

        to enable, add to cassandra.in.sh:

        -XX:+UseThreadPriorities \
        -XX:ThreadPriorityPolicy=42 \
        -Dcassandra.compaction.priority=1 \

        (TPP=42 is a workaround to allow us to lower thread priority without root privileges, explained at http://tech.stolsvik.com/2010/01/linux-java-thread-priorities-workaround.html)

        Jonathan Ellis made changes -
        Assignee Brandon Williams [ brandon.williams ] Jonathan Ellis [ jbellis ]
        Edward Capriolo added a comment -

        I think ThreadPriority will help, however CPU priority is not directly linked to IO scheduling. I believe that with the statistics cassandra keeps we could accomplish something adaptive. Also, if we want a solution that is not 100% pure Java, we can read /proc/diskstats and use actual disk utilization to be adaptive.

        All these methods I suggested fall apart if non-compaction traffic is thrashing the disk, but if that were the case the node was not healthy in the first place.
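As a rough illustration of the /proc/diskstats idea, the Java sketch below derives disk utilization from two samples of one device line. It assumes the standard Linux layout in which the 13th whitespace-separated column is the time spent doing I/O in milliseconds. `DiskStatsSample` is a hypothetical name, and nothing here is taken from the attached CompactionManager.java.

```java
// Hypothetical illustration: estimate disk busy fraction from /proc/diskstats samples.
final class DiskStatsSample {
    final String device;
    final long ioTicksMillis; // cumulative milliseconds the device spent doing I/O

    DiskStatsSample(String device, long ioTicksMillis) {
        this.device = device;
        this.ioTicksMillis = ioTicksMillis;
    }

    // Parses one line: "major minor name" followed by at least 11 counters.
    static DiskStatsSample parse(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length < 14) {
            throw new IllegalArgumentException("unexpected diskstats line: " + line);
        }
        return new DiskStatsSample(fields[2], Long.parseLong(fields[12]));
    }

    // Fraction of the sampling interval during which the device was busy, in [0, 1].
    static double utilization(DiskStatsSample before, DiskStatsSample after, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        double busyMillis = after.ioTicksMillis - before.ioTicksMillis;
        return Math.max(0.0, Math.min(1.0, busyMillis / elapsedMillis));
    }
}
```

A compaction loop could sample the data disk's line periodically and pause while the utilization stays above some threshold. Any such threshold would need tuning per deployment.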

        Jonathan Ellis added a comment -

        using thread priority makes that unnecessary. that is why it's a better solution.

        Edward Capriolo made changes -
        Field Original Value New Value
        Attachment CompactionManager.java [ 12447848 ]
        Edward Capriolo added a comment -

        How about using some information from tpstats to make this adaptive? See the attached sample code.

        Jonathan Ellis created issue -

          People

          • Assignee:
            Brandon Williams
            Reporter:
            Jonathan Ellis
            Reviewer:
            Jonathan Ellis
          • Votes:
            0
            Watchers:
            4