CASSANDRA-4316: Compaction Throttle too bursty with large rows

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 1.2.5
    • Component/s: Core

      Description

      In org.apache.cassandra.db.compaction.CompactionIterable the check for compaction throttling occurs once every 1000 rows. In our workload this is much too large as we have many large rows (16 - 100 MB).

      With a 100 MB row, about 100 GB is read (and possibly written) before the compaction throttle sleeps. This causes bursts of essentially unthrottled compaction IO followed by a long sleep, which yields inconsistent performance and high error rates during the bursts.

      We applied a workaround to check throttle every row which solved our performance and error issues:

      line 116 in org.apache.cassandra.db.compaction.CompactionIterable:
      if ((row++ % 1000) == 0)
      replaced with
      if ((row++ % 1) == 0)

      I think the better solution is to calculate how often the throttle should be checked based on the throttle rate, so sleeps are applied more consistently. E.g. if 16 MB/sec is the limit, then check for sleep after every 16 MB read so sleeps are spaced out about every second.
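
      For illustration, a minimal sketch of that byte-based check (all names here are hypothetical; Throttle, Row and dataSize() stand in for the real classes, this is not the actual CompactionIterable code):

      // Check the throttle after a fixed number of bytes instead of a fixed
      // number of rows, so sleeps land roughly once per second.
      static void compactWithByteThrottle(Iterator<Row> rows, Throttle throttle)
      {
          final long bytesPerCheck = 16L * 1024 * 1024; // 16 MB at a 16 MB/sec limit
          long bytesSinceCheck = 0;
          while (rows.hasNext())
          {
              bytesSinceCheck += rows.next().dataSize(); // hypothetical size accessor
              if (bytesSinceCheck >= bytesPerCheck)
              {
                  throttle.throttle(bytesSinceCheck);    // sleep enough to hold the rate
                  bytesSinceCheck = 0;
              }
          }
      }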

      Attachments

      1. 4316-1.2.txt (16 kB, Yuki Morishita)
      2. 4316-1.2-v2.txt (17 kB, Yuki Morishita)
      3. 4316-v3.txt (22 kB, Jonathan Ellis)
      4. 4316-1.2.txt (24 kB, Jonathan Ellis)

        Activity

        Alex Tang added a comment -

        This bug is marked as fixed in 1.2.1; however, I don't see any fix for it applied in the current trunk (as of 2012-12-20). Also, the code that the OP put into the issue (CompactionIterable:116) doesn't seem to have been implemented either. Can someone explain how this was fixed?

        Thanks.

        Yuki Morishita added a comment -

        This issue is still open. The "Fix Version" field indicates it is targeted to be released in version 1.2.1.

        Alex Tang added a comment -

        <facepalm>. Sorry about that. I misread it. Thanks.

        BTW, do you have an idea of what the method of fix will be? Is it going to be the OP's suggestion to change how often the throttle check happens, or a more complex calculation to make sleeps more consistent?

        Yuki Morishita added a comment -

        I'm thinking of doing the throttling in another place (possibly LazilyCompactedRow), especially for wide rows.

        Yuki Morishita added a comment -

        For more accurate throttling of both skinny and wide rows, I replaced Cassandra's Throttle with Guava's RateLimiter.
        The RateLimiter is held by CompactionManager so it can be used globally. It is configured with permits in bytes per second, and when SSTableScanner reads the data of one row, it acquires permits from the RateLimiter based on the row size.

        This way we don't have to wait for 1000 rows to be read before throttling compaction reads.
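
        For illustration, a minimal sketch of that pattern (RateLimiter.create and acquire are Guava's real API; the surrounding names are made up and this is not the actual patch):

        import com.google.common.util.concurrent.RateLimiter;

        // One limiter shared via CompactionManager, sized in bytes per second.
        RateLimiter compactionLimiter = RateLimiter.create(16.0 * 1024 * 1024); // 16 MB/s

        // In the scanner, before returning a row of rowSizeBytes:
        if (rowSizeBytes > 0)
            compactionLimiter.acquire((int) rowSizeBytes); // blocks until permits accrue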

        Jonathan Ellis added a comment -

        Looks like mayThrottle (both implementations) is missing conversion from bytes -> MB. (Actually looking at the RateLimiter creation, it looks like the name dataSizeInMB is misleading since it is actually still bytes.)

        Does StandaloneScrubber ratelimit? It probably shouldn't.

        Scanner in cleanup compaction needs a withRateLimit.

        Is looping over scanner.getCurrentPosition for each row compacted going to eat CPU? Maybe every N rows would be better, with N = 1MB / average row size. Quite possibly it's not actually a problem and I'm prematurely complexifying.

        Nits:

        • "// throttle if needed" comment is redundant
        • "maybeThrottle" is a better method name than "mayThrottle"
        • May be able to simplify getCompactionRateLimiter by creating a default limiter even if we are unthrottled (since if we are unthrottled we ignore it anyway), so you don't need to worry about == null checks

        Rest LGTM. Should we open a new ticket to move FST to RateLimiter and get rid of Throttle entirely?

        Yuki Morishita added a comment -

        v2 attached with fixes above.
        This version also uses a RateLimiter with Double.MAX_VALUE permits as the default, an effectively unlimited limiter (same as hinted handoff throttling).
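
        For illustration, that default could look like this (throttleMBPerSec is a made-up stand-in for the configured limit, not the actual patch code):

        // 0 means unthrottled: with Double.MAX_VALUE permits/sec, acquire() never blocks.
        RateLimiter limiter = throttleMBPerSec == 0
                            ? RateLimiter.create(Double.MAX_VALUE)
                            : RateLimiter.create(throttleMBPerSec * 1024.0 * 1024.0);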

        Is looping over scanner.getCurrentPosition for each row compacted going to eat CPU?

        I don't think it is going to be a problem.

        Should we open a new ticket to move FST to RateLimiter and get rid of Throttle entirely?

        Yes. We can do the same to streaming.

        Yuki Morishita added a comment -

        Slightly updated around RateLimiter creation.

        Jonathan Ellis added a comment -

        Hmm... taking a step back for a minute. Since the minimum unit of throttling is still a row, do we win anything with the extra complexity of pushing it down into SSTableScanner? Why not just leave it in CompactionIterable loop, w/ adaptive checking based on row size as above?

        Yuki Morishita added a comment -

        If we are compacting 32 SSTables at once, we are currently throttling once every 32 rows read. If we are using LazilyCompactedRow for wide rows, the row is read twice to write out the compacted row (for now). And I don't know if that "N = 1MB / average row size" above fits well for various row sizes; "1MB" sounds like the same as 1000 in the current "% 1000" approach.
        So by instead pushing throttling down to the SSTableScanner level, we can throttle more accurately.
        Maybe we can go further and use RateLimiter in RandomAccessReader for more accuracy.

        Jonathan Ellis added a comment - edited

        Not thrilled about pushing this down into RAR. Feels like we're violating encapsulation too much.

        What if we did this?

        1. Make compaction truly single-pass (CASSANDRA-4180)
        2. Introduce a SequentialReader to replace RAR
        3. Subclass SR to CompactionReader and add throttling there

        (I suppose we could subclass RAR already for CompactionReader, but since I want to do #1 and #2 anyway, it would be less work to do it in this order.)

        Jonathan Ellis added a comment -

        I was hoping we could get rid of RandomAccessReader, but that's not realistic; we need to be able to re-use readers for multiple operations for performance (CASSANDRA-4942), which implies being able to seek, which basically leaves us with RAR.

        Also, if we throttle at the buffering level, throttling RAR is not really any more complex than a SequentialReader would be.
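
        As a rough illustration of throttling at the buffering level (a generic stream wrapper standing in for RAR's buffering; this is not the committed patch):

        import com.google.common.util.concurrent.RateLimiter;
        import java.io.FilterInputStream;
        import java.io.IOException;
        import java.io.InputStream;

        // Acquires permits for each chunk actually read, so one shared bytes/sec
        // limiter throttles the I/O regardless of row sizes.
        class ThrottledInputStream extends FilterInputStream
        {
            private final RateLimiter limiter;

            ThrottledInputStream(InputStream in, RateLimiter limiter)
            {
                super(in);
                this.limiter = limiter;
            }

            @Override
            public int read(byte[] b, int off, int len) throws IOException
            {
                int n = super.read(b, off, len);
                if (n > 0)
                    limiter.acquire(n); // block in proportion to bytes read
                return n;
            }
        }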

        Jonathan Ellis added a comment -

        v3 attached with this approach. Builds on v2's replacing the homegrown throttling with RateLimiter.

        v3 is against trunk (on top of the not-yet-committed branch for 4180, actually) but I think it could be applied to 1.2 without much disruption.

        Jonathan Ellis added a comment - edited

        we [should] open a new ticket to move FST to RateLimiter and get rid of Throttle entirely

        Done: CASSANDRA-5522.

        Jonathan Ellis added a comment -

        Patch against 1.2 added.

        Yuki Morishita added a comment -

        +1 on the 1.2 patch. You need license headers for the new classes, though.

        Jonathan Ellis added a comment -

        Fixed and committed.


          People

          • Assignee: Jonathan Ellis
          • Reporter: Wayne Lewis
          • Reviewer: Yuki Morishita
          • Tester: Ryan McGuire
          • Votes: 1
          • Watchers: 6
