Cassandra / CASSANDRA-3741

OOMs because delete operations are not accounted for

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 1.1.1
    • Component/s: None
    • Labels: None
    • Environment: FreeBSD

    Description

      We are currently migrating to a new data format: the new format is written into new CFs and the old data is deleted key by key.
      I have started getting OOMs and found out that delete operations are not accounted for, so column families that receive only deletes are never flushed by the storage manager (changed == 0 with delete-only operations).

      This is the pull request that fixed the problem for me: https://github.com/apache/cassandra/pull/5
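
      To illustrate the failure mode, here is a simplified sketch with hypothetical names and thresholds (not the actual Cassandra code): the flush decision is driven by a "throughput" counter that whole-row deletions never increment.

      import java.util.concurrent.atomic.AtomicLong;

      class MemtableAccountingSketch {
          static final long FLUSH_THRESHOLD_BYTES = 64L * 1024 * 1024; // hypothetical flush threshold
          final AtomicLong currentThroughput = new AtomicLong();       // serialized bytes recorded so far

          // Column inserts (and per-column deletes) report their serialized size.
          void applyColumns(int serializedColumnBytes) {
              currentThroughput.addAndGet(serializedColumnBytes);
          }

          // A whole-row deletion carries no columns, so nothing is added here,
          // even though the tombstone still occupies heap in the memtable.
          void applyRowDeletion() {
              // intentionally adds nothing -- this is the problem being reported
          }

          // With a delete-only workload currentThroughput stays at 0, the threshold
          // is never crossed, the memtable is never flushed, and the heap fills up.
          boolean shouldFlush() {
              return currentThroughput.get() >= FLUSH_THRESHOLD_BYTES;
          }
      }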

    Activity

        jbellis Jonathan Ellis added a comment -

        Thanks, Vitalii!

        Unfortunately we can't use that patch as is because adding ops * 20 in there is going to throw off the size calculation for other workloads.

        Note that the "throughput" size for a deletion is NOT zero (see Column.size implementation). It sounds like you abruptly changed your workload from doing a bunch of larger inserts, then hit it with a ton of deletes all at once and OOMed before it was able to update its liveRatio estimate.

        So the real problem is that if you change workloads dramatically enough, Cassandra's estimates can be off.
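
        A minimal sketch of the liveRatio idea described above, with made-up names and numbers (not the actual Memtable implementation):

        class LiveRatioSketch {
            long serializedBytes;    // cheap to track as mutations are applied
            double liveRatio = 10.0; // measured only occasionally, so it lags behind the workload

            // Estimated heap usage = serialized throughput * liveRatio. If the
            // workload shifts abruptly (large inserts -> tiny deletes) before
            // liveRatio is re-measured, this estimate can be badly off.
            long estimatedHeapUsage() {
                return (long) (serializedBytes * liveRatio);
            }
        }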

        hsn Radim Kolar added a comment -

        How can you OOM if you replace large inserts with small deletes?

        tivv Vitalii Tymchyshyn added a comment -

        The throughput size for a deletion IS 0 in my case. I am not deleting individual columns but all columns in a key, and such an operation carries 0 columns, which means 0 throughput. That is why I introduced a per-operation overhead for in-memory operation storage that looked reasonable to me (20 bytes is roughly the size of a ConcurrentMap entry).

        Note: I was getting the OOM during startup/log replay.

        The problem is that I have started a delete-only workload on some column families, and this won't change, because all inserts go into other column families: we are migrating. And for full-key, delete-only workloads the throughput is 0.
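
        A sketch of the per-operation overhead approach from the original pull request, as described above; the field names and the 20-byte figure are illustrative only:

        class PerOperationOverheadSketch {
            static final int BYTES_PER_OP = 20; // ~ size of a ConcurrentSkipListMap entry, per the comment
            long currentThroughput;  // serialized bytes; still 0 for full-key deletes
            long currentOperations;  // every operation counts, including deletes

            void record(int serializedBytes) {
                currentThroughput += serializedBytes;
                currentOperations++;
            }

            // Charging a flat amount per operation makes delete-only workloads
            // accumulate "size" and eventually trigger a flush. The reviewer's
            // objection: it also inflates the estimate for every other workload.
            long accountedBytes() {
                return currentThroughput + currentOperations * BYTES_PER_OP;
            }
        }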

        akolyadenko Andriy Kolyadenko added a comment -

        Another attempt to fix this: https://github.com/apache/cassandra/pull/10

        Please consider my patch. It's very annoying to have Cassandra dying in such situations.

        jbellis Jonathan Ellis added a comment -

        Thanks, Andriy. You (and Vitalii) are right; whole-row deletions did indeed have zero throughput because of that behavior.

        Committed with a change to 12 bytes (long + int from the deletion info) to match what we do with Column sizes. (We measure the serialized size in Memtable.currentThroughput, then multiply by liveRatio to get a better estimate of the in-memory size; mixing in internal overhead such as the CSLM entry size would actually double-count that.) I've also opened CASSANDRA-4215 to clean this up further for 1.2.
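
        A hedged sketch of the committed approach as described in this comment (assumed names, not the actual patch): a row-level deletion contributes the serialized size of its deletion info, which is then scaled by liveRatio just like column data.

        class DeletionInfoAccountingSketch {
            static final int DELETION_INFO_SERIALIZED_SIZE = Long.BYTES + Integer.BYTES; // long + int = 12
            long currentThroughput;
            double liveRatio = 10.0; // illustrative value

            void applyRowDeletion() {
                // Counting serialized size (rather than raw CSLM entry overhead) keeps
                // the measurement consistent with Column sizes and avoids double
                // counting once liveRatio is applied.
                currentThroughput += DELETION_INFO_SERIALIZED_SIZE;
            }

            long estimatedHeapUsage() {
                return (long) (currentThroughput * liveRatio);
            }
        }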

        jbellis Jonathan Ellis added a comment -

        (That said, truncate is still the right solution for mass deletes.)

        jjordan Jeremiah Jordan added a comment -

        Procedural question: for git stuff, should we still be attaching patches to JIRA, or is a link to the GitHub diff enough?

        jbellis Jonathan Ellis added a comment -

        A public statement of intent to contribute is enough. GitHub pull requests (and patches sent to the mailing list, for that matter) are fine.

        For record-keeping purposes, though, we like to associate these with Jira tickets.

        hsn Radim Kolar added a comment -

        Was this backported to Cassandra 1.0?

        jbellis Jonathan Ellis added a comment -

        No, this is a sensitive area of the code and we don't want to risk destabilizing it.

        hsn Radim Kolar added a comment -

        1.0 is already destabilized by this bug. If you want to keep the impact minimal, add just 1 byte to the live-bytes count for every delete; in a workload that is not delete-only, that will have a negligible effect.

        It's important to have a non-zero count, otherwise the memtable will not be flushed to disk under memory pressure.

        hsn Radim Kolar added a comment -

        If you do not want to fix it, then it should be documented as a known bug in NEWS.TXT.

        akolyadenko Andriy Kolyadenko added a comment -

        I would just like to report that I see the same behavior with 1.2.4.

        hsn Radim Kolar added a comment -

        I retested it. The bug from 1.0 does not exist in 1.1 and 1.2, but the accounting is still not optimal and can lead to OOM because it does not add enough bytes per tombstone to the live data count.

        If I remember correctly, a hardcoded constant was used; it needs to be raised.


    People

    • Assignee: akolyadenko Andriy Kolyadenko
    • Reporter: tivv Vitalii Tymchyshyn
    • Reviewer: Jonathan Ellis
    • Votes: 0
    • Watchers: 4