Apache Cassandra / CASSANDRA-13299

Potential OOMs and lock contention in write path streams


Details

    Description

      I see a potential OOM when a stream (e.g. repair) goes through the write path, as is the case with MVs.

      StreamReceiveTask gets a bunch of SSTableReaders. These produce row iterators, which in turn produce mutations. So every partition creates a single mutation, which in the case of (very) big partitions can result in (very) big mutations. Those are created on heap and stay there until they have finished processing.

      I don't think it is necessary to create a single mutation for each partition. Why don't we implement a PartitionUpdateGeneratorIterator that takes an UnfilteredRowIterator and a max size and spits out PartitionUpdates to be used to create and apply mutations?
      The max size should be something like min(reasonable_absolute_max_size, max_mutation_size, commitlog_segment_size / 2). reasonable_absolute_max_size could be 16M or something similar.
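
      The max size formula above can be sketched as follows. This is only an illustration: the class and method names are hypothetical, the 16 MiB constant is the "16M or something similar" guess from above, and the two inputs are assumed to be passed in as byte counts rather than read from any real Cassandra configuration API.

```java
/** Hypothetical helper for the proposed chunk size limit; not part of Cassandra. */
final class ChunkSizeLimit {
    /** Assumed absolute cap, taken from the "16M" suggestion in the description. */
    static final long REASONABLE_ABSOLUTE_MAX_BYTES = 16L * 1024 * 1024;

    private ChunkSizeLimit() {}

    /**
     * Computes min(reasonable_absolute_max_size, max_mutation_size, commitlog_segment_size / 2).
     * Both arguments are byte counts supplied by the caller.
     */
    static long maxChunkBytes(long maxMutationSizeBytes, long commitLogSegmentSizeBytes) {
        if (maxMutationSizeBytes <= 0 || commitLogSegmentSizeBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        long halfSegment = commitLogSegmentSizeBytes / 2;
        return Math.min(REASONABLE_ABSOLUTE_MAX_BYTES,
                        Math.min(maxMutationSizeBytes, halfSegment));
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024;
        // Example inputs only: a 16 MiB mutation limit and a 32 MiB commit log segment.
        System.out.println(maxChunkBytes(16 * mib, 32 * mib));
    }
}
```

      With a small commit log segment the third term dominates, so the chunk size shrinks automatically without touching the absolute cap.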
      A mutation shouldn't be too large, as its size also affects MV partition locking. The longer an MV partition is locked during a stream, the higher the chances that WTEs (write timeout exceptions) occur during streams.
      I could also imagine that a max number of updates per mutation, regardless of size in bytes, could make sense to avoid lock contention.
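
      To make the proposal concrete, here is a minimal sketch of the chunking idea using plain Java types instead of Cassandra classes. ChunkingIterator is a hypothetical stand-in for the proposed PartitionUpdateGeneratorIterator: the generic row type R stands in for the rows of an UnfilteredRowIterator, each emitted List<R> stands in for one PartitionUpdate, and the sizeOf function is an assumed way of measuring a row in bytes. It enforces both a byte limit and a row-count limit per chunk.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Objects;
import java.util.function.ToLongFunction;

/** Hypothetical sketch: splits one stream of rows into size- and count-bounded chunks. */
final class ChunkingIterator<R> implements Iterator<List<R>> {
    private final Iterator<R> rows;
    private final ToLongFunction<R> sizeOf;
    private final long maxBytes;
    private final int maxRows;
    private R pending; // row already read from the source but not yet emitted

    ChunkingIterator(Iterator<R> rows, ToLongFunction<R> sizeOf, long maxBytes, int maxRows) {
        if (maxBytes <= 0 || maxRows <= 0) {
            throw new IllegalArgumentException("limits must be positive");
        }
        this.rows = Objects.requireNonNull(rows);
        this.sizeOf = Objects.requireNonNull(sizeOf);
        this.maxBytes = maxBytes;
        this.maxRows = maxRows;
    }

    @Override
    public boolean hasNext() {
        return pending != null || rows.hasNext();
    }

    @Override
    public List<R> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        List<R> chunk = new ArrayList<>();
        long bytes = 0;
        while (chunk.size() < maxRows) {
            if (pending == null) {
                if (!rows.hasNext()) {
                    break;
                }
                // Rows are assumed to be non-null; null is used as the "no pending row" marker.
                pending = Objects.requireNonNull(rows.next());
            }
            long size = sizeOf.applyAsLong(pending);
            // The first row of a chunk is always accepted, even if it alone exceeds
            // maxBytes, so the iterator always makes progress.
            if (!chunk.isEmpty() && bytes + size > maxBytes) {
                break;
            }
            chunk.add(pending);
            bytes += size;
            pending = null;
        }
        return chunk;
    }
}
```

      Only one chunk is materialized at a time, so heap usage is bounded by the chunk limits rather than by the partition size. In the actual write path each chunk would presumably be turned into its own mutation and applied before the next one is built, which also shortens the time a partition lock is held.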

      I'd love to get feedback and suggestions, including naming suggestions.



          People

            Zhao Yang (jasonstack)
            Benjamin Roth (brstgt)
            Zhao Yang
            Paulo Motta
            Votes: 0
            Watchers: 10
