CASSANDRA-12268

Make MV Index creation robust for wide referent rows


Details

    Description

      When creating an index for a materialized view over existing data, heap pressure depends heavily on the number of rows associated with each index value. Because of the way per-index-value rows are created within the index, heap pressure can grow without bound, which can cause an OOM. This appears to be a side effect of each index row being applied atomically, as with batches.

      The commit logs can accumulate enough during the process to prevent the node from being restarted. Because this occurs during global index creation, it can happen on multiple nodes at once, which makes stable recovery of a node set difficult: co-replicas become unavailable to assist in back-filling data from commit logs.

      While it is understandable that you want to avoid having relatively wide rows even in materialized views, this represents a particularly difficult scenario for triage.

      The basic recommendation for improving this is to split index creation internally into smaller chunks, which places an upper bound on heap pressure where one is needed.
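
The chunking idea can be illustrated with a minimal sketch. This is not Cassandra's build code: `chunked`, `build_view_partition`, `apply_mutation`, and the `max_rows` limit are hypothetical names chosen for illustration. The sketch only shows how capping the number of rows applied at once bounds the memory held per step.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(rows: Iterable[T], max_rows: int) -> Iterator[List[T]]:
    """Yield successive lists holding at most max_rows items from rows."""
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")
    it = iter(rows)
    while True:
        # Only one chunk is materialized in memory at a time.
        chunk = list(islice(it, max_rows))
        if not chunk:
            return
        yield chunk


def build_view_partition(
    rows: Iterable[T],
    apply_mutation: Callable[[List[T]], None],  # hypothetical write hook
    max_rows: int = 1000,                       # hypothetical chunk limit
) -> int:
    """Apply rows in bounded chunks; return the number of mutations applied."""
    applied = 0
    for chunk in chunked(rows, max_rows):
        apply_mutation(chunk)
        applied += 1
    return applied
```

Under this scheme a wide index row is written as several smaller mutations rather than one, so the whole index row is no longer applied atomically. Whether that trade-off is acceptable for the view build is a design question the ticket leaves open.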



          People

            carlyeks Carl Yeksigian
            jshook Jonathan Shook
            Carl Yeksigian
            T Jake Luciani
            Votes: 3
            Watchers: 9

            Dates

              Created:
              Updated:
              Resolved:
