Cassandra / CASSANDRA-12245

initial view build can be parallel


Details

    Description

      On a node with lots of data (~3TB), building a materialized view takes several weeks, which is not ideal. The build runs in a single thread.

      There are several potential ways to optimize this:

      • Do vnodes in parallel, instead of going through the entire range in one thread.
      • Just iterate through the sstables without worrying about duplicates, and include the timestamp of the original write in the MV mutation. Since this doesn't exclude duplicates, it increases the amount of work and could temporarily surface ghost rows (yikes), but I guess that's why they call it eventual consistency. Doing it this way avoids holding references to all tables on disk, allows parallelization, and removes the need to check other sstables for existing data. This is essentially the 'do a full repair' path.
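
      The first option could be sketched roughly as below. This is only an illustration under stated assumptions, not Cassandra's actual view builder: the class ParallelViewBuildSketch, the TokenRange type, and the buildRange callback are all hypothetical names, and the token space is assumed to be a signed 64-bit range as with the Murmur3 partitioner. The idea is to cut the range a node owns into contiguous sub-ranges and hand each one to a worker thread.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToLongFunction;

// Hypothetical sketch: none of these names are Cassandra's real classes.
final class ParallelViewBuildSketch {

    /** A token range (left, right]: left exclusive, right inclusive. */
    static final class TokenRange {
        final long left;
        final long right;

        TokenRange(long left, long right) {
            this.left = left;
            this.right = right;
        }
    }

    /** Splits (left, right] into `parts` contiguous sub-ranges of near-equal width. */
    static List<TokenRange> split(long left, long right, int parts) {
        BigInteger lo = BigInteger.valueOf(left);
        // BigInteger avoids overflow when the range spans the whole long token space.
        BigInteger width = BigInteger.valueOf(right).subtract(lo);
        List<TokenRange> ranges = new ArrayList<>(parts);
        long start = left;
        for (int i = 1; i <= parts; i++) {
            long end = (i == parts)
                    ? right
                    : lo.add(width.multiply(BigInteger.valueOf(i))
                               .divide(BigInteger.valueOf(parts))).longValueExact();
            ranges.add(new TokenRange(start, end));
            start = end; // the next sub-range begins where this one ends
        }
        return ranges;
    }

    /**
     * Runs buildRange once per sub-range on a fixed pool of `threads` workers
     * and returns the sum of the per-range results (e.g. rows processed).
     */
    static long buildInParallel(List<TokenRange> ranges, int threads,
                                ToLongFunction<TokenRange> buildRange) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (TokenRange r : ranges) {
                Callable<Long> task = () -> buildRange.applyAsLong(r);
                futures.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get(); // blocks until that sub-range is finished
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

      A caller would pass the node's local range and a callback that reads the base table for one sub-range and writes the corresponding view mutations, for example ParallelViewBuildSketch.buildInParallel(ParallelViewBuildSketch.split(Long.MIN_VALUE, Long.MAX_VALUE, 8), 4, callback). Because the sub-ranges do not overlap, the workers never process the same partition twice, which is what separates this option from the second one.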


          People

            adelapena Andres de la Peña
            tvdw Tom van der Woerdt
            Andres de la Peña
            Paulo Motta (Deprecated)
            Votes: 0
            Watchers: 11
