Cassandra / CASSANDRA-12245

initial view build can be parallel


    Details

      Description

      On a node with a lot of data (~3 TB), building a materialized view takes several weeks, which is not ideal. The build runs in a single thread.

      There are several potential ways this can be optimized:

      • Build vnodes in parallel, instead of going through the entire range in one thread.
      • Just iterate through sstables, not worrying about duplicates, and include the timestamp of the original write in the MV mutation. Since this doesn't exclude duplicates, it increases the amount of work and could temporarily surface ghost rows (yikes), but I guess that's why they call it eventual consistency. Doing it this way avoids holding references to all tables on disk, allows parallelization, and removes the need to check other sstables for existing data. This is essentially the 'do a full repair' path.
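
      To make the first option concrete, here is a minimal sketch of splitting a token range into sub-ranges and building each one on its own thread. It is not Cassandra's actual view builder: the class name, `splitRange`, `buildInParallel`, and the `buildRange` callback are all hypothetical, and tokens are modeled as plain signed 64-bit values on the assumption of a Murmur3-style partitioner.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToLongFunction;

// Hypothetical sketch: none of these names exist in Cassandra.
final class ParallelViewBuildSketch {

    /** Splits the inclusive token range [start, end] into at most `parts` contiguous, non-empty sub-ranges. */
    static List<long[]> splitRange(long start, long end, int parts) {
        if (parts < 1 || start > end) {
            throw new IllegalArgumentException("need parts >= 1 and start <= end");
        }
        // BigInteger avoids overflow when the range spans the whole 64-bit token space.
        BigInteger lo = BigInteger.valueOf(start);
        BigInteger size = BigInteger.valueOf(end).subtract(lo).add(BigInteger.ONE);
        BigInteger n = BigInteger.valueOf(parts).min(size); // never more parts than tokens
        List<long[]> out = new ArrayList<>();
        BigInteger prev = lo;
        for (int i = 1; i <= n.intValue(); i++) {
            // Exclusive upper boundary of the i-th part: lo + size * i / n.
            BigInteger next = lo.add(size.multiply(BigInteger.valueOf(i)).divide(n));
            out.add(new long[] { prev.longValueExact(), next.subtract(BigInteger.ONE).longValueExact() });
            prev = next;
        }
        return out;
    }

    /**
     * Runs `buildRange` once per sub-range on a fixed-size pool and returns the sum of the
     * per-range results (for example, the number of view rows written for that range).
     */
    static long buildInParallel(List<long[]> ranges, int threads, ToLongFunction<long[]> buildRange) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (long[] range : ranges) {
                Callable<Long> task = () -> buildRange.applyAsLong(range);
                futures.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> future : futures) {
                total += future.get(); // blocks until that range is done
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("view build interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("view build failed for a range", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

      A caller would split the node's local range, e.g. `splitRange(Long.MIN_VALUE, Long.MAX_VALUE, 16)`, and pass a callback that reads one sub-range from the base table and writes the matching view mutations. Because the sub-ranges are disjoint, the per-range tasks need no coordination beyond the final join; tracking progress per range (so an interrupted build can resume) is left out of this sketch.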


              People

              • Assignee: adelapena Andres de la Peña
              • Reporter: tvdw Tom van der Woerdt
              • Authors: Andres de la Peña
              • Reviewers: Paulo Motta (Deprecated)
