CASSANDRA-11206

Support large partitions on the 3.0 sstable format


Details

    • Docs

    Description

      Cassandra saves a sample of IndexInfo objects that store the offset within each partition of every 64KB (by default) range of rows. To find a row, we binary search this sample, then scan the appropriate range of the partition.
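      The lookup described above can be sketched roughly as follows. This is an illustrative model only, not Cassandra's actual code: the `IndexInfo` fields, the integer clustering keys, and the `find_block` helper are all assumptions made for the example.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class IndexInfo:
    """Hypothetical stand-in: one sampled entry per ~64KB block of rows."""
    first_key: int  # smallest clustering key in the block
    last_key: int   # largest clustering key in the block
    offset: int     # byte offset of the block within the partition
    width: int      # byte length of the block


def find_block(index: List[IndexInfo], key: int) -> Optional[IndexInfo]:
    """Binary search the sample for the block whose key range may hold `key`."""
    # Note: this touches every entry, mirroring the "deserialize the whole
    # sample first" behaviour; only the search itself is logarithmic.
    last_keys = [info.last_key for info in index]
    i = bisect_left(last_keys, key)  # first block with last_key >= key
    if i == len(index) or index[i].first_key > key:
        return None  # key is past the last block or in a gap between blocks
    return index[i]


# Three blocks of a partition, each covering 100 clustering keys.
BLOCK = 64 * 1024
index = [IndexInfo(n * 100, n * 100 + 99, n * BLOCK, BLOCK) for n in range(3)]
block = find_block(index, 142)  # the second block; only it needs scanning
```

      Once the block is found, the reader would scan `width` bytes starting at `offset` rather than the whole partition.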

      The problem is that this scales poorly as partitions grow: on a cache miss, we deserialize the entire set of IndexInfo, which both creates a lot of GC overhead (as noted in CASSANDRA-9754) and incurs non-negligible I/O (relative to reading a single 64KB row range) as partitions get truly large.

      We introduced an "offset map" in CASSANDRA-10314 that allows us to perform the IndexInfo binary search while deserializing only the IndexInfo entries that we need to compare against, i.e. O(log N) deserializations.
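      A minimal sketch of the idea, assuming a made-up serialized layout: entries are packed into one byte buffer, and a separate list of start positions plays the role of the offset map. The `ENTRY` layout, `serialize`, and `lazy_find` are hypothetical names for illustration and do not reflect the real on-disk format. Entries are fixed-width here for brevity; with variable-length entries the offset list is what makes random access possible.

```python
import struct
from typing import List, Optional, Tuple

# Hypothetical layout: first_key, last_key, offset, width as big-endian int64.
ENTRY = struct.Struct(">qqqq")


def serialize(entries: List[Tuple[int, int, int, int]]) -> Tuple[bytes, List[int]]:
    """Pack entries back to back and record where each one starts."""
    blob = bytearray()
    offsets = []
    for entry in entries:
        offsets.append(len(blob))
        blob += ENTRY.pack(*entry)
    return bytes(blob), offsets


def lazy_find(blob: bytes, offsets: List[int], key: int) -> Tuple[Optional[Tuple[int, int]], int]:
    """Binary search that unpacks only the entries it compares against.

    Returns ((offset, width) or None, number of entries deserialized).
    """
    reads = 0
    lo, hi = 0, len(offsets) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        first, last, offset, width = ENTRY.unpack_from(blob, offsets[mid])
        reads += 1
        if key < first:
            hi = mid - 1
        elif key > last:
            lo = mid + 1
        else:
            return (offset, width), reads
    return None, reads


# 1024 sampled blocks, each covering 100 clustering keys and 64KB of data.
entries = [(n * 100, n * 100 + 99, n * 65536, 65536) for n in range(1024)]
blob, offsets = serialize(entries)
hit, reads = lazy_find(blob, offsets, 51_234)
```

      With 1024 entries the search unpacks at most 11 of them, instead of all 1024 as in the eager approach.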

      Attachments

        1. 11206-gc.png
          426 kB
          Robert Stupp
        2. trunk-gc.png
          482 kB
          Robert Stupp


          People

            snazy Robert Stupp
            jbellis Jonathan Ellis
            Robert Stupp
            T Jake Luciani
            Votes: 1
            Watchers: 30
