  1. Apache Cassandra
  2. CASSANDRA-12464

Investigate the potential improvement of parallelism on higher level compactions in LCS


Details

    • Low Hanging Fruit

    Description

      According to LevelDB's design doc, "A compaction merges the contents of the picked files to produce a sequence of level-(L+1) files". It will "switch to producing a new level-(L+1) file after the current output file has reached the target file size" (160MB in our case), and it will also "switch to a new output file when the key range of the current output file has grown enough to overlap more than ten level-(L+2) files". This last rule ensures "that a later compaction of a level-(L+1) file will not pick up too much data from level-(L+2)."

      Our current code in LeveledCompactionStrategy doesn't implement this last rule, but we might be able to implement it quickly and see how much of a compaction throughput improvement it delivers. For example, we could create a scenario where a number of large L0 SSTables are present (e.g. 200GB after switching from STCS), let compaction overflow L1 with thousands of SSTables, and measure how fast LCS can digest that much data from L1 and promote it to higher levels until compaction completes.
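
      To make the two output-switching rules quoted above concrete, here is a rough, self-contained Java sketch. It is not code from LeveledCompactionStrategy or from LevelDB. The class OutputSwitchSketch, the Range type, and the parameters bytesPerKey, targetBytes and maxOverlaps are all hypothetical names chosen for illustration, and keys are simplified to long values.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not Cassandra's or LevelDB's actual code. */
class OutputSwitchSketch {
    /** Closed key interval [first, last] covered by one level-(L+2) file. */
    static final class Range {
        final long first, last;
        Range(long first, long last) { this.first = first; this.last = last; }
        boolean overlaps(long lo, long hi) { return first <= hi && lo <= last; }
    }

    /** Counts the level-(L+2) files whose key range intersects [lo, hi]. */
    static int countOverlaps(List<Range> grandparents, long lo, long hi) {
        int n = 0;
        for (Range r : grandparents) {
            if (r.overlaps(lo, hi)) n++;
        }
        return n;
    }

    /**
     * Walks the merged, sorted keys of a compaction and decides where each
     * level-(L+1) output file begins. A new output is started when either
     * (1) the current output has reached targetBytes, or
     * (2) appending the next key would make the current output overlap
     *     more than maxOverlaps level-(L+2) files.
     * Returns the first key of every output file.
     */
    static List<Long> outputStarts(long[] sortedKeys, long bytesPerKey, long targetBytes,
                                   List<Range> grandparents, int maxOverlaps) {
        List<Long> starts = new ArrayList<>();
        long currentStart = 0;
        long currentBytes = 0;
        boolean open = false;
        for (long key : sortedKeys) {
            if (open) {
                boolean full = currentBytes >= targetBytes;
                boolean tooWide =
                    countOverlaps(grandparents, currentStart, key) > maxOverlaps;
                if (full || tooWide) open = false; // close the current output
            }
            if (!open) {
                starts.add(key);       // this key begins a new output file
                currentStart = key;
                currentBytes = 0;
                open = true;
            }
            currentBytes += bytesPerKey; // assumes a fixed size per key
        }
        return starts;
    }
}
```

      In this sketch the second rule bounds how many level-(L+2) files any single output can drag into a later compaction, which is the property the ticket proposes to investigate. A real implementation would work on tokens and actual SSTable sizes rather than fixed-size long keys.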


          People

            psivaraju Pramod K Sivaraju
            weideng Wei Deng
