Cassandra / CASSANDRA-11623

Compactions w/ Short Rows Spending Time in getOnDiskFilePointer

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Low
    • Resolution: Fixed
    • Fix Version/s: 3.8

    Description

      I've been doing some performance tuning and profiling of my Cassandra cluster and noticed that compactions were running particularly slowly for tables that I know to have very short rows. Profiling shows a large share of the time being spent in BigTableWriter.getOnDiskFilePointer(), and attaching strace to a CompactionTask shows that the majority of time is spent in lseek (called by getOnDiskFilePointer) rather than in read or write.

      Going deeper, it looks like we call getOnDiskFilePointer for each row (sometimes multiple times per row) to see whether we've reached our expected sstable size and should start a new writer. With very short rows, that means a seek syscall for every few bytes written, which is unnecessary.
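
      To illustrate the idea, here is a minimal sketch of one way the per-row lookup could be avoided: cache the position and refresh it only every N rows. This is not the Cassandra code or the patch that resolved this ticket. ThrottledSizeCheck, CountingPositionSource, and checkInterval are hypothetical names, and the LongSupplier stands in for an expensive call such as getOnDiskFilePointer().

```java
import java.util.function.LongSupplier;

/** Hypothetical stand-in for an expensive position lookup (e.g. one backed by lseek). */
final class CountingPositionSource implements LongSupplier {
    long position = 0; // simulated on-disk position in bytes
    int calls = 0;     // number of times the expensive lookup was performed

    @Override
    public long getAsLong() {
        calls++;
        return position;
    }
}

/** Decides when to start a new writer, refreshing the cached position only every checkInterval rows. */
final class ThrottledSizeCheck {
    private final LongSupplier onDiskPosition; // the expensive lookup (assumption: any long-returning call)
    private final long maxBytes;               // target sstable size
    private final int checkInterval;           // rows between real position lookups
    private int rowsSinceCheck = 0;
    private long cachedPosition = 0;

    ThrottledSizeCheck(LongSupplier onDiskPosition, long maxBytes, int checkInterval) {
        if (checkInterval < 1) {
            throw new IllegalArgumentException("checkInterval must be >= 1");
        }
        this.onDiskPosition = onDiskPosition;
        this.maxBytes = maxBytes;
        this.checkInterval = checkInterval;
    }

    /** Call once per appended row; returns true when the size limit has been reached. */
    boolean shouldSwitchWriter() {
        if (++rowsSinceCheck >= checkInterval) {
            rowsSinceCheck = 0;
            cachedPosition = onDiskPosition.getAsLong(); // the only place the expensive call happens
        }
        return cachedPosition >= maxBytes;
    }
}
```

      The trade-off in this sketch is that a writer can overshoot the target size by up to checkInterval - 1 rows, which is small when rows are short. Setting checkInterval to 1 reproduces the per-row behavior described above.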

      Attachments

        1. compactiontask_profile.png
          339 kB
          Tom Petracca

            People

              Assignee: Tom Petracca (tpetracca)
              Reporter: Tom Petracca (tpetracca)
              Authors: Tom Petracca
              Reviewers: Marcus Eriksson
              Votes: 0
              Watchers: 7
