  1. Cassandra
  2. CASSANDRA-19477

Do not go to disk to get HintsStore.getTotalFileSize


Details

    Description

      When testing a cluster with more requests than it could handle, I noticed significant CPU time (25%) spent in HintsStore.getTotalFileSize.  Here's what I'm seeing from profiling:

      10% of CPU time spent in HintsDescriptor.fileName which only does this:

       

      return String.format("%s-%s-%s.hints", hostId, timestamp, version);

      At a bare minimum, we should build the host and version parts of this string up front, which eliminates 2 of the 3 substitutions. I think it's probably faster still to use a StringBuilder and avoid the underlying format-string parsing (a regular expression) altogether.
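      A minimal sketch of the StringBuilder approach, for illustration only. The class name HintsFileNameSketch and its method names are hypothetical and are not the actual HintsDescriptor code; the sketch only assumes the three fields used in the format call above (hostId, timestamp, version).

```java
import java.util.UUID;

// Hypothetical stand-in for the file-name logic; not the real HintsDescriptor.
final class HintsFileNameSketch {
    // Current approach: the format string is parsed on every call.
    static String fileNameWithFormat(UUID hostId, long timestamp, int version) {
        return String.format("%s-%s-%s.hints", hostId, timestamp, version);
    }

    // Proposed approach: plain appends, no format-string parsing.
    static String fileNameWithBuilder(UUID hostId, long timestamp, int version) {
        return new StringBuilder(64)
                .append(hostId)
                .append('-')
                .append(timestamp)
                .append('-')
                .append(version)
                .append(".hints")
                .toString();
    }
}
```

      Both methods should return the same string for the same inputs. Because the descriptor's fields do not change, the result could also be computed once and cached.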

      12% of the time is spent in org.apache.cassandra.io.util.File.length.  It looks like this is called once for each hint file on disk for each host we're hinting to.  In the case of an overloaded cluster, this is significant.  It would be better if we were to track the file size in memory for each hint file and reference that rather than go to the filesystem.
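      One way the in-memory tracking could look, as a rough sketch. HintFileSizeTracker and its methods are hypothetical names invented for this illustration; how this would hook into the real HintsStore write and delete paths is an assumption and is not shown.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: keeps per-file sizes and a running total in memory
// so that reading the total never touches the filesystem.
final class HintFileSizeTracker {
    private final ConcurrentMap<String, Long> sizeByFile = new ConcurrentHashMap<>();
    private final AtomicLong total = new AtomicLong();

    // Call after appending bytes to a hint file.
    void recordBytesWritten(String fileName, long bytes) {
        sizeByFile.merge(fileName, bytes, Long::sum);
        total.addAndGet(bytes);
    }

    // Call after a hint file has been deleted (for example once dispatched).
    void fileDeleted(String fileName) {
        Long removed = sizeByFile.remove(fileName);
        if (removed != null) {
            total.addAndGet(-removed);
        }
    }

    // Constant-time read; may lag briefly behind a concurrent update.
    long totalFileSize() {
        return total.get();
    }
}
```

      The map and the total are updated in two separate steps, so a concurrent reader can briefly see a slightly stale total. That seems acceptable for a size check, but it is a trade-off of this sketch.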

      These fairly small changes should make Cassandra more reliable during load spikes.

      CPU Flame graph attached.

      I only tested this in 4.1 but it looks like this is present up to trunk.

       

      Attachments

        1. flamegraph_20240711.html
          4.08 MB
          Gil Ganz
        2. image-2024-03-24-18-20-07-734.png
          192 kB
          Jon Haddad
        3. image-2024-03-24-18-17-48-334.png
          122 kB
          Jon Haddad
        4. image-2024-03-24-18-16-50-370.png
          115 kB
          Jon Haddad
        5. image-2024-03-24-18-08-36-918.png
          149 kB
          Jon Haddad
        6. image-2024-03-24-17-57-32-560.png
          193 kB
          Jon Haddad
        7. flame-cassandra0-patched-2024-03-25_00-40-47.html
          130 kB
          Jon Haddad
        8. flame-cassandra0-release-2024-03-25_00-16-44.html
          136 kB
          Jon Haddad
        9. flamegraph.cpu.html
          409 kB
          Jon Haddad

            People

              smiklosovic Stefan Miklosovic
              rustyrazorblade Jon Haddad
              Stefan Miklosovic
              Aleksey Yeschenko
              Votes: 0
              Watchers: 5


                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 4h 40m