  1. Apache Cassandra
  2. CASSANDRA-15452

Improve disk access patterns during compaction (big format)


Details

    • Performance
    • Normal
    • All
    • None

      TBD


    Description

      On read-heavy workloads, Cassandra performs much better with a low read-ahead setting. In my tests I've seen a 5x improvement in throughput and more than a 50% reduction in latency. However, I've also observed that a low read-ahead setting can hurt compaction and streaming throughput. The impact is worst in cloud environments, where every tiny request consumes an IOP, so many small reads are expensive.

      1. We should investigate using POSIX_FADV_DONTNEED on files we're compacting to see if we can improve performance and reduce page faults.
      2. This should be combined with an internal read-ahead style buffer that Cassandra manages, similar to a BufferedInputStream but with our own machinery. This buffer should read fairly large blocks of data off disk at a time. EBS, for example, allows 1 IOP to be up to 256KB. A considerable amount of time is spent in blocking I/O during compaction and streaming. Reducing how often we read from disk should speed up all sequential I/O operations.
      3. We can reduce system calls by buffering writes as well, but I think it will have less of an impact than the reads.
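
To make item 2 concrete, here is a minimal sketch of a reader that refills an internal buffer in large blocks and counts how many reads it issues against the file. It is not the patch attached to this ticket: the class name `ChunkedSequentialReader`, the `drain` helper, and the block sizes are hypothetical, chosen only for illustration. It also does not cover item 1, since the standard Java API has no direct call for POSIX_FADV_DONTNEED.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch: sequential reader that refills its buffer in large blocks. */
final class ChunkedSequentialReader implements AutoCloseable {
    private final FileChannel channel;
    private final ByteBuffer buffer;
    private long diskReads = 0; // number of channel reads that returned data

    ChunkedSequentialReader(Path path, int blockSize) throws IOException {
        this.channel = FileChannel.open(path, StandardOpenOption.READ);
        this.buffer = ByteBuffer.allocateDirect(blockSize);
        this.buffer.limit(0); // start empty so the first read() triggers a refill
    }

    /** Returns the next byte as 0..255, or -1 at end of file. */
    int read() throws IOException {
        if (!buffer.hasRemaining() && !refill()) {
            return -1;
        }
        return buffer.get() & 0xFF;
    }

    /** Fills the buffer with up to one block; returns false if no data is left. */
    private boolean refill() throws IOException {
        buffer.clear();
        while (buffer.hasRemaining()) {
            int n = channel.read(buffer);
            if (n < 0) {
                break; // end of file
            }
            if (n > 0) {
                diskReads++;
            }
        }
        buffer.flip();
        return buffer.hasRemaining();
    }

    long diskReads() {
        return diskReads;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }

    /** Reads the whole file and returns {bytesRead, diskReads}. */
    static long[] drain(Path path, int blockSize) throws IOException {
        try (ChunkedSequentialReader reader = new ChunkedSequentialReader(path, blockSize)) {
            long bytes = 0;
            while (reader.read() >= 0) {
                bytes++;
            }
            return new long[] { bytes, reader.diskReads() };
        }
    }
}
```

Calling `ChunkedSequentialReader.drain(path, 256 * 1024)` and `ChunkedSequentialReader.drain(path, 4 * 1024)` on the same file should return the same byte count, with far fewer reads for the larger block. The count reflects reads issued by the application; how they map to device IOPS still depends on the page cache and the kernel read-ahead setting.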

      Attachments

        1. everyfs.txt
          1.01 MB
          Jon Haddad
        2. image-2024-11-22-16-17-23-194.png
          105 kB
          Jon Haddad
        3. iostat-5.0-head.output
          613 kB
          Jordan West
        4. iostat-5.0-patched.output
          249 kB
          Jordan West
        5. iostat-ebs-15452.png
          110 kB
          Jordan West
        6. iostat-ebs-head.png
          144 kB
          Jordan West
        7. iostat-instance-15452.png
          98 kB
          Jordan West
        8. iostat-instance-head.png
          100 kB
          Jordan West
        9. results.txt
          16 kB
          Jon Haddad
        10. screenshot-1.png
          329 kB
          Jon Haddad
        11. screenshot-2.png
          168 kB
          Jon Haddad
        12. screenshot-3.png
          1.42 MB
          Jon Haddad
        13. screenshot-4.png
          1.44 MB
          Jon Haddad
        14. sequential.fio
          0.4 kB
          Jon Haddad
        15. throughput.png
          92 kB
          Jon Haddad
        16. throughput-1.png
          92 kB
          Jon Haddad

            People

              jwest Jordan West
              rustyrazorblade Jon Haddad
              Jordan West
              Caleb Rackliffe
              Votes: 0
              Watchers: 17

              Dates

                Created:
                Updated:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 4h 10m