Cassandra / CASSANDRA-5932

Speculative read performance data show unexpected results


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Normal
    • Resolution: Fixed
    • Fix Version/s: 2.0.2
    • Component/s: None
    • Labels: None
    • Severity: Normal

      Description

      I've done a series of stress tests with eager retries enabled that show undesirable behavior. I'm grouping these behaviors into one ticket because they are most likely related.

      1) Killing off a node in a 4-node cluster actually increases performance.
      2) Compactions make nodes slow, even after the compaction is done.
      3) Eager reads tend to lessen the immediate performance impact of a node going down, but not consistently.
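For readers unfamiliar with the feature: with eager (speculative) retry, the coordinator asks one replica for the data and, if no response arrives within a threshold derived from recent read latencies (for example the 95th percentile, as in the 95percentile trial below), sends the same read to another replica and uses whichever answer arrives first. The sketch below models that decision in a simplified single-replica setting. It is not Cassandra's implementation; the function names, the nearest-rank percentile, and the latency numbers are hypothetical.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of latency samples, with pct in [0, 100]."""
    if not samples:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100.0)
    return ordered[max(rank, 1) - 1]


def read_latency_with_speculation(replica_latencies_ms: Sequence[float],
                                  recent_latencies_ms: Sequence[float],
                                  pct: float = 95.0) -> float:
    """Time until the first response arrives for one read (hypothetical model).

    replica_latencies_ms[0] is the replica asked first;
    replica_latencies_ms[1], if present, is the speculative target.
    A dead replica can be modelled as float("inf").
    """
    threshold = percentile(recent_latencies_ms, pct)
    first = replica_latencies_ms[0]
    if first <= threshold or len(replica_latencies_ms) < 2:
        return first  # answered in time, or no replica left to retry against
    # Eager retry: the second request is only sent once `threshold` has elapsed.
    return min(first, threshold + replica_latencies_ms[1])


history = [float(ms) for ms in range(1, 101)]  # 1..100 ms, so p95 is 95 ms
print(read_latency_with_speculation([5.0, 3.0], history))           # prints 5.0
print(read_latency_with_speculation([float("inf"), 3.0], history))  # prints 98.0
```

In this model a speculative read never completes earlier than the threshold, so anything that raises recent latencies also delays the retry.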

      My Environment:
      1 stress machine: node0
      4 C* nodes: node4, node5, node6, node7

      My script:
      node0 writes some data: stress -d node4 -F 30000000 -n 30000000 -i 5 -l 2 -K 20
      node0 reads some data: stress -d node4 -n 30000000 -o read -i 5 -K 20

      Examples:

      A node going down increases performance:

      Data for this test here

      At 450s, I kill one of the nodes with kill -9. There is a brief drop in performance while the snitch adapts, but then performance recovers, reaching an even higher level than before.
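The brief dip described above can be pictured with a latency-scored replica choice that needs some slow (timed-out) responses before it stops favoring the dead node. The toy model below keeps an exponentially weighted moving average of latency per replica and sends reads to the lowest score. It is a hypothetical simplification and does not reproduce the scoring of Cassandra's dynamic snitch; the class name, `alpha`, and the 10 s timeout value are assumptions.

```python
from typing import Dict, List


class ToyLatencyScorer:
    """Hypothetical latency-based replica ranking (not Cassandra's snitch).

    Each replica's score is an exponentially weighted moving average of its
    observed latencies; reads go to the replica with the lowest score.
    """

    def __init__(self, replicas: List[str], alpha: float = 0.5) -> None:
        self.alpha = alpha  # weight given to the newest observation
        self.scores: Dict[str, float] = {r: 0.0 for r in replicas}

    def record(self, replica: str, latency_ms: float) -> None:
        old = self.scores[replica]
        self.scores[replica] = self.alpha * latency_ms + (1 - self.alpha) * old

    def best(self) -> str:
        # Break ties by name so the choice is deterministic.
        return min(self.scores, key=lambda r: (self.scores[r], r))


scorer = ToyLatencyScorer(["node4", "node5"])
scorer.record("node4", 2.0)
scorer.record("node5", 3.0)
print(scorer.best())             # prints node4
scorer.record("node4", 10000.0)  # node4 killed: read times out (hypothetical 10 s)
print(scorer.best())             # prints node5
```

Until the first timeouts are recorded, the model keeps sending reads to the dead node; it does not by itself explain why throughput ends up higher than before.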

      Compactions make nodes permanently slow:


      The green and orange lines represent trials with eager retry enabled; unlike the red and blue lines, they never recover the op-rate they had before the compaction.

      Data for this test here
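One way to make "never recover their op-rate" measurable is to divide the mean op-rate in a window after the compaction by the mean op-rate in a window of the same length before it. The helper below is a hypothetical sketch: it assumes the per-interval op rates have already been extracted from the stress output as (elapsed seconds, ops/s) pairs, and the window boundaries and sample numbers are made up for illustration.

```python
from statistics import mean
from typing import Sequence, Tuple


def recovery_ratio(samples: Sequence[Tuple[float, float]],
                   event_start_s: float, event_end_s: float,
                   window_s: float) -> float:
    """Mean op-rate after an event divided by the mean op-rate before it.

    samples are (elapsed_seconds, op_rate) pairs. 1.0 means the op-rate fully
    recovered; values below 1.0 mean it stayed lower than before the event.
    """
    before = [rate for t, rate in samples
              if event_start_s - window_s <= t < event_start_s]
    after = [rate for t, rate in samples
             if event_end_s <= t < event_end_s + window_s]
    if not before or not after:
        raise ValueError("no samples in the before or after window")
    return mean(after) / mean(before)


# Made-up samples: steady at 100 ops/s, a dip during the event, 80 ops/s after.
samples = [(0, 100), (10, 100), (20, 50), (30, 80), (40, 80)]
print(recovery_ratio(samples, event_start_s=20, event_end_s=30, window_s=20))  # prints 0.8
```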

      Speculative Read tends to lessen the immediate impact:


      This graph looked the most promising to me: at 450s, the two trials with eager retry (the green and orange lines) showed the smallest dip in performance.

      Data for this test here
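Similarly, "the smallest dip" can be quantified as the fractional drop from a pre-event baseline op-rate to the lowest op-rate seen shortly after the event (450s in these trials). The function below is a hypothetical sketch with made-up window lengths and data; it raises ValueError if either window contains no samples.

```python
from statistics import mean
from typing import Sequence, Tuple


def dip_fraction(samples: Sequence[Tuple[float, float]], event_s: float,
                 baseline_window_s: float, dip_window_s: float) -> float:
    """Fractional drop from the pre-event mean op-rate to the lowest op-rate
    in the dip window after the event (0.0 means no dip)."""
    baseline = mean(rate for t, rate in samples
                    if event_s - baseline_window_s <= t < event_s)
    trough = min(rate for t, rate in samples
                 if event_s <= t < event_s + dip_window_s)
    return (baseline - trough) / baseline


# Made-up (elapsed_seconds, ops/s) samples around a node kill at 450s.
trial = [(440, 100), (445, 100), (450, 40), (455, 70), (460, 100)]
print(dip_fraction(trial, event_s=450, baseline_window_s=10, dip_window_s=15))  # prints 0.6
```

Computing this number for each trial and each rerun would make it easier to compare how consistent the eager-retry effect is.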

      But not always:


      This is a retrial with the same settings as above, yet this time the 95th-percentile eager retry trial (red line) did poorly at 450s.

      Data for this test here

        Attachments

        1. eager-read-not-consistent.png (61 kB, Ryan McGuire)
        2. eager-read-looks-promising.png (53 kB, Ryan McGuire)
        3. compaction-makes-slow.png (50 kB, Ryan McGuire)
        4. node-down-increase-performance.png (32 kB, Ryan McGuire)
        5. eager-read-not-consistent-stats.png (31 kB, Ryan McGuire)
        6. eager-read-looks-promising-stats.png (31 kB, Ryan McGuire)
        7. compaction-makes-slow-stats.png (32 kB, Ryan McGuire)
        8. 5932.txt (23 kB, Aleksey Yeschenko)
        9. 5933-7a87fc11.png (83 kB, Ryan McGuire)
        10. 5933-128_and_200rc1.png (77 kB, Ryan McGuire)
        11. 5933-logs.tar.gz (565 kB, Ryan McGuire)
        12. 5933-randomized-dsnitch-replica.png (67 kB, Ryan McGuire)
        13. 5933-randomized-dsnitch-replica.2.png (79 kB, Ryan McGuire)
        14. 5933-randomized-dsnitch-replica.3.png (68 kB, Ryan McGuire)
        15. 5932.ded39c7e1c2fa.logs.tar.gz (536 kB, Ryan McGuire)
        16. 5932-6692c50412ef7d.png (76 kB, Ryan McGuire)
        17. 5932.6692c50412ef7d.compaction.png (66 kB, Ryan McGuire)
        18. 5932.6692c50412ef7d.rr0.png (99 kB, Ryan McGuire)
        19. 5932.6692c50412ef7d.rr1.png (100 kB, Ryan McGuire)

            People

            • Assignee:
              Aleksey Yeschenko
              Reporter:
              Ryan McGuire
              Authors:
              Aleksey Yeschenko
              Reviewers:
              Jonathan Ellis
