  1. Apache Cassandra
  2. CASSANDRA-13096

Snapshots slow down jmx scraping


Details

    • Bug
    • Status: Resolved
    • Normal
    • Resolution: Not A Problem
    • None
    • Observability/Metrics
    • None
    • Normal

    Description

      Hello,

      We are scraping JMX metrics through a Prometheus exporter, and we noticed that some nodes became very slow to respond (more than 20 seconds). After some investigation, we did not find any hardware problem or overload issue on these "slow" nodes. It happens on different clusters, some with a dataset of only a few gigabytes, and it does not seem to be related to a specific version either, as it happens on 2.1, 2.2 and 3.0 nodes.

      After several unsuccessful attempts, one of our ideas was to clear the snapshots remaining on one problematic node:

      nodetool clearsnapshot
      

      And the magic happened: as you can see in the attached diagrams, the second we cleared the snapshots, the CPU activity dropped immediately and the time to scrape the JMX metrics went from more than 20 seconds to near-instantaneous.

      Can you enlighten us on this issue? Once again, it appears on all three of our versions (2.1, 2.2 and 3.0) and with different data volumes, and it is not systematically linked to the snapshots: some nodes with the same volume of snapshots respond without any problem.
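
To compare a slow node with a healthy one, it may help to look at the number of snapshot files per table rather than only the total snapshot volume. The following is a minimal sketch, not part of the original report: the function name `snapshot_usage` is hypothetical, and it assumes the usual on-disk layout `<data_dir>/<keyspace>/<table>/snapshots/<snapshot_name>/`. Whether file count is what actually drives the scrape time is an untested assumption.

```python
import os
from collections import defaultdict


def snapshot_usage(data_dir):
    """Return {(keyspace, table_dir): (file_count, total_bytes)} for snapshot files.

    Assumes the layout <data_dir>/<keyspace>/<table>/snapshots/<name>/...
    Sizes are apparent file sizes; snapshot files are typically hard links,
    so they do not necessarily consume that much extra disk space.
    """
    usage = defaultdict(lambda: [0, 0])
    for root, _dirs, files in os.walk(data_dir):
        parts = os.path.relpath(root, data_dir).split(os.sep)
        # Only count files located inside a named snapshot directory.
        if len(parts) >= 4 and parts[2] == "snapshots":
            key = (parts[0], parts[1])
            for name in files:
                usage[key][0] += 1
                usage[key][1] += os.path.getsize(os.path.join(root, name))
    return {key: tuple(value) for key, value in usage.items()}
```

Calling `snapshot_usage("/var/lib/cassandra/data")` on each node (the path is an assumed default; adjust it to the configured data directory) gives per-table counts that can be compared between slow and healthy nodes before running `nodetool clearsnapshot`.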

      Attachments

        1. Clear Snapshots.png
          148 kB
          Maxime Fouilleul
        2. CPU Load.png
          300 kB
          Maxime Fouilleul
        3. JMX Scrape Duration.png
          149 kB
          Maxime Fouilleul


          People

            Assignee: Unassigned
            Reporter: Maxime Fouilleul (mfo8689)
            Votes: 0
            Watchers: 6
