Uploaded image for project: 'Cassandra'
  1. Cassandra
  2. CASSANDRA-16406

Debug logging affects repair performance

    XMLWordPrintableJSON

    Details

      Description

      While working on the repair quality testing in CASSANDRA-16245, it appeared that the node coordinating repairs on a 20GB per node dataset was generating more than 2G of log with a total duration for the incremental repair scenarios of ~2h40m:
      https://app.circleci.com/pipelines/github/riptano/cassandra-rtest/37/workflows/6a7a41c8-0fca-4080-b37e-3b38998b3fab/jobs/49/steps
      ]
       
      The logs showed a lot of messages from the MerkleTree class at high pace:

      DEBUG [RepairJobTask:4] 2021-01-21 16:15:29,631 MerkleTree.java:308 - (10) Fully inconsistent range [#<TreeRange (-6738651767434294419,-6738622715859497972] depth=11>, #<TreeRange (-6738622715859497972,-6738593664284701525] depth=11>]
      DEBUG [RepairJobTask:4] 2021-01-21 16:15:29,631 MerkleTree.java:292 - (9) Inconsistent digest on right sub-range #<TreeRange (-6738593664284701525,-6738535561135108630] depth=10>: [#<OffHeapInner -6738564612709905078 hash=[b8efd3d684474276f316b1bc9f23b463cda4f8d620a4b5cf4d2c7dbb101bbe85] children=[#<OffHeapLeaf [fff431a30da07558c5106897da9f3fc3bcc598fb59e67c6959f4e70b6d3a7722]> #<OffHeapLeaf [471be27589e7372e3606d92b45bc8ba07161602d7942c9a614d89ab07d21c9a7]>]>, #<OffHeapInner -6738564612709905078 hash=[95334327a0b50b7d3c51d6588a5f3d57701e0b978f6ab28e85cda3cb5a094eb5] children=[#<OffHeapLeaf [0d2c5c21f6b04b098f71f43d0f3f3a5a823b372eb850261774847c456629bedd]> #<OffHeapLeaf [981f1f0656054074b32022658560070df2253cb9373a9499f149df8e3c20f068]>]>]
      DEBUG [RepairJobTask:4] 2021-01-21 16:15:29,631 MerkleTree.java:262 - (10) Hashing sub-ranges [#<TreeRange (-6738593664284701525,-6738564612709905078] depth=11>, #<TreeRange (-6738564612709905078,-6738535561135108630] depth=11>] for #<TreeRange (-6738593664284701525,-6738535561135108630] depth=10> divided by midpoint -6738564612709905078
      DEBUG [RepairJobTask:4] 2021-01-21 16:15:29,631 MerkleTree.java:272 - (10) Inconsistent digest on left sub-range #<TreeRange (-6738593664284701525,-6738564612709905078] depth=11>: [#<OffHeapLeaf [fff431a30da07558c5106897da9f3fc3bcc598fb59e67c6959f4e70b6d3a7722]>, #<OffHeapLeaf [0d2c5c21f6b04b098f71f43d0f3f3a5a823b372eb850261774847c456629bedd]>]
      DEBUG [RepairJobTask:4] 2021-01-21 16:15:29,631 MerkleTree.java:292 - (10) Inconsistent digest on right sub-range #<TreeRange (-6738564612709905078,-6738535561135108630] depth=11>: [#<OffHeapLeaf [471be27589e7372e3606d92b45bc8ba07161602d7942c9a614d89ab07d21c9a7]>, #<OffHeapLeaf [981f1f0656054074b32022658560070df2253cb9373a9499f149df8e3c20f068]>]
      DEBUG [RepairJobTask:4] 2021-01-21 16:15:29,631 MerkleTree.java:308 - (10) Fully inconsistent range [#<TreeRange (-6738593664284701525,-6738564612709905078] depth=11>, #<TreeRange (-6738564612709905078,-6738535561135108630] depth=11>]
      DEBUG [RepairJobTask:4] 2021-01-21 16:15:29,631 MerkleTree.java:308 - (9) Fully inconsistent range [#<TreeRange (-6738651767434294419,-6738593664284701525] depth=10>, #<TreeRange (-6738593664284701525,-6738535561135108630] depth=10>]
      DEBUG [RepairJobTask:4] 2021-01-21 16:15:29,631 MerkleTree.java:292 - (8) Inconsistent digest on right sub-range #<TreeRange (-6738535561135108630,-6738419354835922841] depth=9>: [#<OffHeapInner -6738477457985515736 hash=[806ede1a35783bf5fafe8b8ccefe4d3ff48e8f0e1314f8a9ce4b23f13fed4bf9] children=[#<OffHeapInner -6738506509560312183 hash=[e6d133afbd8041266f8a1cfe456ff07c9e7debe8ff54279b579b252f2af78b6b] children=[#]> #<OffHeapInner -6738448406410719289 hash=[66bfedb588f87ad3957497728b91bd436af364e6ec40df3299d006de151ac092] children=[#]>]>, #<OffHeapInner -6738477457985515736 hash=[bf128431ddf7ad72417ce853f90abd0bc7a2a38088d5fbec864c02c516ed6458] children=[#<OffHeapInner -6738506509560312183 hash=[12a71b62ceb096369c4797f56edd91a16dd9803348b4f4cfe1454027d7dae3d2] children=[#]> #<OffHeapInner -6738448406410719289 hash=[adb59f5313473b44dd3b7fa697d72caaaa7b23b3c0610f23670942e2c137878a] children=[#]>]>]

      When disabling debug logging, the duration dropped to ~2h05m with decent log sizes:

      https://app.circleci.com/pipelines/github/riptano/cassandra-rtest/38/workflows/c91e6ceb-438b-4f00-b38a-d670f9afb4c3/jobs/51

      There's apparently too much logging for each inconsistency found in the Merkle tree comparisons and we should move this to TRACE level if we still want to allow debug logging to be turned on by default.

      I'll prepare a patch for the MerkleTree class and run the repair testing scenarios again to verify their duration.

        Attachments

        1. 16406-2-trunk.txt
          8 kB
          Alexander Dejanovski
        2. CASSANDRA-16406.png
          146 kB
          Alexander Dejanovski
        3. without_debug_logging.png
          83 kB
          Alexander Dejanovski
        4. with_debug_logging.png
          89 kB
          Alexander Dejanovski

          Activity

            People

            • Assignee:
              adejanovski Alexander Dejanovski
              Reporter:
              adejanovski Alexander Dejanovski
              Authors:
              Alexander Dejanovski
              Reviewers:
              Marcus Eriksson
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: