Cassandra / CASSANDRA-15202

Deserialize merkle trees off-heap


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Normal
    • Resolution: Fixed
    • Fix Version/s: 4.0
    • Component/s: Consistency/Repair
    • Labels:
      None

      Description

      CASSANDRA-14096 took the first step toward addressing the heavy on-heap footprint of Merkle trees on repair coordinators, by reducing the time frame over which they are referenced and by limiting tree depth more intelligently based on available heap size.
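
The exact sizing rule used by CASSANDRA-14096 isn't spelled out here, but the underlying arithmetic is simple: a full binary tree of depth d has 2^(d+1) - 1 nodes, so each extra level roughly doubles memory. The following is a hypothetical sketch, assuming a fixed per-node cost in bytes and a heap budget; the class name and parameters are illustrative, not Cassandra's code. It picks the deepest full tree that fits the budget.

```java
/**
 * Hypothetical sketch: choose the deepest full binary Merkle tree that fits a
 * memory budget. The per-node cost and the budget are illustrative assumptions,
 * not Cassandra's actual numbers or logic.
 */
final class MerkleDepthBudget {
    private MerkleDepthBudget() {}

    /** Node count of a full binary tree of the given depth (root at depth 0). */
    static long nodeCount(int depth) {
        if (depth < 0 || depth > 40) throw new IllegalArgumentException("depth out of range");
        return (1L << (depth + 1)) - 1;
    }

    /**
     * Returns the largest depth d <= maxDepth such that
     * nodeCount(d) * bytesPerNode <= budgetBytes, or -1 if not even a root fits.
     */
    static int maxDepthWithin(long budgetBytes, long bytesPerNode, int maxDepth) {
        if (bytesPerNode <= 0 || budgetBytes < 0 || maxDepth < 0 || maxDepth > 40) {
            throw new IllegalArgumentException("invalid arguments");
        }
        // For positive integers, n * b <= c  <=>  n <= floor(c / b); avoids overflow.
        long affordableNodes = budgetBytes / bytesPerNode;
        int best = -1;
        for (int d = 0; d <= maxDepth && nodeCount(d) <= affordableNodes; d++) {
            best = d;
        }
        return best;
    }
}
```

For example, a 1 MiB budget at an assumed 64 bytes per node allows depth 13 (16,383 nodes); depth 14 would need 32,767 nodes.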

      That alone improves the GC profile and prevents OOMs, but it doesn't solve the problem entirely. The coordinator must still hold all the trees on heap at once until it has finished diffing them against each other, which keeps heap pressure high; and by reducing depth, we lose precision and thus cause more overstreaming than before.
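
The precision loss follows from how leaves map to token ranges: in a full tree of depth d, each leaf covers about 1/2^d of the repaired range, and a hash mismatch in a leaf means that leaf's entire range gets streamed, even if only one partition differs. The sketch below is hypothetical (names are illustrative) and assumes evenly split leaves, which real trees only approximate.

```java
import java.math.BigInteger;

/**
 * Hypothetical sketch of why shallower trees overstream. Assumes each leaf of a
 * full tree of depth d covers rangeWidth / 2^d tokens (even splits).
 */
final class LeafGranularity {
    private LeafGranularity() {}

    /** Approximate token width covered by one leaf (rounded down). */
    static BigInteger leafWidth(BigInteger rangeWidth, int depth) {
        return rangeWidth.shiftRight(depth);
    }

    /** Token width of the ranges streamed when `mismatchedLeaves` leaves differ. */
    static BigInteger tokensStreamed(BigInteger rangeWidth, int depth, long mismatchedLeaves) {
        return leafWidth(rangeWidth, depth).multiply(BigInteger.valueOf(mismatchedLeaves));
    }
}
```

With a range of width 2^64 (roughly the full Murmur3Partitioner token space), depth 20 gives 2^44 tokens per leaf, while depth 15 makes every leaf 32 times wider, so each mismatch pulls in 32 times as much range.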

      One way to improve the situation further is to build on CASSANDRA-14096 and move the trees entirely off-heap. This is a trivial endeavor, given that we are dealing with what should be full binary trees (though in practice they aren't quite, yet). This JIRA takes the first step in that direction by moving just deserialisation off-heap, leaving tree construction on the replicas on-heap for now.
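
Moving deserialisation off-heap means a received tree's bytes land outside the Java heap, so the garbage collector doesn't have to trace or copy them. A minimal sketch using a direct ByteBuffer follows; the wire layout (leaf count, hash length, then the hashes) and the class name are assumptions for illustration, not Cassandra's actual MerkleTree serializer.

```java
import java.io.DataInput;
import java.io.IOException;
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch: deserialize a tree's leaf hashes into one direct
 * (off-heap) buffer instead of one Java object plus byte[] per node.
 * The wire layout here is an assumption, not Cassandra's format.
 */
final class OffHeapLeafHashes {
    private final ByteBuffer hashes; // direct buffer: contents live outside the Java heap
    private final int hashLength;

    private OffHeapLeafHashes(ByteBuffer hashes, int hashLength) {
        this.hashes = hashes;
        this.hashLength = hashLength;
    }

    static OffHeapLeafHashes deserialize(DataInput in) throws IOException {
        int leafCount = in.readInt();
        int hashLength = in.readInt();
        if (leafCount < 0 || hashLength <= 0) throw new IOException("corrupt header");
        long total = (long) leafCount * hashLength;
        if (total > Integer.MAX_VALUE) throw new IOException("tree too large for one buffer");
        ByteBuffer buf = ByteBuffer.allocateDirect((int) total);
        byte[] scratch = new byte[hashLength]; // one small reusable on-heap array
        for (int i = 0; i < leafCount; i++) {
            in.readFully(scratch);
            buf.put(scratch);
        }
        buf.flip(); // limit = bytes written, position = 0
        return new OffHeapLeafHashes(buf, hashLength);
    }

    int leafCount() {
        return hashes.limit() / hashLength;
    }

    /** Compares leaf i of two trees by reading the off-heap bytes in place. */
    boolean leafMatches(OffHeapLeafHashes other, int i) {
        if (other.hashLength != hashLength || i < 0 || i >= leafCount() || i >= other.leafCount()) {
            throw new IllegalArgumentException("incomparable trees or leaf index");
        }
        int off = i * hashLength;
        for (int b = 0; b < hashLength; b++) {
            if (hashes.get(off + b) != other.hashes.get(off + b)) return false;
        }
        return true;
    }
}
```

The diffing side then compares hashes with absolute get() calls, so no per-leaf heap copies are created during comparison.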

      Additionally, the proposed patch fixes the issue of a coordinator that is also a replica sending its own Merkle tree to itself over loopback, which costs us a serialization/deserialization round trip per tree.
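
The loopback fix amounts to a short circuit on the send path: when the destination is the local node, hand the in-memory tree straight to the coordinator-side handler. A hypothetical sketch, with endpoint and tree types left generic and all names illustrative rather than Cassandra APIs:

```java
import java.util.function.BiConsumer;
import java.util.function.Consumer;

/**
 * Hypothetical sketch of skipping the loopback ser/deser round trip.
 * E is an endpoint identifier (e.g. an address); T is the in-memory tree type.
 */
final class TreeDelivery<E, T> {
    private final E localEndpoint;
    private final BiConsumer<E, T> remoteSend; // serializes the tree and sends it over the network
    private final Consumer<T> localReceive;    // coordinator-side handler for a received tree

    TreeDelivery(E localEndpoint, BiConsumer<E, T> remoteSend, Consumer<T> localReceive) {
        this.localEndpoint = localEndpoint;
        this.remoteSend = remoteSend;
        this.localReceive = localReceive;
    }

    void deliver(E coordinator, T tree) {
        if (coordinator.equals(localEndpoint)) {
            // This replica is the coordinator: pass the object directly, no ser/deser.
            localReceive.accept(tree);
        } else {
            remoteSend.accept(coordinator, tree);
        }
    }
}
```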

      Please note that there is more room for improvement here, and depending on the 4.0 timeline, those improvements may or may not land in time. To name a few:

      • with some minor modifications to init(), we can ensure that the tree is always perfectly full regardless of the range; this would allow us to drop child pointers in inner nodes, since child node addresses become trivially calculable given the fixed size of nodes
      • the trees can easily be constructed off-heap, as long as init() is run first to pre-size the tree and determine how large a buffer is needed
      • the on-wire format doesn't need to stream inner nodes, only leaves, and really only the leaves' hashes
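
The three ideas above fit together: if every tree is perfectly full, its size is known from its depth alone (which pre-sizing via init() would provide), children can be located by index arithmetic instead of pointers, and the leaves form one contiguous run that could be sent as-is. A hypothetical sketch using the classic array layout for complete binary trees (children of node i at 2i+1 and 2i+2); the fixed 32-byte hash, XOR combining of child hashes, and all names are assumptions for illustration.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch: a perfectly full Merkle tree in one off-heap buffer with
 * no child pointers. Every node occupies exactly HASH_LEN bytes, so node i
 * starts at i * HASH_LEN and its children are nodes 2i+1 and 2i+2.
 */
final class ImplicitMerkleTree {
    static final int HASH_LEN = 32; // assumed fixed hash size in bytes

    private final int depth;
    private final ByteBuffer nodes; // direct buffer holding all 2^(depth+1) - 1 nodes

    ImplicitMerkleTree(int depth) {
        if (depth < 0 || depth > 24) throw new IllegalArgumentException("depth out of range");
        this.depth = depth;
        // Pre-sizing: a perfect tree's node count follows from its depth alone.
        long nodeCount = (1L << (depth + 1)) - 1;
        this.nodes = ByteBuffer.allocateDirect((int) (nodeCount * HASH_LEN));
    }

    int leafCount() { return 1 << depth; }

    static int leftChild(int i) { return 2 * i + 1; }

    static int rightChild(int i) { return 2 * i + 2; }

    /** Leaves occupy the last 2^depth node slots, starting at index 2^depth - 1. */
    private int leafIndex(int leaf) { return (1 << depth) - 1 + leaf; }

    void setLeafHash(int leaf, byte[] hash) {
        if (hash.length != HASH_LEN) throw new IllegalArgumentException("bad hash length");
        if (leaf < 0 || leaf >= leafCount()) throw new IndexOutOfBoundsException("leaf " + leaf);
        int off = leafIndex(leaf) * HASH_LEN;
        for (int b = 0; b < HASH_LEN; b++) nodes.put(off + b, hash[b]);
    }

    /** Fills inner nodes bottom-up; each inner hash is the XOR of its two children. */
    void computeInnerHashes() {
        for (int i = (1 << depth) - 2; i >= 0; i--) {
            int self = i * HASH_LEN;
            int l = leftChild(i) * HASH_LEN;
            int r = rightChild(i) * HASH_LEN;
            for (int b = 0; b < HASH_LEN; b++) {
                nodes.put(self + b, (byte) (nodes.get(l + b) ^ nodes.get(r + b)));
            }
        }
    }

    byte hashByte(int node, int b) { return nodes.get(node * HASH_LEN + b); }

    /** Wire form: only the leaf hashes, back to back; inner nodes can be recomputed. */
    ByteBuffer leafHashesForWire() {
        ByteBuffer dup = nodes.duplicate();
        dup.position(((1 << depth) - 1) * HASH_LEN);
        return dup.slice(); // view from the first leaf to the end, no copy
    }
}
```

Because the receiver can rebuild every inner node from the leaves, sending only the leaf slice roughly halves what goes over the wire compared with sending all nodes.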

        Attachments

        1. offheap-mts-gc.png (99 kB, Jeff Jirsa)


            People

            • Assignee: Aleksey Yeschenko
            • Reporter: Jeff Jirsa
            • Authors: Aleksey Yeschenko, Jeff Jirsa
            • Reviewers: Benedict Elliott Smith, Marcus Eriksson
            • Votes: 0
            • Watchers: 5
