  1. Cassandra
  2. CASSANDRA-5506

Reduce memory consumption of IndexSummary

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 1.2.5
    • Component/s: None
    • Labels:

      Description

      I am evaluating cassandra for a use case with many tiny rows which would result in a node with 1-3TB of storage having billions of rows. Before loading that much data I am hitting GC issues and when looking at the heap dump I noticed that 70+% of the memory was used by IndexSummaries.

      The two major issues seem to be:

      1) that the positions are stored as an ArrayList<Long>, which results in each position taking 24 bytes (class + flags + 8-byte long). This might make sense when the file is initially written, but once it has been serialized it would be a lot more memory efficient to just have a long[] (really an int[] would be fine unless sstables larger than 2GB are allowed).

      2) The DecoratedKey for a byte[16] key takes 195 bytes – this is for the overhead of the ByteBuffer in the key and overhead in the token.
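
      To illustrate point 1, here is a minimal sketch of compacting boxed positions into a primitive array once the summary is final. The class and method names are hypothetical and not taken from the Cassandra code base. The 24-byte per-entry figure is the one quoted above from the heap dump; the 16-byte array header is an assumption about a typical 64-bit JVM.

```java
import java.util.List;

final class PositionCompaction {
    // Per-entry cost of a boxed Long, as reported in the heap dump above.
    static final long BOXED_BYTES_PER_ENTRY = 24L;
    // Assumed header size of a long[] on a 64-bit JVM (not measured).
    static final long ASSUMED_ARRAY_HEADER_BYTES = 16L;

    /** Copies boxed positions into a long[]: 8 bytes per entry plus one array header. */
    static long[] toPrimitive(List<Long> boxed) {
        long[] out = new long[boxed.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = boxed.get(i); // auto-unboxing
        }
        return out;
    }

    /** Rough estimate for n boxed positions (ignores the ArrayList's own reference array). */
    static long estimatedBoxedBytes(int n) {
        return BOXED_BYTES_PER_ENTRY * n;
    }

    /** Rough estimate for n positions held in a single long[]. */
    static long estimatedPrimitiveBytes(int n) {
        return ASSUMED_ARRAY_HEADER_BYTES + 8L * n;
    }
}
```

      Under these assumptions the primitive array needs roughly a third of the memory of the boxed list for the positions alone.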

      To somewhat "work around" the problem I have increased index_sample, but with this many rows that didn't really help and starts to have diminishing returns.

      NOTE: This heap dump was from Linux with a 64-bit Oracle VM.

          Activity

          jbellis Jonathan Ellis added a comment -

          https://github.com/jbellis/cassandra/commits/5506 makes IndexSummary use a long[] and byte[][] to save memory.

          (I'm fairly confident the performance hit for re-decorating during index lookups will be negligible, since we only have to do the lookup on cache miss.)
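
          As a rough sketch of the idea (not the committed patch), a summary can hold raw key bytes in a byte[][] and index-file offsets in a long[], and re-decorate the stored key on each binary-search probe. The class name, the `decorate` function and the generic token type are hypothetical stand-ins for the partitioner logic.

```java
import java.util.function.Function;

final class CompactSummary<T extends Comparable<T>> {
    private final byte[][] keys;      // raw key bytes, sorted by decorated order
    private final long[] positions;   // index-file offset of each sampled key
    private final Function<byte[], T> decorate; // hypothetical stand-in for key decoration

    CompactSummary(byte[][] keys, long[] positions, Function<byte[], T> decorate) {
        if (keys.length != positions.length) {
            throw new IllegalArgumentException("keys and positions must have the same length");
        }
        this.keys = keys;
        this.positions = positions;
        this.decorate = decorate;
    }

    /**
     * Returns the index of the last sample whose decorated key is <= target,
     * or -1 if target sorts before every sample.
     */
    int floorIndex(T target) {
        int lo = 0, hi = keys.length - 1, result = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            T midKey = decorate.apply(keys[mid]); // re-decorated on every probe
            if (midKey.compareTo(target) <= 0) {
                result = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return result;
    }

    long positionAt(int index) {
        return positions[index];
    }
}
```

          Each lookup re-decorates about log2(n) keys, which is the cost the comment above expects to be negligible because it is only paid on a cache miss.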

          vijay2win@yahoo.com Vijay added a comment - - edited

          I have been thinking about moving IS off-heap for a while, and I am really happy to see this ticket. Just wanted to try to add value.

          Instead of storing the long[] and byte[][] in memory, can we store the indexes/pointers of the decorated keys in memory, which will be helpful to address the off-heap decorated keys and offsets?

          For example:
          During the binary search, we can use indexes.length to find the midpoint in memory, then reference it back to the off-heap BB, which will be deserialized as needed (the summary effectively becomes a contiguous off-heap location)?
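
          One way this proposal could look, as a sketch only: an on-heap int[] of entry offsets pointing into a single direct ByteBuffer that holds the keys and positions. The class name and entry layout are assumptions made for illustration, and keys are compared as unsigned bytes here rather than by token, which is a simplification.

```java
import java.nio.ByteBuffer;

final class OffHeapSummary {
    private final int[] offsets;    // on-heap: start of each entry inside the buffer
    private final ByteBuffer data;  // off-heap, per entry: [int keyLength][key bytes][long position]

    OffHeapSummary(byte[][] sortedKeys, long[] positions) {
        int size = 0;
        for (byte[] k : sortedKeys) {
            size += 4 + k.length + 8;
        }
        data = ByteBuffer.allocateDirect(size);
        offsets = new int[sortedKeys.length];
        for (int i = 0; i < sortedKeys.length; i++) {
            offsets[i] = data.position();
            data.putInt(sortedKeys[i].length).put(sortedKeys[i]).putLong(positions[i]);
        }
    }

    /** Compares the stored key at the given entry with target, reading bytes in place. */
    private int compareAt(int index, byte[] target) {
        int start = offsets[index];
        int len = data.getInt(start);
        int n = Math.min(len, target.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(data.get(start + 4 + i) & 0xFF, target[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(len, target.length);
    }

    /** Returns the position of the last entry whose key is <= target, or -1 if none. */
    long floorPosition(byte[] target) {
        int lo = 0, hi = offsets.length - 1, found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1; // midpoint computed from the on-heap offsets array
            if (compareAt(mid, target) <= 0) {
                found = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        if (found < 0) {
            return -1L;
        }
        int start = offsets[found];
        return data.getLong(start + 4 + data.getInt(start));
    }
}
```

          Only the int[] of offsets stays on the heap; the key bytes and positions are read directly from the buffer during the search, so no per-key objects are allocated.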

          jbellis Jonathan Ellis added a comment -

          I'm pretty comfortable switching the representation for 1.2.5; let's make a separate ticket to move off heap for 2.0.

          vijay2win@yahoo.com Vijay added a comment -

          +1 for the patch and +1 for a separate ticket.

          jbellis Jonathan Ellis added a comment -

          committed; created CASSANDRA-5521 for off-heap feature.


            People

            • Assignee:
              jbellis Jonathan Ellis
              Reporter:
              nick.p Nick Puz
              Reviewer:
              Vijay
              Tester:
              Ryan McGuire
            • Votes:
              0
              Watchers:
              3
