Apache Cassandra / CASSANDRA-4175

Reduce memory, disk space, and CPU usage with a column name/id map


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Normal
    • Resolution: Duplicate

    Description

      We spend a lot of memory on column names, both transiently (during reads) and more permanently (in the row cache). Compression mitigates this on disk but not on the heap.

      The overhead is significant for typical small column values, e.g., ints.

      Even though we intern once we get to the memtable, this affects writes too via very high allocation rates in the young generation, hence more GC activity.

      Now that CQL3 gives us some guarantee that column names are defined before they are inserted, we could create a map from (say) 32-bit int column ids to names, and use the ids internally right up until we return a result set to the client.
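
The idea above can be illustrated with a minimal sketch. It is not Cassandra code: `ColumnNameMap`, `encode_cell`, and `decode_cell` are hypothetical names, and the 8-byte cell layout (4-byte id plus 4-byte int value) is an assumption chosen only to show names being replaced by fixed-width ids internally and restored when a result is built.

```python
import struct
from typing import Dict, List, Tuple


class ColumnNameMap:
    """Hypothetical bidirectional map between column names and 32-bit ids."""

    def __init__(self) -> None:
        self._id_by_name: Dict[str, int] = {}
        self._name_by_id: List[str] = []

    def define(self, name: str) -> int:
        """Register a column name (as a schema definition would); return its id."""
        existing = self._id_by_name.get(name)
        if existing is not None:
            return existing
        column_id = len(self._name_by_id)
        if column_id > 0x7FFFFFFF:
            raise OverflowError("column id does not fit in a signed 32-bit int")
        self._id_by_name[name] = column_id
        self._name_by_id.append(name)
        return column_id

    def id_of(self, name: str) -> int:
        # Raises KeyError for names that were never defined, mirroring the
        # "defined before inserted" guarantee the description relies on.
        return self._id_by_name[name]

    def name_of(self, column_id: int) -> str:
        return self._name_by_id[column_id]


def encode_cell(names: ColumnNameMap, name: str, value: int) -> bytes:
    """Internal form (assumed layout): 4-byte column id, then 4-byte int value."""
    return struct.pack(">ii", names.id_of(name), value)


def decode_cell(names: ColumnNameMap, cell: bytes) -> Tuple[str, int]:
    """Translate the id back to a name only when building the client result."""
    column_id, value = struct.unpack(">ii", cell)
    return names.name_of(column_id), value
```

In this sketch a cell occupies 8 bytes regardless of the name's length, whereas a name such as `user_login_count` is 16 bytes of UTF-8 on its own. The full name is stored once in the map rather than once per cell.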


          People

            Assignee: Unassigned
            Reporter: Jonathan Ellis (jbellis)
            Votes: 10
            Watchers: 34
