Apache Cassandra / CASSANDRA-265

Large object support


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Normal
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None

    Description

      The standard answer since forever has been "cassandra is a bad fit for large objects."

      But I think it doesn't have to be that way. With a few simplifying assumptions we can make this doable.

      First, screw Thrift. Thrift has no cross-platform way to express a stream of bytes, and you can't easily mix raw sockets into Thrift, so screw it. Make this an internal-only API to start with, like the much-vaunted and much-feared BinaryVerbHandler.

      Second, forget about writing multiple lobs (large objects) at once. You insert one lob at a time, into a specific column.

      Even with Thrift out of the equation, we are not out of the woods. MessagingService also assumes that Messages are memory-resident rather than streamed. One fix would be a StreamingMessage class consisting of a message id (paired with the originating endpoint to make it unique) and a size. The VerbHandler would keep a Map of incomplete StreamingMessages until the full size had been read, after which they could be disposed of.
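      A minimal sketch of the reassembly bookkeeping described above, under assumptions of our own: StreamKey, StreamingMessage, StreamingVerbHandler, begin, onChunk and pending are hypothetical names, not Cassandra APIs, and the endpoint is modeled as a plain String. Each message is keyed by (endpoint, message id), bytes go straight to a sink (a lob file in practice, an in-memory buffer in a test), and the map entry is dropped once the declared size has arrived.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;

// Hypothetical key: a message id is only unique per originating endpoint.
record StreamKey(String endpoint, long messageId) {}

// Hypothetical in-flight message: declared size, bytes received so far, and the sink.
final class StreamingMessage {
    final StreamKey key;
    final long size;
    final OutputStream sink; // a lob file in practice; never the whole payload in memory
    long received;

    StreamingMessage(StreamKey key, long size, OutputStream sink) {
        this.key = key;
        this.size = size;
        this.sink = sink;
    }
}

// Hypothetical handler holding the Map of incomplete StreamingMessages.
final class StreamingVerbHandler {
    private final Map<StreamKey, StreamingMessage> incomplete = new HashMap<>();

    // Called when the header (id + size) arrives, before any payload bytes.
    void begin(String endpoint, long messageId, long size, OutputStream sink) {
        StreamKey key = new StreamKey(endpoint, messageId);
        if (incomplete.putIfAbsent(key, new StreamingMessage(key, size, sink)) != null) {
            throw new IllegalStateException("duplicate stream " + key);
        }
    }

    // Appends one chunk; returns the finished message once all bytes arrived, else null.
    StreamingMessage onChunk(String endpoint, long messageId, byte[] buf, int off, int len)
            throws IOException {
        StreamKey key = new StreamKey(endpoint, messageId);
        StreamingMessage msg = incomplete.get(key);
        if (msg == null) throw new IllegalStateException("unknown stream " + key);
        if (msg.received + len > msg.size) throw new IOException("stream overran declared size");
        msg.sink.write(buf, off, len);
        msg.received += len;
        if (msg.received < msg.size) return null;
        incomplete.remove(key); // complete: dispose of the bookkeeping entry
        msg.sink.close();
        return msg;
    }

    int pending() {
        return incomplete.size();
    }
}
```

      Keying on the pair rather than the bare id is what lets two nodes reuse the same id without their streams colliding.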

      So a LargeObjectCommand would consist of little more than the command id and the payload, i.e. the streamed lob. We would handle it by streaming the payload directly to a file. When the stream was complete, we would do a normal write to the commitlog/memtable containing a pointer to that lob file, which would then be flushed to the sstable as usual. (This would require adding another boolean to Column serialization, indicating whether the value is really a lob pointer. We could combine it with the existing boolean into a single flags byte, leaving room for a couple more flags without taking extra space.)
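      The flags-byte idea can be sketched as follows. This assumes the existing boolean is the column's deletion marker (the proposal does not name it) and that it currently occupies one byte, as DataOutputStream.writeBoolean does; ColumnFlags and its constants are hypothetical names. Bit 0 carries the old boolean, bit 1 the new "value is a lob pointer" flag, and bits 2 to 7 stay free.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical flag layout for one serialized column.
final class ColumnFlags {
    static final int DELETED = 0x01;     // assumed meaning of the existing boolean
    static final int LOB_POINTER = 0x02; // new: value is a pointer to a lob file
    // 0x04 .. 0x80 remain available for future flags

    static byte encode(boolean deleted, boolean lobPointer) {
        int flags = 0;
        if (deleted) flags |= DELETED;
        if (lobPointer) flags |= LOB_POINTER;
        return (byte) flags;
    }

    static boolean isDeleted(byte flags) {
        return (flags & DELETED) != 0;
    }

    static boolean isLobPointer(byte flags) {
        return (flags & LOB_POINTER) != 0;
    }

    // One byte goes where writeBoolean used to write one byte, so column size is unchanged.
    static void write(DataOutputStream out, boolean deleted, boolean lobPointer) throws IOException {
        out.writeByte(encode(deleted, lobPointer));
    }

    static byte read(DataInputStream in) throws IOException {
        return in.readByte();
    }
}
```

      Readers of old data would still work if the old boolean mapped to bit 0, since writeBoolean emits 0 or 1.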

      So lobs would never appear directly in the commitlog, and we would never have to rewrite them multiple times during compaction: only the pointers would be merged, and the lob files themselves would not be touched (except to delete them once a compaction shows that an older version is no longer needed).
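      A sketch of what merging pointers instead of lobs could look like during compaction, assuming simple last-write-wins by timestamp (ties keep the first version seen, a simplification). ColumnVersion, MergeResult and LobAwareCompaction are hypothetical names. Only the winning pointer is carried forward; superseded lob files are reported for deletion, and their contents are never opened.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical view of one column version: an inline value or a pointer to a lob file.
record ColumnVersion(long timestamp, boolean lobPointer, String lobPath, byte[] inlineValue) {}

// The surviving version plus lob files that no surviving column references.
record MergeResult(ColumnVersion winner, List<String> obsoleteLobFiles) {}

final class LobAwareCompaction {
    static MergeResult merge(List<ColumnVersion> versions) {
        if (versions.isEmpty()) throw new IllegalArgumentException("no versions");
        ColumnVersion winner = versions.get(0);
        for (ColumnVersion v : versions) {
            if (v.timestamp() > winner.timestamp()) winner = v; // last write wins
        }
        List<String> obsolete = new ArrayList<>();
        for (ColumnVersion v : versions) {
            // A superseded pointer to a different file means that file is garbage.
            if (v != winner && v.lobPointer() && !Objects.equals(v.lobPath(), winner.lobPath())) {
                obsolete.add(v.lobPath());
            }
        }
        // Only the (small) winning column goes into the new sstable.
        return new MergeResult(winner, obsolete);
    }
}
```

      Deleting the obsolete files should presumably wait until the compacted sstable is durable, so a crash mid-compaction cannot leave a pointer to a missing file.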

      Then of course we'd need a corresponding ReadLargeObject command. So the basics are straightforward.
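      Both ends of the file handling can be sketched with the standard library, assuming the column's pointer has already been resolved to a local Path. LobFiles, writeLob and readLob are hypothetical names. The write side (for a LargeObjectCommand) copies exactly the declared number of bytes in fixed-size chunks; the read side (for a ReadLargeObject command) streams the file back with Files.copy, so neither side holds the whole lob in memory.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class LobFiles {
    private static final int CHUNK = 64 * 1024; // arbitrary buffer size for the sketch

    // Write path: copy exactly `size` bytes from the incoming stream into a lob file.
    static void writeLob(InputStream in, long size, Path file) throws IOException {
        byte[] buf = new byte[CHUNK];
        try (OutputStream out = Files.newOutputStream(file)) {
            long remaining = size;
            while (remaining > 0) {
                int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
                if (n < 0) throw new EOFException(remaining + " bytes missing");
                out.write(buf, 0, n);
                remaining -= n;
            }
        }
    }

    // Read path: stream the lob file to the requester; returns the number of bytes sent.
    static long readLob(Path file, OutputStream out) throws IOException {
        return Files.copy(file, out);
    }
}
```

      Only after writeLob returns would the pointer write go to the commitlog/memtable, so a reader can never see a pointer to a half-written file.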

      Read Repair and Hinted Handoff would add a few more wrinkles, but nothing fundamentally challenging.

      Thoughts?


          People

            Assignee: Unassigned
            Reporter: Jonathan Ellis (jbellis)
            Votes: 1
            Watchers: 8
