>> Indexes for individual rows are gone, since the global index allows random access...
> ^ This wouldn't be useful to cache? in the situation you only want a small range of columns?
That information is outdated: it's from the original implementation. But yes, we will want to keep the index in application memory or in the page cache.
> Roughly how large would the actual chunk be? This is the unit of deserialization right?
The span is the unit of deserialization (made up of at most 1 chunk per level), and its size would be fully configurable. The main question is how frequently to index the spans in the sstable index: does each span get an index entry, or only the first span of a row (the approach in the current implementation)?
EDIT: Sorry, the span is symbolic rather than something you read in one piece: you would deserialize the first chunk of the span (containing the keys) to decide whether to skip the rest of the chunks in the span.
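
To make the skip decision concrete, here is a rough sketch in Python rather than the actual on-disk code. The `Span` layout, the comma-separated chunk encoding, and the names `decode_chunk` and `read_span` are all made up for illustration; the only idea taken from the design is that the key chunk is decoded first and the remaining chunks are touched only if the keys overlap the query.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    # Hypothetical layout: one chunk of keys, followed by the other chunks.
    key_chunk: bytes
    other_chunks: List[bytes]


def decode_chunk(chunk: bytes) -> List[str]:
    # Hypothetical encoding: comma-separated UTF-8 strings.
    return chunk.decode("utf-8").split(",")


def read_span(span: Span, start: str, end: str) -> Optional[List[List[str]]]:
    """Decode the span only if its keys overlap the inclusive range [start, end]."""
    keys = decode_chunk(span.key_chunk)  # always pay for the first chunk
    # Assumes keys are stored in sorted order within the span.
    if keys[-1] < start or keys[0] > end:
        return None  # skip the remaining chunks entirely
    return [keys] + [decode_chunk(c) for c in span.other_chunks]
```

With a span whose keys are `a,b,c`, a query for `d` through `f` returns `None` after decoding only the key chunk, while a query for `b` through `z` decodes everything.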
> So if you are doing a range query on a very wide row how do you know when to stop processing chunks?
By looking at the global index: if all spans get entries in the index, you know the last interesting span.
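
As a sketch of that lookup, assuming every span of the row gets an index entry, that each entry records the first key of its span, and that keys are unique and sorted (the function name and the in-memory list are hypothetical, not the real index format):

```python
from bisect import bisect_right
from typing import List


def interesting_spans(first_keys: List[str], start: str, end: str) -> range:
    """Indices of the spans that may hold keys in the inclusive range [start, end].

    first_keys[i] is the first key of span i, in sorted order.
    """
    # The span that may contain `start` is the last one whose first key <= start.
    first = max(bisect_right(first_keys, start) - 1, 0)
    # The last interesting span is the last one whose first key <= end.
    last = bisect_right(first_keys, end) - 1
    return range(first, last + 1)


index = ["a", "f", "m", "t"]
print(list(interesting_spans(index, "g", "n")))  # [1, 2]
```

The reader stops after span 2 without looking at span 3, because the index already shows that span 3 starts at `t`, past the end of the query.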
> Let me know if this is wrong, but this design opens the cassandra data model to contain arbitrarily nested data.
> Given the complexity we already have surrounding the supercolumn concept do you think this is the right way forward?
The supercolumn concept is only confusing because we call them "supercolumns" rather than just "compound column names". People use them, and the consensus I've heard is that they are useful.
> If we assume we keep the datamodel as is how can we simplify the open ended-ness of your design to make the approach fit our current data model.
The only difference is what you call the structures, and whether you put arbitrary limits on the nesting: I'm open to suggestions.