> I renamed KEYS_BITMAP to just BITMAP, fixed some spots that could leak files, and fixed a compaction bug related to 1916, with a test case.
I incorporated your changes into the latest tarball as 0018, and fixed some silliness in 0019 and 0020.
> There are some changes in here that seem to be bug fixes for other issues, specifically the changes to CFMetaData.java.
Dropped from this patch, and added on
> I see in SSTableWriter that BMT will now fail on CFs with secondary indexes. Why fail, though? Can't the indexes just be built on restart?
Probably, yes, but the naive approach is not very elegant: by the time we see the first BMT append, the secondary indexes are already open, so we would need to null them out. A better approach would be to tell the SSTW constructor/factory up front that we intend to write without certain component types. I think this can go in another ticket?
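To make the "tell the factory up front" idea concrete, here is a minimal sketch under loudly stated assumptions: `ComponentType`, `SketchWriter` and `create` are hypothetical names, not Cassandra's real `Component` or `SSTableWriter` API. The point it shows is only that a bulk (BMT-style) load can request a writer whose component set already excludes the secondary index components, so nothing has to be opened and then nulled out later.

```java
import java.util.EnumSet;
import java.util.Set;

public class ComponentAwareWriterSketch {
    // Hypothetical component types; Cassandra's actual Component class differs.
    enum ComponentType { DATA, PRIMARY_INDEX, FILTER, BITMAP }

    // Hypothetical writer that only opens the components it was asked for.
    static final class SketchWriter {
        final Set<ComponentType> components;

        SketchWriter(Set<ComponentType> components) {
            // Defensive copy; the argument is always an EnumSet here, so copyOf is safe.
            this.components = EnumSet.copyOf(components);
        }

        boolean writes(ComponentType type) {
            return components.contains(type);
        }
    }

    // Hypothetical factory: a bulk load asks for a writer without the
    // secondary index (BITMAP) component before anything is opened.
    static SketchWriter create(boolean bulkLoad) {
        EnumSet<ComponentType> types = EnumSet.allOf(ComponentType.class);
        if (bulkLoad)
            types.remove(ComponentType.BITMAP);
        return new SketchWriter(types);
    }

    public static void main(String[] args) {
        System.out.println(create(true).writes(ComponentType.BITMAP));  // false
        System.out.println(create(false).writes(ComponentType.BITMAP)); // true
    }
}
```

The indexes skipped during the bulk load would then be rebuilt afterwards (e.g. on restart, as suggested in the quoted comment).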
> The whole BitmapIndexWriter Scratch space has me slightly concerned.
There is an alternative to the layout I've implemented here, but it is slower for the most common query type (equality on one bucket), and only slightly faster for extremely general index queries (LT/GT involving most/all of the buckets). We can measure the actual overhead on a single sstable if you'd like.
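The cost asymmetry between equality and LT/GT queries can be illustrated with a generic one-bitmap-per-bucket layout. This is a sketch using `java.util.BitSet`, not the `BitmapIndexWriter` on-disk layout (class and method names here are hypothetical): an equality query reads exactly one bucket, while a range query must OR together every bucket the range covers, so its cost grows with the number of buckets touched.

```java
import java.util.BitSet;
import java.util.TreeMap;

public class BucketBitmapSketch {
    // One bitmap per distinct value ("bucket"); bit i is set if row i has that value.
    private final TreeMap<String, BitSet> buckets = new TreeMap<>();

    void add(int row, String value) {
        buckets.computeIfAbsent(value, v -> new BitSet()).set(row);
    }

    // Equality: a single bucket lookup, no merging required.
    BitSet eq(String value) {
        BitSet b = buckets.get(value);
        return b == null ? new BitSet() : (BitSet) b.clone();
    }

    // Less-than: OR together every bucket below the bound.
    BitSet lt(String bound) {
        BitSet result = new BitSet();
        for (BitSet b : buckets.headMap(bound).values())
            result.or(b);
        return result;
    }

    public static void main(String[] args) {
        BucketBitmapSketch idx = new BucketBitmapSketch();
        idx.add(0, "apple");
        idx.add(1, "cherry");
        idx.add(2, "banana");
        idx.add(3, "apple");
        System.out.println(idx.eq("apple"));  // {0, 3}
        System.out.println(idx.lt("cherry")); // {0, 2, 3}
    }
}
```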
> Avro: I don't see the value here. [...] The value of using our BRAF is that you get all the work done to avoid polluting the page cache
I could go either way on this point: on one hand, this is an extremely simple structure. On the other hand, we get large benefits from compression here, and I'm fairly certain we should use Avro for the rest of the sstable.
Also, it's very simple to use our FileDataInput implementations here via Avro's SeekableInput interface, so we don't necessarily need to throw away that effort. See https://github.com/stuhood/cassandra/commit/1a5c9115cb1410519eff15dd3089772b1e550ae7
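The adapter pattern involved is small. The sketch below is not the code in the linked commit: it declares a local interface with the same four methods as Avro's `org.apache.avro.file.SeekableInput` (`seek`, `tell`, `length`, `read`, plus `close` from `Closeable`) so it compiles without Avro on the classpath, and it uses `java.io.RandomAccessFile` as an assumed stand-in for one of Cassandra's FileDataInput implementations such as the BRAF. In real code the adapter would implement Avro's interface directly and delegate to the page-cache-aware reader.

```java
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class SeekableInputAdapterSketch {
    // Local stand-in mirroring Avro's SeekableInput, so the sketch compiles without Avro.
    interface SeekableInput extends Closeable {
        void seek(long p) throws IOException;
        long tell() throws IOException;
        long length() throws IOException;
        int read(byte[] b, int off, int len) throws IOException;
    }

    // Adapter: RandomAccessFile stands in for a FileDataInput implementation.
    static final class RandomAccessSeekableInput implements SeekableInput {
        private final RandomAccessFile file;

        RandomAccessSeekableInput(RandomAccessFile file) { this.file = file; }

        public void seek(long p) throws IOException { file.seek(p); }
        public long tell() throws IOException { return file.getFilePointer(); }
        public long length() throws IOException { return file.length(); }
        public int read(byte[] b, int off, int len) throws IOException { return file.read(b, off, len); }
        public void close() throws IOException { file.close(); }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("bitmap", ".db");
        tmp.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(tmp, "rw")) {
            raf.write(new byte[] {10, 20, 30, 40});
        }
        try (SeekableInput in = new RandomAccessSeekableInput(new RandomAccessFile(tmp, "r"))) {
            in.seek(2);
            byte[] buf = new byte[2];
            int n = in.read(buf, 0, 2);
            System.out.println(n + " " + buf[0] + " " + buf[1] + " " + in.tell()); // 2 30 40 4
        }
    }
}
```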
> I mentioned above that on-the-fly indexes should be allowed; however, this can happen in a subsequent ticket if you prefer.
Yes, I'd prefer that. It will likely be the highest priority of the 4-5 tickets we need to create if/when this issue goes in.
> As Nick mentioned, it would be nice to have some stats on the index available via JMX, in a subsequent ticket.
> I think this implementation should probably be the only secondary index format we support (What's the value of keeping KEYS over this?)
Agreed, pending the optimizations mentioned in previous comments.