James and I were discussing this on Friday. It would have a lot of advantages and no disadvantages that I am aware of.
I'd have the column names be 1, 2, 3, ..., rather than a, b, c, ..., but that's cosmetic. Or we could use hex numbers.
The key is (for PHOENIX-1940) that we can determine the ordinal of a column from its name, rather than having to do a binary search for it.
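To make the name-to-ordinal idea concrete, here is a minimal sketch assuming the hex naming variant. The class and method names (ColumnOrdinals, qualifierFor, ordinalOf) are hypothetical and not part of Phoenix or HBase; the point is only that the ordinal falls out of parsing the name, with no search over a sorted column list.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not a Phoenix API: column qualifiers are the
// hex encoding of the column's ordinal position.
final class ColumnOrdinals {
    private ColumnOrdinals() {}

    // Encode an ordinal as a qualifier, e.g. 26 becomes "1a".
    static byte[] qualifierFor(int ordinal) {
        if (ordinal < 0) {
            throw new IllegalArgumentException("ordinal must be >= 0: " + ordinal);
        }
        return Integer.toHexString(ordinal).getBytes(StandardCharsets.US_ASCII);
    }

    // Recover the ordinal directly from the qualifier bytes; this is a
    // parse of a short string, not a binary search over column names.
    static int ordinalOf(byte[] qualifier) {
        String name = new String(qualifier, StandardCharsets.US_ASCII);
        return Integer.parseInt(name, 16);
    }
}
```

With decimal names (1, 2, 3, ...) the same shape works with radix 10 instead of 16.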
We could transition old tables by having this mapping be the identity mapping, i.e. if we already have a column named "some_column" we'd map "some_column" to "some_column" in the mapping. We'd lose the optimizations but could still rename columns cheaply.
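A rough sketch of how the identity mapping for old tables might look, assuming the mapping is a simple logical-name to stored-name table. ColumnNameMapping and its methods are hypothetical names for illustration only; where Phoenix would actually keep this metadata is not decided here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: maps the logical (user-visible) column name to the
// name actually stored in HBase. Not an existing Phoenix class.
final class ColumnNameMapping {
    private final Map<String, String> logicalToStored = new LinkedHashMap<>();

    // Old tables start with the identity mapping: each existing column
    // maps to itself, so no data has to be rewritten.
    static ColumnNameMapping identityFor(Iterable<String> existingColumns) {
        ColumnNameMapping mapping = new ColumnNameMapping();
        for (String name : existingColumns) {
            mapping.logicalToStored.put(name, name);
        }
        return mapping;
    }

    // Returns the stored name, or null if the logical name is unknown.
    String storedName(String logicalName) {
        return logicalToStored.get(logicalName);
    }

    // A rename only touches the mapping; the stored name stays the same.
    void rename(String oldLogical, String newLogical) {
        if (!logicalToStored.containsKey(oldLogical)) {
            throw new IllegalArgumentException("unknown column: " + oldLogical);
        }
        if (logicalToStored.containsKey(newLogical)) {
            throw new IllegalArgumentException("column already exists: " + newLogical);
        }
        logicalToStored.put(newLogical, logicalToStored.remove(oldLogical));
    }
}
```

After rename("some_column", "renamed"), a lookup of "renamed" still resolves to the stored name "some_column", which is why the rename stays cheap even without the ordinal optimization.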
Not yet sure if we'd need to change anything in HBase. HBase is fundamentally sparse, so we can't know ahead of time how many columns will be returned per row, nor even how many columns to expect. Should discuss. A possible solution is to have "dense" columns packed into a single key value. Storage would be much improved, and so would read performance for cases where we'd want to see most of those columns. Writes would suffer with a simple solution (we would need to read back the old values and rewrite them with the new value replaced); we could instead store "update" Cells that hold only the diff and that would be combined during the next compaction. It would be important to store the data such that it does not have to be serialized and deserialized from the row (so PB and Avro are probably out, need to check).
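As a sketch of the packing idea, assume for simplicity that every dense column is a fixed-width 8-byte long, so a column's offset is just ordinal * 8 and a single value can be read without decoding the rest. PackedRow and its methods are hypothetical, this does not use any HBase API, and a real format would also have to handle nulls and variable-width types. applyDiff stands in for the compaction-time merge of an "update" Cell that holds only the changed columns.

```java
import java.nio.ByteBuffer;
import java.util.Map;

// Hypothetical fixed-width packing of dense columns into one value.
// Assumption: all columns are 8-byte longs, addressed by ordinal.
final class PackedRow {
    private PackedRow() {}

    // Pack all column values into a single byte array.
    static byte[] pack(long[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Long.BYTES);
        for (long v : values) {
            buf.putLong(v);
        }
        return buf.array();
    }

    // Read one column in place by offset; nothing else is decoded.
    static long read(byte[] packed, int ordinal) {
        return ByteBuffer.wrap(packed).getLong(ordinal * Long.BYTES);
    }

    // Merge a diff (ordinal -> new value) into a copy of the packed row,
    // as a compaction might do when combining "update" cells.
    static byte[] applyDiff(byte[] packed, Map<Integer, Long> diff) {
        byte[] merged = packed.clone();
        ByteBuffer buf = ByteBuffer.wrap(merged);
        for (Map.Entry<Integer, Long> e : diff.entrySet()) {
            buf.putLong(e.getKey() * Long.BYTES, e.getValue());
        }
        return merged;
    }
}
```

The write path would then only append the small diff, and the read path would have to overlay any not-yet-compacted diffs on top of the packed value.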
But that's something HBase and/or Phoenix desperately need. I think this should sit on top of HBase, as HBase cannot know about optimized storage/packing formats for various problems. Maybe a library.