The guarantees that each HBase operation provides with regard to the various ACID properties are not written down anywhere. I think the developers know the answers, but we need a clear spec for people building systems on top of HBase. Here are a few sample questions we should endeavor to answer:
- For a multi-cell put within a single column family (CF), is the update made durable atomically?
- For a put across CFs, is the update made durable atomically?
- Can a read see a row that hasn't been sync()ed to the HLog?
- What isolation do scanners have? Somewhere between snapshot isolation and no isolation?
- After a client receives a "success" for a write operation, is that operation guaranteed to be visible to all other clients?
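To make the atomicity questions above concrete, here is a toy sketch in Java. It is not HBase code and says nothing about what HBase actually does: the class `TornReadSketch`, its methods, and the "family:qualifier" key format are all hypothetical, chosen only to show what a reader could observe if a put spanning two CFs were applied cell by cell rather than all at once.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only, not HBase code: a row is a map from "family:qualifier" to value.
final class TornReadSketch {

    // Applies the cells one at a time and records the row state a reader
    // could observe after each individual cell write.
    static List<Map<String, String>> nonAtomicPut(Map<String, String> row,
                                                  Map<String, String> cells) {
        List<Map<String, String>> observed = new ArrayList<>();
        for (Map.Entry<String, String> cell : cells.entrySet()) {
            row.put(cell.getKey(), cell.getValue());
            observed.add(new LinkedHashMap<>(row));
        }
        return observed;
    }

    // Applies all cells before any reader is allowed to look, so the only
    // observable state is the fully updated row.
    static List<Map<String, String>> atomicPut(Map<String, String> row,
                                               Map<String, String> cells) {
        row.putAll(cells);
        List<Map<String, String>> observed = new ArrayList<>();
        observed.add(new LinkedHashMap<>(row));
        return observed;
    }

    // A snapshot is "torn" if it holds some, but not all, of the put's values.
    static boolean isTorn(Map<String, String> snapshot, Map<String, String> cells) {
        int applied = 0;
        for (Map.Entry<String, String> cell : cells.entrySet()) {
            if (cell.getValue().equals(snapshot.get(cell.getKey()))) {
                applied++;
            }
        }
        return applied > 0 && applied < cells.size();
    }

    public static void main(String[] args) {
        Map<String, String> cells = new LinkedHashMap<>();
        cells.put("cf1:a", "new");
        cells.put("cf2:b", "new");

        Map<String, String> row = new LinkedHashMap<>();
        row.put("cf1:a", "old");
        row.put("cf2:b", "old");

        // Print every state a reader could see during the cell-by-cell put.
        for (Map<String, String> snapshot : nonAtomicPut(row, cells)) {
            System.out.println(snapshot + " torn=" + isTorn(snapshot, cells));
        }
    }
}
```

In this model the cell-by-cell put exposes an intermediate state where one CF has the new value and the other still has the old one. The spec should say, per operation, whether such a state can ever be read or recovered after a crash.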
I see this JIRA as having several points of discussion:
- Evaluate what the current state of affairs is
- Evaluate whether we currently provide any guarantees that aren't useful to users of the system (perhaps we can drop them in exchange for performance)
- Evaluate whether we are missing any guarantees that would be useful to users of the system