I thought about this a bit tonight. I think this is essentially impossible to implement unless we do all of the following:
- Add logical timestamps to HFiles (a few bytes per KV if we encode them as vints relative to an hfile-wide meta entry)
- Extend the scanner API so that each scan result object also returns the current logical timestamp
- Add logical timestamps to HLog entries so that a server that replays the edits maintains the same logical timestamp for each row.
I think these are all needed in order to maintain consistency in the face of failure or through a flush operation.
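
To make the "few bytes per KV" estimate in the first bullet concrete, here is a rough sketch of storing each logical timestamp as a vint delta against a file-wide base value. Everything in it is hypothetical: LogicalTimestampCodec and its methods are not HBase classes, and the vint layout is a simple 7-bits-per-byte scheme chosen for illustration, not necessarily the format an HFile would use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

// Hypothetical sketch: none of these names exist in HBase.
final class LogicalTimestampCodec {
    private LogicalTimestampCodec() {}

    // Writes a non-negative value as 7-bit groups, low-order group first.
    // The high bit of each byte signals that another byte follows.
    static void writeVLong(ByteArrayOutputStream out, long value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative delta: " + value);
        }
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Reads one value written by writeVLong.
    static long readVLong(ByteArrayInputStream in) {
        long result = 0;
        int shift = 0;
        while (true) {
            int b = in.read();
            if (b < 0) {
                throw new IllegalStateException("truncated vint");
            }
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
            if (shift > 63) {
                throw new IllegalStateException("vint too long");
            }
        }
    }

    // Encodes each KV's logical timestamp as a delta from the file-wide base
    // (assumed here to be the smallest timestamp in the file).
    static byte[] encode(long base, long[] timestamps) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (long ts : timestamps) {
            writeVLong(out, ts - base);
        }
        return out.toByteArray();
    }

    // Reverses encode: reads count deltas and adds the base back.
    static long[] decode(long base, byte[] encoded, int count) {
        ByteArrayInputStream in = new ByteArrayInputStream(encoded);
        long[] result = new long[count];
        for (int i = 0; i < count; i++) {
            result[i] = base + readVLong(in);
        }
        return result;
    }
}
```

With a base of 1000000, the timestamps 1000000, 1000005 and 1000300 become deltas 0, 5 and 300, which take 1, 1 and 2 bytes. The per-KV cost stays small as long as the logical timestamps within one file stay close to the base.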
Rather than do all of the above, I think we should simply document in Scanner.setBatch that using intra-row scanning loses the consistency guarantee. We'll also want to update the ACID guarantees doc to state this.