Details
Type: Improvement
Status: Closed
Priority: Critical
Resolution: Fixed
Reviewed
Description
HBASE-2248 turned Gets into Scans server-side. It also removed the invariant that deletes in a file apply only to other files and not to the file itself (MemStore deletes are no longer processed when the delete happens). This has implications for our minor compaction policy.
Minor compactions currently process deletes: we do the actual deleting as we compact, but we retain the delete records themselves. This preserves the invariant that deletes apply only to other files.
Since that invariant no longer holds after HBASE-2248, we should revisit our compaction policies.
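The minor compaction behavior described above (apply the deletes, keep the delete records) can be sketched with a toy model. This is only an illustration in Python, not HBase code: `Cell`, `minor_compact`, and the row-level-only delete marker are hypothetical simplifications, and a store file is assumed to be a plain list of cells.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    # Hypothetical, simplified cell: one row-level delete type only.
    row: str
    ts: int
    kind: str  # "put" or "delete"
    value: str = ""


def minor_compact(files):
    """Merge files, dropping masked puts but retaining delete markers."""
    cells = [c for f in files for c in f]
    # Newest row-delete timestamp per row, across all input files.
    delete_ts = {}
    for c in cells:
        if c.kind == "delete":
            delete_ts[c.row] = max(delete_ts.get(c.row, c.ts), c.ts)
    # A put survives only if it is strictly newer than the row's delete.
    kept = [
        c for c in cells
        if c.kind == "delete" or c.ts > delete_ts.get(c.row, float("-inf"))
    ]
    # Order by row, newest first, delete markers ahead of puts on ties.
    return sorted(kept, key=lambda c: (c.row, -c.ts, c.kind != "delete"))


older = [Cell("r1", 5, "put", "a"), Cell("r2", 5, "put", "b")]
newer = [Cell("r1", 10, "delete")]
compacted = minor_compact([older, newer])
```

In this model the put on `r1` is physically removed, while the delete marker stays in the output so that it can still mask cells in files that were not part of this compaction.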
Attachments
Issue Links
- is related to HBASE-2462 Review compaction heuristic and move compaction code out so standalone and independently testable (Closed)
I think we should remove all delete processing from minor compactions to make them as fast as possible. For me, that would suffice to close this JIRA.
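Without delete processing, a minor compaction reduces to a plain merge of already-sorted files. A minimal sketch in Python, assuming each store file is a list of cells already sorted by row, then newest timestamp first; `Cell`, `sort_key`, and `minor_compact_no_deletes` are hypothetical names for illustration, not HBase APIs.

```python
import heapq
from typing import NamedTuple


class Cell(NamedTuple):
    # Hypothetical, simplified cell: one row-level delete type only.
    row: str
    ts: int
    kind: str  # "put" or "delete"
    value: str = ""


def sort_key(c):
    # Row ascending, newest timestamp first, delete markers before puts.
    return (c.row, -c.ts, c.kind != "delete")


def minor_compact_no_deletes(files):
    """Merge sorted files without inspecting deletes; every cell is kept."""
    return list(heapq.merge(*files, key=sort_key))


older = [Cell("r1", 5, "put", "a"), Cell("r2", 5, "put", "b")]
newer = [Cell("r1", 10, "delete")]
merged = minor_compact_no_deletes([older, newer])
```

Nothing is dropped here: the masked put on `r1` and its delete marker are both carried into the output, and resolving them is left to reads and to major compactions.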
Major compactions are another consideration. As I described in HBASE-2450, delete markers actually get removed during major compactions, which means a background process changes user-facing behavior. Old delete records can affect new puts: if I put a value with an older timestamp than a row delete, for example, the put would not show up before the major compaction, but after the major compaction the same put would be valid.
One possibility is to change it so that minors don't do any delete processing and majors do what minors do now (doing the actual deleting, but retaining the deletes themselves). The only downside is that once you delete a row at a timestamp, you can never re-insert values older than that delete. Today, this is the case only until there is a major compaction. The way to fix this is to take storefile age into account so that deletes in previous storefiles don't apply to newer storefiles. If we did that, we would have to process deletes during regular compactions, because you'd need to look at the relative ages of the storefiles to determine whether a particular delete applied.
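To make the scenario concrete, here is a toy model in Python of the two behaviors discussed above: a late put with an old timestamp that is masked until a major compaction removes the delete marker, and the storefile-age idea, where a delete only applies to cells in the same or older files. Everything here is a hypothetical simplification: `Cell`, `is_masked`, `read`, and `major_compact` are not HBase code, and a file's position in the list stands in for storefile age.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    # Hypothetical, simplified cell: one row-level delete type only.
    row: str
    ts: int
    kind: str  # "put" or "delete"
    value: str = ""


def is_masked(files, age, put, age_aware):
    """True if some row delete with ts >= put.ts applies to this put."""
    for d_age, f in enumerate(files):
        if age_aware and d_age < age:
            continue  # delete sits in an older file than the put: ignore it
        for d in f:
            if d.kind == "delete" and d.row == put.row and d.ts >= put.ts:
                return True
    return False


def read(files, row, age_aware=False):
    """Newest visible value for row, or None. files are ordered oldest first."""
    visible = [
        c for age, f in enumerate(files) for c in f
        if c.row == row and c.kind == "put"
        and not is_masked(files, age, c, age_aware)
    ]
    return max(visible, key=lambda c: c.ts).value if visible else None


def major_compact(files):
    """Drop masked puts and the delete markers themselves; return one file."""
    survivors = [
        c for age, f in enumerate(files) for c in f
        if c.kind == "put" and not is_masked(files, age, c, False)
    ]
    return sorted(survivors, key=lambda c: (c.row, -c.ts))


deleted = [Cell("r1", 10, "delete")]      # row delete at timestamp 10
late_put = [Cell("r1", 5, "put", "late")]  # written later, older timestamp

before_major = read([deleted, late_put], "r1")
after_major = read([major_compact([deleted]), late_put], "r1")
with_age = read([deleted, late_put], "r1", age_aware=True)
```

In this model `before_major` is `None` while `after_major` is `"late"`, which is the background-process-changes-results problem. With `age_aware=True` the put is visible regardless of whether a major compaction has run, because the delete lives in an older file than the put.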
For now, I'd be happy just removing delete tracking in minors and worrying about the rest of these issues for 0.21.