Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
Lars designed the combination of VERSIONS, TTL, MIN_VERSIONS, and KEEP_DELETED_CELLS for maximum flexibility, and there is a lot of nuance in how they interact. Almost all combinations of these four settings make sense for some use case (the exceptions are MIN_VERSIONS > 0 without TTL, and KEEP_DELETED_CELLS=TTL without TTL). There should be a way to make the behavior with TTL easier to reason about when creating the schema. This could take the form of a literate builder API for ColumnDescriptor, or an extension to an existing one.
Here is a motivating example: we may want to retain all versions for a given TTL, and then only a specific number of versions after that interval elapses. This can be achieved with VERSIONS=INT_MAX, TTL=retention_interval, KEEP_DELETED_CELLS=TTL, MIN_VERSIONS=num_versions. This is not intuitive, though, because since HBase version 0.1 VERSIONS has been the setting that specifies the number of versions to retain (num_versions in this example). That makes it a likely source of confusion; I've seen it in practice.
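To make the interaction in this example concrete, here is a minimal sketch of how the settings could combine for a single column. It is a simplified model written for illustration, not HBase code: the class and method names are hypothetical, delete markers (and therefore KEEP_DELETED_CELLS) are ignored, and timestamps and TTL are plain millisecond values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Simplified, hypothetical model of version retention; not part of HBase. */
final class RetentionModel {

    /**
     * Returns the timestamps that survive, given versions sorted newest first.
     * maxVersions models VERSIONS, minVersions models MIN_VERSIONS.
     */
    static List<Long> retained(List<Long> timestampsNewestFirst, long nowMs,
                               long ttlMs, int maxVersions, int minVersions) {
        List<Long> kept = new ArrayList<>();
        int seen = 0;
        for (long ts : timestampsNewestFirst) {
            seen++;
            if (seen > maxVersions) {
                break; // VERSIONS is a hard cap on the number of versions
            }
            boolean expired = nowMs - ts > ttlMs;
            // Unexpired versions are always kept; expired ones survive only
            // while they are among the newest minVersions versions.
            if (!expired || seen <= minVersions) {
                kept.add(ts);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // now = 1000, TTL = 200: the versions at 950 and 900 are still fresh.
        List<Long> versions = Arrays.asList(950L, 900L, 400L, 300L, 100L);
        // VERSIONS = INT_MAX, MIN_VERSIONS = 3
        System.out.println(retained(versions, 1000L, 200L, Integer.MAX_VALUE, 3));
        // prints [950, 900, 400]
    }
}
```

In this model the number the user usually thinks of as "how many versions to keep" ends up in MIN_VERSIONS, while VERSIONS is only there to lift the cap, which is the inversion described above.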
A literate builder API could, through the design of its method names, let a user describe version retention in something close to plain language, while internally the builder sets the low-level schema attributes.
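As a rough sketch of what such a builder might look like, the standalone class below maps two readable method calls onto the four attributes named in this issue. Everything here is an assumption made for illustration: `VersionRetention`, `keepAllVersionsFor`, `thenKeepNewest`, and `toAttributes` are hypothetical names, the TTL is assumed to be in seconds, and the result is a plain map rather than a real ColumnDescriptor.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical literate builder for version retention; not an HBase API. */
final class VersionRetention {
    private final Map<String, String> attrs = new LinkedHashMap<>();

    private VersionRetention() {}

    /** Retain every version written within the last ttlSeconds. */
    static VersionRetention keepAllVersionsFor(int ttlSeconds) {
        if (ttlSeconds <= 0) {
            throw new IllegalArgumentException("TTL must be positive");
        }
        VersionRetention r = new VersionRetention();
        r.attrs.put("VERSIONS", String.valueOf(Integer.MAX_VALUE));
        r.attrs.put("TTL", String.valueOf(ttlSeconds));
        r.attrs.put("KEEP_DELETED_CELLS", "TTL");
        return r;
    }

    /** After the TTL elapses, still retain the newest numVersions versions. */
    VersionRetention thenKeepNewest(int numVersions) {
        if (numVersions < 0) {
            throw new IllegalArgumentException("numVersions must not be negative");
        }
        attrs.put("MIN_VERSIONS", String.valueOf(numVersions));
        return this;
    }

    /** Low-level attributes the builder would set on the column descriptor. */
    Map<String, String> toAttributes() {
        return new LinkedHashMap<>(attrs);
    }

    public static void main(String[] args) {
        Map<String, String> attrs = VersionRetention
                .keepAllVersionsFor(7 * 24 * 3600) // one week, in seconds
                .thenKeepNewest(3)
                .toAttributes();
        System.out.println(attrs);
    }
}
```

Because the only entry point takes a TTL, the two combinations called out as not making sense (MIN_VERSIONS > 0 without TTL, and KEEP_DELETED_CELLS=TTL without TTL) cannot be expressed through this sketch.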