  1. HBase
  2. HBASE-23678

Literate builder API for version management in schema

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0-alpha-1, 2.3.0, 1.7.0
    • Component/s: None
    • Labels: None
    • Release Note:
      ColumnFamilyDescriptor new builder API:

          /**
           * Retain all versions for a given TTL (retentionInterval), and then only a specific number
           * of versions (versionAfterInterval) after that interval elapses.
           *
           * @param retentionInterval Retain all versions for this interval
           * @param versionAfterInterval Number of versions to retain after retentionInterval elapses
           */
          public ModifyableColumnFamilyDescriptor setVersionsWithTimeToLive(
              final int retentionInterval, final int versionAfterInterval)

    Description

      Lars designed the combination of VERSIONS, TTL, MIN_VERSIONS, and KEEP_DELETED_CELLS for maximum flexibility, and there is a lot of nuance in how they are used. Almost all combinations of these four settings make sense for some use case (the exceptions are MIN_VERSIONS > 0 without TTL, and KEEP_DELETED_CELLS=TTL without TTL). There should be a way to make the behavior with TTL easier to reason about when creating the schema. This could take the form of a literate builder API for ColumnDescriptor or an extension to an existing one.
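
      To make the two exceptions above concrete, here is a minimal sketch of a check that flags them. It is not HBase code: RetentionSettings, findProblems, and the FOREVER sentinel are hypothetical names, and treating Integer.MAX_VALUE as "no TTL configured" is an assumption made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a column family's retention attributes; not HBase code. */
final class RetentionSettings {
    /** Assumed sentinel meaning "no TTL configured" (cells never expire). */
    static final int FOREVER = Integer.MAX_VALUE;

    /** Assumed to mirror the possible KEEP_DELETED_CELLS values. */
    enum KeepDeletedCells { FALSE, TRUE, TTL }

    final int versions;       // VERSIONS
    final int minVersions;    // MIN_VERSIONS
    final int ttlSeconds;     // TTL
    final KeepDeletedCells keepDeletedCells;

    RetentionSettings(int versions, int minVersions, int ttlSeconds,
                      KeepDeletedCells keepDeletedCells) {
        this.versions = versions;
        this.minVersions = minVersions;
        this.ttlSeconds = ttlSeconds;
        this.keepDeletedCells = keepDeletedCells;
    }

    /** Returns one message per combination the description names as an exception. */
    List<String> findProblems() {
        List<String> problems = new ArrayList<>();
        boolean hasTtl = ttlSeconds != FOREVER;
        if (minVersions > 0 && !hasTtl) {
            problems.add("MIN_VERSIONS > 0 does not make sense without a TTL");
        }
        if (keepDeletedCells == KeepDeletedCells.TTL && !hasTtl) {
            problems.add("KEEP_DELETED_CELLS=TTL does not make sense without a TTL");
        }
        return problems;
    }
}
```

      A builder could run a check like this before the schema is created, so that a combination with no meaningful effect is reported instead of silently accepted.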

      Let me give you a motivating example: we may want to retain all versions for a given TTL, and then only a specific number of versions after that interval elapses. This can be achieved with VERSIONS=INT_MAX, TTL=retention_interval, KEEP_DELETED_CELLS=TTL, MIN_VERSIONS=num_versions. This is not intuitive, though, because VERSIONS has been used to specify the number of versions to retain (num_versions in this example) since HBase version 0.1, so this is going to be a source of confusion. I've seen it in practice.
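
      The effect of that combination can be illustrated with a small, simplified model of which versions of a single cell survive. This is a sketch, not HBase's compaction logic: VersionRetentionModel and retained are hypothetical names, delete markers are ignored, and the rule assumed here is that a version is kept if it is among the newest VERSIONS and is either still within the TTL or needed to satisfy MIN_VERSIONS.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Simplified, hypothetical model of version retention for one cell; ignores deletes. */
final class VersionRetentionModel {
    /**
     * @param timestamps  write timestamps of all versions (same unit as now and ttl)
     * @param now         current time
     * @param ttl         TTL
     * @param maxVersions VERSIONS
     * @param minVersions MIN_VERSIONS
     * @return timestamps of the versions that are kept, newest first
     */
    static List<Long> retained(List<Long> timestamps, long now, long ttl,
                               int maxVersions, int minVersions) {
        List<Long> newestFirst = new ArrayList<>(timestamps);
        newestFirst.sort(Comparator.reverseOrder());
        List<Long> kept = new ArrayList<>();
        for (int i = 0; i < newestFirst.size(); i++) {
            long ts = newestFirst.get(i);
            boolean withinTtl = now - ts <= ttl;      // not yet expired
            boolean neededForMin = i < minVersions;   // newest MIN_VERSIONS survive expiry
            if (i < maxVersions && (withinTtl || neededForMin)) {
                kept.add(ts);
            }
        }
        return kept;
    }
}
```

      Under this model, with versions written at times 10, 20, 80, 90, and 100, ttl = 30, maxVersions = Integer.MAX_VALUE, and minVersions = 2, a read at now = 100 keeps 100, 90, and 80 (every version inside the TTL window), while a read at now = 200 keeps only 100 and 90 (the newest two).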

      A literate builder API could, through the design of its method names, let a user describe in more or less plain language how they want version retention to work, while internally the builder sets the low-level schema attributes.
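
      One possible shape for such an API is sketched below. It is only an illustration of the idea and not the API that was committed: RetentionPolicy, keepAllVersionsFor, and thenKeepLatest are hypothetical names, and the resulting fields simply record the low-level attribute values from the motivating example.

```java
import java.time.Duration;

/** Hypothetical literate API that maps a plain-language policy to low-level attributes. */
final class RetentionPolicy {
    final int versions;             // VERSIONS
    final int minVersions;          // MIN_VERSIONS
    final int ttlSeconds;           // TTL
    final String keepDeletedCells;  // KEEP_DELETED_CELLS

    private RetentionPolicy(int versions, int minVersions, int ttlSeconds,
                            String keepDeletedCells) {
        this.versions = versions;
        this.minVersions = minVersions;
        this.ttlSeconds = ttlSeconds;
        this.keepDeletedCells = keepDeletedCells;
    }

    /** First step: state how long every version must be retained. */
    static AfterInterval keepAllVersionsFor(Duration retentionInterval) {
        if (retentionInterval.isNegative() || retentionInterval.isZero()) {
            throw new IllegalArgumentException("retentionInterval must be positive");
        }
        return new AfterInterval(retentionInterval);
    }

    /** Second step: state how many versions remain once the interval has elapsed. */
    static final class AfterInterval {
        private final Duration retentionInterval;

        private AfterInterval(Duration retentionInterval) {
            this.retentionInterval = retentionInterval;
        }

        RetentionPolicy thenKeepLatest(int versionsAfterInterval) {
            if (versionsAfterInterval < 0) {
                throw new IllegalArgumentException("versionsAfterInterval must be >= 0");
            }
            // Same mapping as the motivating example: VERSIONS=INT_MAX, TTL=interval,
            // KEEP_DELETED_CELLS=TTL, MIN_VERSIONS=num_versions.
            int ttl = Math.toIntExact(retentionInterval.getSeconds());
            return new RetentionPolicy(Integer.MAX_VALUE, versionsAfterInterval, ttl, "TTL");
        }
    }
}
```

      A call such as RetentionPolicy.keepAllVersionsFor(Duration.ofDays(7)).thenKeepLatest(3) then reads close to the requirement itself, and the caller never has to know that the number of versions to keep ends up in MIN_VERSIONS rather than VERSIONS.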

            People

              Assignee: Viraj Jasani (vjasani)
              Reporter: Andrew Kyle Purtell (apurtell)
              Votes: 0
              Watchers: 5
