HBase / HBASE-899

Support for specifying a timestamp and numVersions on a per-column basis

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      This is just an idea, and it may be better to wait until after the planned API changes. But I think it would be useful to support fetching different timestamps and numbers of versions for different columns.

      Example:

      If a row has 2 columns, "col1:" and "col2:", I want to be able to ask for (at scan or read time, it doesn't matter) 2 versions of "col1:" (maybe even between timestamps t1 and t2) but only 1 version of "col2:". This would be especially handy if, during an MR job, you have to read 2 versions of a small column but do not want the overhead of reading 2 versions of every other column too.

      (Also, the mechanism is already there: making the changes to support a per-column timestamp/numVersions is ridiculously easy.)
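
      The request above can be sketched as a small, self-contained model. This is not HBase code: ColumnSpec, Cell, and select are hypothetical names invented for illustration, and the sketch assumes that cells arrive sorted newest-first within each column and that the time range is half-open, [minTs, maxTs).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: none of these types are HBase classes.
public class PerColumnReadSketch {

    /** Hypothetical per-column options: a version cap and a [minTs, maxTs) time range. */
    static final class ColumnSpec {
        final int maxVersions;
        final long minTs;
        final long maxTs;

        ColumnSpec(int maxVersions, long minTs, long maxTs) {
            this.maxVersions = maxVersions;
            this.minTs = minTs;
            this.maxTs = maxTs;
        }
    }

    /** One stored version of one column. */
    static final class Cell {
        final String column;
        final long timestamp;
        final String value;

        Cell(String column, long timestamp, String value) {
            this.column = column;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /**
     * Keeps, for each requested column, at most maxVersions cells whose
     * timestamp falls in that column's range. Assumes cells are sorted
     * newest-first within a column; columns without a spec are skipped.
     */
    static List<Cell> select(List<Cell> cells, Map<String, ColumnSpec> specs) {
        Map<String, Integer> taken = new HashMap<>();
        List<Cell> out = new ArrayList<>();
        for (Cell c : cells) {
            ColumnSpec spec = specs.get(c.column);
            if (spec == null) continue;
            if (c.timestamp < spec.minTs || c.timestamp >= spec.maxTs) continue;
            int n = taken.getOrDefault(c.column, 0);
            if (n >= spec.maxVersions) continue;
            taken.put(c.column, n + 1);
            out.add(c);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Cell> row = Arrays.asList(
            new Cell("col1:", 30L, "a3"), new Cell("col1:", 20L, "a2"),
            new Cell("col1:", 10L, "a1"),
            new Cell("col2:", 30L, "b2"), new Cell("col2:", 20L, "b1"));
        Map<String, ColumnSpec> specs = new HashMap<>();
        specs.put("col1:", new ColumnSpec(2, 0L, Long.MAX_VALUE)); // 2 versions of col1:
        specs.put("col2:", new ColumnSpec(1, 0L, Long.MAX_VALUE)); // 1 version of col2:
        for (Cell c : select(row, specs)) {
            System.out.println(c.column + " @" + c.timestamp + " = " + c.value);
        }
    }
}
```

      In this model the per-column behaviour comes down to one map lookup per cell, which is the sense in which the change might be cheap; whether that holds inside the real region server is not something the sketch can show.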

          Activity

          Sameer Vaishampayan added a comment -

          @Jonathan, any update on this bug? Given that it was supposed to be solved as part of 1249, is it now "closeable"?

          Jonathan Gray added a comment -

          Will be solved as part of 1249 related issues.

          Jim Kellerman added a comment -

          Once we have HBASE-847 and HBASE-52 in place this should not be difficult to add.

          We also need to factor in HBASE-861. Is it a bug or not?

          Doğacan Güney added a comment -

          > Although in general what this request is asking for is to move some overhead of culling results from client side to server side. In general is that a good idea? Region servers are quite busy.

          I am just worried about having to pass large amounts of data over RPC, only to consistently discard it. It seems... a bit wasteful.

          And if HBase intends to support a row-wide timestamp range and numVersions, I just don't see how doing it per column would be any more difficult or slower. A many-column read will already be done in a read-one-column, merge-result-into-the-rest kind of way. So, while reading one column, the region server just checks what the user specified for that column. (Or maybe I am missing something.)
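
          The argument above can be illustrated with a toy read loop. This is a minimal sketch under stated assumptions, not region server code: the store layout (column -> timestamp -> value), readRow, and cellCount are all hypothetical. It reads one column at a time, merges each column's values into the result, and lets a per-column version limit override the row-wide one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a toy stand-in for a column-at-a-time read, not HBase internals.
public class ColumnAtATimeReadSketch {

    /**
     * Reads each column in turn and merges its newest values into the result.
     * A column listed in perColumnVersions uses that limit; every other column
     * falls back to rowWideVersions.
     */
    static Map<String, List<String>> readRow(
            Map<String, TreeMap<Long, String>> store,
            int rowWideVersions,
            Map<String, Integer> perColumnVersions) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, TreeMap<Long, String>> e : store.entrySet()) {
            // The only per-column work: look up what was asked for this column.
            int limit = perColumnVersions.getOrDefault(e.getKey(), rowWideVersions);
            List<String> values = new ArrayList<>();
            // descendingMap() iterates from the newest timestamp to the oldest.
            for (String v : e.getValue().descendingMap().values()) {
                if (values.size() >= limit) break;
                values.add(v);
            }
            result.put(e.getKey(), values);
        }
        return result;
    }

    /** Number of cells that would be sent back to the client. */
    static int cellCount(Map<String, List<String>> row) {
        int n = 0;
        for (List<String> values : row.values()) n += values.size();
        return n;
    }

    public static void main(String[] args) {
        Map<String, TreeMap<Long, String>> store = new LinkedHashMap<>();
        TreeMap<Long, String> col1 = new TreeMap<>();
        col1.put(10L, "a1"); col1.put(20L, "a2"); col1.put(30L, "a3");
        TreeMap<Long, String> col2 = new TreeMap<>();
        col2.put(10L, "b1"); col2.put(20L, "b2");
        store.put("col1:", col1);
        store.put("col2:", col2);

        // Row-wide: 2 versions of every column.
        Map<String, List<String>> rowWide =
            readRow(store, 2, Collections.<String, Integer>emptyMap());
        // Per-column: 1 version by default, 2 versions of col1: only.
        Map<String, List<String>> perColumn =
            readRow(store, 1, Collections.singletonMap("col1:", 2));

        System.out.println("row-wide cells:   " + cellCount(rowWide));
        System.out.println("per-column cells: " + cellCount(perColumn));
    }
}
```

          With only two columns the difference in shipped cells is small; the concern raised above is about rows with many or large columns, where a row-wide setting multiplies the data sent over RPC.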

          Andrew Purtell added a comment -

          Can this be handled with filters? For example, by making a FilterSet that ANDs its terms, then adding to the set a filter that selects col1 using a modified ColumnValueFilter that has comparison operators for timestamps, and then adding a (new) VersionFilter that only allows a specified number of versions through?

          Although in general what this request is asking for is to move some overhead of culling results from client side to server side. In general is that a good idea? Region servers are quite busy.
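
          The filter idea can be sketched without any HBase types. In the sketch below, allOf plays the role of an ANDing filter set, column and timestampIn stand in for the column and timestamp terms, and firstN stands in for the proposed version filter; all of these names are hypothetical, and plain java.util.function.Predicate is used in place of the real filter interfaces. It assumes cells are scanned newest-first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Illustrative only: Predicate stands in for a filter interface; no HBase classes are used.
public class AndFilterSketch {

    /** One stored version of one column (value omitted for brevity). */
    static final class Cell {
        final String column;
        final long timestamp;

        Cell(String column, long timestamp) {
            this.column = column;
            this.timestamp = timestamp;
        }
    }

    /** ANDs its terms, stopping at the first term that rejects the cell. */
    static Predicate<Cell> allOf(List<Predicate<Cell>> terms) {
        return cell -> {
            for (Predicate<Cell> term : terms) {
                if (!term.test(cell)) return false;
            }
            return true;
        };
    }

    /** Accepts only cells of the named column. */
    static Predicate<Cell> column(String name) {
        return cell -> cell.column.equals(name);
    }

    /** Accepts only cells with minTs <= timestamp < maxTs. */
    static Predicate<Cell> timestampIn(long minTs, long maxTs) {
        return cell -> cell.timestamp >= minTs && cell.timestamp < maxTs;
    }

    /** Stateful term: accepts the first n cells that reach it, then rejects the rest. */
    static Predicate<Cell> firstN(int n) {
        int[] seen = {0};
        return cell -> seen[0]++ < n;
    }

    /** Applies the filter to cells in scan order. */
    static List<Cell> scan(List<Cell> cells, Predicate<Cell> filter) {
        List<Cell> out = new ArrayList<>();
        for (Cell c : cells) {
            if (filter.test(c)) out.add(c);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell("col1:", 30L));
        cells.add(new Cell("col1:", 20L));
        cells.add(new Cell("col1:", 10L));
        cells.add(new Cell("col2:", 30L));

        // firstN goes last so it counts only cells that passed the other terms.
        List<Predicate<Cell>> terms = new ArrayList<>();
        terms.add(column("col1:"));
        terms.add(timestampIn(0L, 25L));
        terms.add(firstN(2));

        for (Cell c : scan(cells, allOf(terms))) {
            System.out.println(c.column + " @" + c.timestamp);
        }
    }
}
```

          Two caveats apply to this sketch: the version-counting term is stateful, so its position in the set matters, and one ANDed set covers only one column, so handling "col2:" as well would need a second set combined with the first by OR.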


            People

            • Assignee: Unassigned
            • Reporter: Doğacan Güney
            • Votes: 0
            • Watchers: 2
