HBase / HBASE-847

new API: HTable.getRow with numVersion specified

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.2.0
    • Fix Version/s: 0.19.0
    • Component/s: Client
    • Labels:
      None

      Description

      I'd like to be able to call HTable.getRow with numVersions, and get multiple versions for each column.

      1. HBASE_847.patch
        25 kB
        Doğacan Güney
      2. HBASE-847_v2.patch
        33 kB
        Doğacan Güney

        Issue Links

        There are no Sub-Tasks for this issue.

          Activity

          Jim Kellerman added a comment -

          What needs to be done:

          o.a.h.h.ipc.HRegionInterface:

          • bump versionID
          • change:
            public RowResult getRow(final byte[] regionName, final byte[] row, final byte[][] columns, final long ts, final long lockId)
            
            // to:
            
            public RowResult getRow(final byte[] regionName, final byte[] row, final byte[][] columns, final long ts, final int numVersions, final long lockId)
            

          o.a.h.h.client.HTable:

          • add overloads to getRow:
            public RowResult getRow(String row, int numVersions)
            public RowResult getRow(String row, long timestamp, int numVersions)
            public RowResult getRow(String row, String[] columns, int numVersions)
            public RowResult getRow(String row, String[] columns, long timestamp, int numVersions)
            public RowResult getRow(String row, String[] columns, long timestamp, int numVersions, RowLock rowLock)
            
            public RowResult getRow(byte[] row, int numVersions)
            public RowResult getRow(byte[] row, long timestamp, int numVersions)
            public RowResult getRow(byte[] row, byte[][] columns, int numVersions)
            public RowResult getRow(byte[] row, byte[][] columns, long timestamp, int numVersions)
            
          • replace:
            public RowResult getRow(byte[] row, byte[][] columns, long timestamp, RowLock rowLock)
            
            // with:
            
            public RowResult getRow(byte[] row, byte[][] columns, long timestamp, int numVersions, RowLock rowLock)
            

          All getRow(String...) methods should call:

          public RowResult getRow(String row, String[] columns, long timestamp, int numVersions, RowLock rowLock)
          
          // which calls:
          
          public RowResult getRow(byte[] row, byte[][] columns, long timestamp, int numVersions, RowLock rowLock)
          

          Similarly all getRow(byte[]...) methods should call:

          public RowResult getRow(byte[] row, byte[][] columns, long timestamp, int numVersions, RowLock rowLock)
          

          which will use the new getRow API in HRegionInterface described above.
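          To make the delegation chain concrete, here is a hypothetical, self-contained sketch of the proposed overloads. RowLock and RowResult are stand-ins, the "latest timestamp" sentinel is assumed to be Long.MAX_VALUE, a null columns array is assumed to mean "all columns", and a lock id of -1 is assumed to mean "no row lock". Where the real HTable would issue the new HRegionInterface.getRow RPC, this sketch only records the arguments it would have sent.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of the getRow overload chain; not the committed HTable code. */
public class GetRowOverloads {
    /** Assumed "latest" sentinel (illustrative). */
    static final long LATEST_TIMESTAMP = Long.MAX_VALUE;

    /** Stand-in for a client-held row lock. */
    static final class RowLock {
        final long lockId;
        RowLock(long lockId) { this.lockId = lockId; }
    }

    /** Stand-in result that records what reached the RPC layer. */
    static final class RowResult {
        final byte[] row;
        final byte[][] columns;
        final long timestamp;
        final int numVersions;
        final long lockId;

        RowResult(byte[] row, byte[][] columns, long timestamp, int numVersions, long lockId) {
            this.row = row;
            this.columns = columns;
            this.timestamp = timestamp;
            this.numVersions = numVersions;
            this.lockId = lockId;
        }
    }

    // String overloads: all funnel into the String "full" overload.
    public RowResult getRow(String row, int numVersions) {
        return getRow(row, null, LATEST_TIMESTAMP, numVersions, null);
    }
    public RowResult getRow(String row, long timestamp, int numVersions) {
        return getRow(row, null, timestamp, numVersions, null);
    }
    public RowResult getRow(String row, String[] columns, int numVersions) {
        return getRow(row, columns, LATEST_TIMESTAMP, numVersions, null);
    }
    public RowResult getRow(String row, String[] columns, long timestamp, int numVersions) {
        return getRow(row, columns, timestamp, numVersions, null);
    }
    public RowResult getRow(String row, String[] columns, long timestamp, int numVersions, RowLock rowLock) {
        // Convert to bytes and hand off to the byte[] "full" overload.
        return getRow(toBytes(row), toBytes(columns), timestamp, numVersions, rowLock);
    }

    // byte[] overloads: all funnel into the byte[] "full" overload.
    public RowResult getRow(byte[] row, int numVersions) {
        return getRow(row, null, LATEST_TIMESTAMP, numVersions, null);
    }
    public RowResult getRow(byte[] row, long timestamp, int numVersions) {
        return getRow(row, null, timestamp, numVersions, null);
    }
    public RowResult getRow(byte[] row, byte[][] columns, int numVersions) {
        return getRow(row, columns, LATEST_TIMESTAMP, numVersions, null);
    }
    public RowResult getRow(byte[] row, byte[][] columns, long timestamp, int numVersions) {
        return getRow(row, columns, timestamp, numVersions, null);
    }
    public RowResult getRow(byte[] row, byte[][] columns, long timestamp, int numVersions, RowLock rowLock) {
        long lockId = (rowLock == null) ? -1L : rowLock.lockId; // -1 = no lock (assumption)
        // The real client would call HRegionInterface.getRow(regionName, row, columns,
        // timestamp, numVersions, lockId) here; the sketch just records the arguments.
        return new RowResult(row, columns, timestamp, numVersions, lockId);
    }

    private static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    private static byte[][] toBytes(String[] ss) {
        if (ss == null) {
            return null; // null keeps meaning "all columns"
        }
        byte[][] out = new byte[ss.length][];
        for (int i = 0; i < ss.length; i++) {
            out[i] = toBytes(ss[i]);
        }
        return out;
    }
}
```

          Keeping every default in the two "full" overloads means the defaulting rules live in one place, and only the byte[] variant has to know about the RPC signature.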

          Modify HRegionServer.getRow to match the change in HRegionInterface. This will require corresponding changes to HRegion.getFull, HStore.{getFull,getFullFromMapFile} and Memcache.{getFull,internalGetFull}.
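          A rough sketch of the work a multi-version getFull has to do, assuming each source (the memcache, then each map file) can be viewed as a column -> (timestamp -> value) map. The class and method names are hypothetical and this is a plain-Java model, not the HStore or Memcache code: it merges all sources, drops versions newer than ts, and keeps at most numVersions per column, newest first.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical model of a multi-version getFull. Each source is a map of
 * column -> (timestamp -> value), ordered newest source first.
 */
public class MultiVersionGetFull {

    public static Map<String, NavigableMap<Long, String>> getFull(
            List<Map<String, Map<Long, String>>> sources, long ts, int numVersions) {
        Map<String, NavigableMap<Long, String>> result = new TreeMap<>();
        for (Map<String, Map<Long, String>> source : sources) {
            for (Map.Entry<String, Map<Long, String>> column : source.entrySet()) {
                // Per-column versions, kept newest first.
                NavigableMap<Long, String> versions = result.computeIfAbsent(
                        column.getKey(),
                        k -> new TreeMap<Long, String>(Collections.<Long>reverseOrder()));
                for (Map.Entry<Long, String> v : column.getValue().entrySet()) {
                    if (v.getKey() <= ts) {
                        // If two sources hold the same timestamp, the earlier source wins.
                        versions.putIfAbsent(v.getKey(), v.getValue());
                    }
                }
            }
        }
        // Trim each column to its newest numVersions entries.
        for (NavigableMap<Long, String> versions : result.values()) {
            while (versions.size() > numVersions) {
                versions.pollLastEntry(); // last entry in reverse order = oldest version
            }
        }
        // Drop columns whose versions were all newer than ts.
        result.values().removeIf(NavigableMap::isEmpty);
        return result;
    }
}
```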

          Multiple values and timestamps for the same column (family:qualifier) can be stored in a single Cell using either of these constructors:

          Cell(String[] vals, long[] ts)
          Cell(byte[][] vals, long[] ts)
          
          Doğacan Güney added a comment - edited

          Patch for the issue.

          OK, this is my first big(-ish) patch, so I am sure I am missing something.

          Anyway, it updates HBase as Jim Kellerman suggested. The new HTable#getRow overloads don't have any documentation yet; I will add it in a later patch.

          I also want to update scanners so that you can ask for multiple versions from them too (not done yet).

          (Also includes patch from HBASE-892.)

          Jim Kellerman added a comment -

          Patch does not apply. Patches must be in svn diff format to be accepted.

          Please add a test case to demonstrate that getting multiple versions works (it should also cover getting multiple versions with a timestamp specified).

          Please do not include a patch for HBASE-52 and HBASE-33 in this patch. Even though they are similar, changes to scanners are more difficult, and we generally try to limit the scope of a single patch.

          Ensure that the sub-issues of this Jira (HBASE-857, HBASE-31 and HBASE-44) are addressed.

          Thanks.

          Doğacan Güney added a comment -

          Again, thanks for comments. I will update as you suggested with a new patch.

          Btw, a question: do you think it is a good idea to change Cell so that, when it stores multiple <timestamp, value> pairs, those pairs are sorted? That is, the value with the latest timestamp would be returned first during iteration.

          Jim Kellerman added a comment -

          > Doğacan Güney - 22/Sep/08 01:58 PM
          >
          > Btw, a question: Do you think it is a good idea to change Cell so that if it stores
          > multiple <timestamp, value> pairs, those pairs are sorted? I mean, the value
          > with the latest timestamp will be returned first during an iteration?

          That would be nice, but it will require substantial changes to HStore.{getFull,getFullFromMapFile} and Memcache.getFull.

          At first glance, however, changes are required there just to be able to get multiple versions in the first place.

          Doğacan Güney added a comment -

          Do people think this should wait until after HBASE-880, since that issue will change all the APIs anyway, or shall I work on a new patch now?

          Doğacan Güney added a comment -

          New version of the patch. Same as the last one, except:

          • Added a new test case (TestGetMultipleVersions)
          • Changed Cell to keep a reverse-sorted map of timestamp->value. This way, a cell is guaranteed to return the value with the latest timestamp first.
          • Also changed iteration: Cell now iterates over Entry<Long, byte[]>s. Nothing in the HBase code uses cell iteration anyway (and it didn't work until just a while back). Still, I am open to suggestions.
          • Added javadoc for new overloads
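          A minimal sketch of the reverse-sorted Cell described in the list above, also mirroring the Cell(String[] vals, long[] ts) and Cell(byte[][] vals, long[] ts) constructor shapes mentioned earlier. The class name SortedCell and its accessor names are hypothetical; it only illustrates keeping a TreeMap ordered by Collections.reverseOrder() so that iteration over Map.Entry<Long, byte[]> starts at the newest timestamp.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical cell holding several versions, newest timestamp first. */
public class SortedCell implements Iterable<Map.Entry<Long, byte[]>> {
    // Reverse natural order: the largest (newest) timestamp is the first key.
    private final SortedMap<Long, byte[]> valueMap =
            new TreeMap<Long, byte[]>(Collections.<Long>reverseOrder());

    /** Parallel arrays: vals[i] was written at ts[i]. A repeated timestamp keeps the last value given. */
    public SortedCell(byte[][] vals, long[] ts) {
        if (vals.length != ts.length) {
            throw new IllegalArgumentException("vals and ts must have the same length");
        }
        for (int i = 0; i < vals.length; i++) {
            valueMap.put(ts[i], vals[i]);
        }
    }

    /** Same as above, with values encoded as UTF-8 (an assumption for this sketch). */
    public SortedCell(String[] vals, long[] ts) {
        this(encode(vals), ts);
    }

    private static byte[][] encode(String[] vals) {
        byte[][] out = new byte[vals.length][];
        for (int i = 0; i < vals.length; i++) {
            out[i] = vals[i].getBytes(StandardCharsets.UTF_8);
        }
        return out;
    }

    /** Value with the latest timestamp; throws NoSuchElementException if the cell is empty. */
    public byte[] getValue() {
        return valueMap.get(valueMap.firstKey());
    }

    /** Latest timestamp; throws NoSuchElementException if the cell is empty. */
    public long getTimestamp() {
        return valueMap.firstKey();
    }

    public int getNumValues() {
        return valueMap.size();
    }

    /** Iterates over (timestamp, value) pairs, newest first; the view is read-only. */
    @Override
    public Iterator<Map.Entry<Long, byte[]>> iterator() {
        return Collections.unmodifiableMap(valueMap).entrySet().iterator();
    }
}
```

          Because the ordering lives in the map's comparator, callers get newest-first iteration no matter what order the versions were passed to the constructor.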

          There is a small bug. Say your table is configured to keep the last 3 versions, and you have just written code that makes 5 updates to a row/column (with timestamps t1, t2, t3, t4, t5). If you ask for 5 versions, you will only get t5, t4 and t3. But if you ask for 5 versions starting from t4, you will get t4, t3 and t2 (at least until the table is compacted). I don't know if this will be too much of a problem. I should also note that HTable#get behaves the same way.
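          The behavior described above can be reproduced with a small in-memory model. This is a hypothetical illustration, not HBase code: it assumes the store keeps every written version until a compaction runs, and that a read returns at most min(numVersions, maxVersions) versions at or below the requested timestamp.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical model of the version cap described above; names are illustrative. */
public class VersionCapModel {
    private final int maxVersions;
    // Stored versions, newest first; excess versions survive until compact() runs.
    private final NavigableMap<Long, String> stored =
            new TreeMap<Long, String>(Collections.<Long>reverseOrder());

    public VersionCapModel(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    public void put(long ts, String value) {
        stored.put(ts, value);
    }

    /** Returns up to min(numVersions, maxVersions) timestamps <= ts, newest first. */
    public List<Long> get(long ts, int numVersions) {
        int limit = Math.min(numVersions, maxVersions);
        List<Long> out = new ArrayList<>();
        // With reverse ordering, tailMap(ts, true) holds the keys numerically <= ts.
        for (long t : stored.tailMap(ts, true).keySet()) {
            if (out.size() == limit) {
                break;
            }
            out.add(t);
        }
        return out;
    }

    /** Simulates a compaction: physically drop everything beyond the newest maxVersions. */
    public void compact() {
        while (stored.size() > maxVersions) {
            stored.pollLastEntry(); // oldest version
        }
    }
}
```

          With maxVersions = 3 and writes at t1..t5, get(latest, 5) yields t5, t4, t3 while get(t4, 5) yields t4, t3, t2; after compact() the second call yields only t4 and t3, which is the "until the table is compacted" caveat.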

          About subtasks: I think HBASE-857 and HBASE-44 are covered. I am not sure about HBASE-31. Is it useful to get just timestamps and not values?

          stack added a comment -

          Patch looks good. Let me study it more, try it locally, and try to get it into 0.19.0. Good stuff.

          Doğacan Güney added a comment -

          Thanks for comments, stack.

          I am OK with this issue (or HBASE-44 etc) being fixed in 0.19 or 0.20.

          stack added a comment -

          Committed. Thanks for the patch Doğacan.


            People

            • Assignee:
              Doğacan Güney
              Reporter:
              Michael Bieniosek
            • Votes:
              0
              Watchers:
              0
