HBASE-2265: HFile and Memstore should maintain minimum and maximum timestamps

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.90.0
    • Component/s: regionserver
    • Labels: None

      Description

      In order to fix HBASE-1485 and HBASE-29, it would be very helpful to have HFile and Memstore track their maximum and minimum timestamps (see the sketch after the list below). This has the following nice properties:

      • for a straight Get, if an entry has already been found with timestamp X, and X >= HFile.maxTimestamp, the HFile doesn't need to be checked. Thus, the current fast behavior of Get can be maintained for those who use strictly increasing timestamps, while still giving "correct" behavior for those who sometimes write out of order.
      • for a scan, the "latest timestamp" of the storage can be used to decide which cell wins, even if the timestamps of the cells are equal. In essence, rather than comparing timestamps alone, you compare tuples of (row timestamp, storage.max_timestamp).
      • in general, min_timestamp(storage A) >= max_timestamp(storage B) if storage A was flushed after storage B.
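
      A minimal sketch of the idea (hypothetical Java, not the committed HBase code; the patch under review introduces a TimeRangeTracker class for this purpose):

      // Hypothetical: one range tracker per storage (memstore or HFile).
      final class TimestampRange {
        private long min = Long.MAX_VALUE;
        private long max = Long.MIN_VALUE;

        // Called for every KeyValue written to the storage.
        void include(long ts) {
          if (ts < min) min = ts;
          if (ts > max) max = ts;
        }

        // A Get that already found a value at timestamp foundTs can skip this
        // storage entirely, since nothing in it can be newer.
        boolean canSkipForGet(long foundTs) {
          return foundTs >= max;
        }

        // A timestamp-range query [start, end) can skip this storage when the
        // two ranges cannot overlap.
        boolean canSkipForRange(long start, long end) {
          return max < start || min >= end;
        }
      }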

          Activity

          HBase Review Board added a comment -

          Message from: "Pranav Khaitan" <pranavkhaitan@facebook.com>

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.hbase.org/r/257/
          -----------------------------------------------------------

          (Updated 2010-07-15 00:33:16.570757)

          Review request for hbase, Nicolas, Jonathan Gray, Ryan Rawson, Karthik Ranganathan, and Kannan Muthukkaruppan.

          Changes
          -------

          Just for completeness, posting the change that has already been committed to the public branch.

          Summary
          -------

          Every memstore and store file will have a minimum and maximum timestamp associated with it. If the range of timestamps we are searching for doesn't overlap with the range for a particular file, we can skip searching it and save time.

          Would significantly improve the performance for timestamp range queries. Particularly useful when most of the reads are for recent entries and the older files can be safely skipped.

          Addresses HBASE-2265 JIRA.

          This diff also fixes some minor bugs, such as KeyValueHeap throwing an uncaught exception when the scanner set was empty.

          Internal review done by Jonathan and Kannan.

          This addresses bug HBASE-2265.
          http://issues.apache.org/jira/browse/HBASE-2265

          Diffs (updated)


          trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/TimeRangeTracker.java PRE-CREATION
          trunk/src/test/java/org/apache/hadoop/hbase/client/TestMultipleTimestamps.java PRE-CREATION
          trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMemStore.java 960082
          trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 959782
          trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 959782

          Diff: http://review.hbase.org/r/257/diff

          Testing
          -------

          All existing JUnit tests run successfully. More JUnit tests for Memstore, StoreFile and Store added to test correctness with multiple timestamps.

          Conducted a test to measure the extra time required to keep track of min and max timestamps while writing KeyValues. The comparison was done by entering 1 Million KeyValues into memstore ten times with and without timestamp tracking and then taking the average time for each of them. WAL was disabled and no flushing was done during this test to minimize overheads. The average time taken for entering 1M KeyValues into memstore without keeping track of timestamp was 13.44 seconds while the average time when keeping track of timestamps was 13.45 seconds. This shows that no significant overhead has been added while keeping track of timestamps.

          Thanks,

          Pranav
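
          A rough reconstruction of the kind of micro-benchmark described in the testing notes above (the actual harness was not posted, so the class below is an assumption; it is placed in org.apache.hadoop.hbase.regionserver so the MemStore class is visible, as TestMemStore does, and it relies only on MemStore's no-arg constructor and add(KeyValue)):

          package org.apache.hadoop.hbase.regionserver;

          import org.apache.hadoop.hbase.KeyValue;
          import org.apache.hadoop.hbase.util.Bytes;

          // Hypothetical benchmark harness, not part of the patch.
          public class MemStoreInsertBench {
            private static final int NUM_KVS = 1000000;  // 1M KeyValues per run
            private static final int RUNS = 10;          // averaged over ten runs

            public static void main(String[] args) {
              byte[] family = Bytes.toBytes("f");
              byte[] qualifier = Bytes.toBytes("q");
              byte[] value = Bytes.toBytes("v");
              long totalMillis = 0;
              for (int run = 0; run < RUNS; run++) {
                MemStore memstore = new MemStore();      // no WAL, no flush on this path
                long start = System.currentTimeMillis();
                for (int i = 0; i < NUM_KVS; i++) {
                  memstore.add(new KeyValue(Bytes.toBytes(i), family, qualifier, i, value));
                }
                totalMillis += System.currentTimeMillis() - start;
              }
              System.out.println("avg seconds per run: " + (totalMillis / (double) RUNS) / 1000.0);
            }
          }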

          stack added a comment -

          Resolving because Ryan applied Pranav's late edit.

          ryan rawson added a comment -

          done

          On Tue, Jul 13, 2010 at 3:06 PM, Pranav Khaitan

          ryan rawson added a comment -

          committed the fixup, thanks!

          stack added a comment -

          Reopening till Pranav's fix is applied.

          Pranav Khaitan added a comment -

          Hi Ryan, Jonathan,

          There is a major correction to this JIRA. I just did some more testing and
          realized that we forgot one thing in the last set of refactoring and
          reformatting.

          On line 422, we changed the variable from b to timerangeBytes but did not
          update the if statement on the next line (which is a big deal).

          byte[] timerangeBytes = metadataMap.get(TIMERANGE_KEY);
          if (b!=null)

          Should be changed to:

          byte[] timerangeBytes = metadataMap.get(TIMERANGE_KEY);
          if (timerangeBytes != null)

          I am also attaching the patch with this mail. Please update this asap and
          let me know if you have any questions.

          Regards,
          Pranav

          ryan rawson added a comment -

          I committed this with some whitespace formatting cleanups (along with removing unused imports).

          HBase Review Board added a comment -

          Message from: "Ryan Rawson" <ryanobjc@gmail.com>

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.hbase.org/r/257/#review395
          -----------------------------------------------------------

          Ship it!

          lgtm

          • Ryan
          HBase Review Board added a comment -

          Message from: "Pranav Khaitan" <pranavkhaitan@facebook.com>

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.hbase.org/r/257/
          -----------------------------------------------------------

          (Updated 2010-07-12 14:48:32.979270)

          Review request for hbase, Nicolas, Jonathan Gray, Karthik Ranganathan, and Kannan Muthukkaruppan.

          Changes
          -------

          Formatting..

          Summary
          -------

          Every memstore and store file will have a minimum and maximum timestamp associated with it. If the range of timestamps we are searching for doesn't overlap with the range for a particular file, we can skip searching it and save time.

          Would significantly improve the performance for timestamp range queries. Particularly useful when most of the reads are for recent entries and the older files can be safely skipped.

          Addresses HBASE-2265 JIRA.

          This diff also fixes some minor bugs, such as KeyValueHeap throwing an uncaught exception when the scanner set was empty.

          Internal review done by Jonathan and Kannan.

          This addresses bug HBASE-2265.
          http://issues.apache.org/jira/browse/HBASE-2265

          Diffs (updated)


          trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/TimeRangeTracker.java PRE-CREATION
          trunk/src/test/java/org/apache/hadoop/hbase/client/TestMultipleTimestamps.java PRE-CREATION
          trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMemStore.java 960082
          trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 959782
          trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 959782

          Diff: http://review.hbase.org/r/257/diff

          Testing
          -------

          All existing JUnit tests run successfully. More JUnit tests for Memstore, StoreFile and Store added to test correctness with multiple timestamps.

          Conducted a test to measure the extra time required to keep track of min and max timestamps while writing KeyValues. The comparison was done by entering 1 Million KeyValues into memstore ten times with and without timestamp tracking and then taking the average time for each of them. WAL was disabled and no flushing was done during this test to minimize overheads. The average time taken for entering 1M KeyValues into memstore without keeping track of timestamp was 13.44 seconds while the average time when keeping track of timestamps was 13.45 seconds. This shows that no significant overhead has been added while keeping track of timestamps.

          Thanks,

          Pranav

          HBase Review Board added a comment -

          Message from: "Pranav Khaitan" <pranavkhaitan@facebook.com>

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.hbase.org/r/257/
          -----------------------------------------------------------

          (Updated 2010-07-12 14:48:59.626469)

          Review request for hbase, Nicolas, Jonathan Gray, Ryan Rawson, Karthik Ranganathan, and Kannan Muthukkaruppan.

          Summary
          -------

          Every memstore and store file will have a minimum and maximum timestamp associated with it. If the range of timestamps we are searching for doesn't overlap with the range for a particular file, we can skip searching it and save time.

          Would significantly improve the performance for timestamp range queries. Particularly useful when most of the reads are for recent entries and the older files can be safely skipped.

          Addresses HBASE-2265 JIRA.

          This diff also fixes some minor bugs, such as KeyValueHeap throwing an uncaught exception when the scanner set was empty.

          Internal review done by Jonathan and Kannan.

          This addresses bug HBASE-2265.
          http://issues.apache.org/jira/browse/HBASE-2265

          Diffs


          trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/TimeRangeTracker.java PRE-CREATION
          trunk/src/test/java/org/apache/hadoop/hbase/client/TestMultipleTimestamps.java PRE-CREATION
          trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMemStore.java 960082
          trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 959782
          trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 959782

          Diff: http://review.hbase.org/r/257/diff

          Testing
          -------

          All existing JUnit tests run successfully. More JUnit tests for Memstore, StoreFile and Store added to test correctness with multiple timestamps.

          Conducted a test to measure the extra time required to keep track of min and max timestamps while writing KeyValues. The comparison was done by entering 1 Million KeyValues into memstore ten times with and without timestamp tracking and then taking the average time for each of them. WAL was disabled and no flushing was done during this test to minimize overheads. The average time taken for entering 1M KeyValues into memstore without keeping track of timestamp was 13.44 seconds while the average time when keeping track of timestamps was 13.45 seconds. This shows that no significant overhead has been added while keeping track of timestamps.

          Thanks,

          Pranav

          HBase Review Board added a comment -

          Message from: "Ryan Rawson" <ryanobjc@gmail.com>

          On 2010-07-12 13:09:10, Ryan Rawson wrote:

          > trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java, line 425

          > <http://review.hbase.org/r/257/diff/4/?file=2332#file2332line425>

          >

          > is this idiomatic? I dont think I've seen this particular pattern before?

          Pranav Khaitan wrote:

          This is a utility method in Writables and it does exactly the task we want to get done here. There is an alternate method getWritable which also returns an object of type Writable. Do you think it would be better to use that?

          ok i see, then this should be ok

          On 2010-07-12 13:09:10, Ryan Rawson wrote:

          > trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java, line 933

          > <http://review.hbase.org/r/257/diff/4/?file=2328#file2328line933>

          >

          > This doesnt appear to be germane to the issue at hand. It shouldn't appear in this patch.

          Pranav Khaitan wrote:

          This method is required for this issue to check whether a delete is a Column or Family delete. It is used in TimeRangeTracker. If you suggest not adding this method, should I inline this code at the place where it is used?

          ok my bad. this is fine then.

          On 2010-07-12 13:09:10, Ryan Rawson wrote:

          > trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java, line 136

          > <http://review.hbase.org/r/257/diff/4/?file=2333#file2333line136>

          >

          > This doesnt seem germane to the issue at hand... Can we not do this here?

          Pranav Khaitan wrote:

          I did this to pass TimeRange information to shouldSeek(). An alternative possibility is to pass TimeRange as an additional parameter. We had a discussion about this in our team here and thought that since Scan contains all the information, it is a neater way of passing it than adding more arguments. That way, if we add any more filtering techniques at this level in the future, we won't have to change the interface.

          Do you suggest making it two different functions: shouldSeek(row, columns) and shouldSeek(timeRange) ?

          Alternately, it can be combined into one function like shouldSeek(isGet, row, columns, timeRange)

          ok, let's keep it as is then.

          • Ryan

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.hbase.org/r/257/#review357
          -----------------------------------------------------------
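
          A hypothetical sketch of the shouldSeek(Scan, columns) shape discussed in the exchange above; the class, its fields, and the overlap test are illustrative assumptions, not the committed HBase code:

          import java.util.SortedSet;

          import org.apache.hadoop.hbase.client.Scan;

          // Stand-in for the per-file check a StoreFile/StoreFileScanner could run
          // before seeking; fileMinTs/fileMaxTs are the bounds recorded at write time.
          class TimeRangeSeekCheck {
            private final long fileMinTs;
            private final long fileMaxTs;

            TimeRangeSeekCheck(long fileMinTs, long fileMaxTs) {
              this.fileMinTs = fileMinTs;
              this.fileMaxTs = fileMaxTs;
            }

            // The Scan already carries the requested TimeRange, so no extra argument
            // is needed; columns is kept only to mirror the discussed signature.
            boolean shouldSeek(Scan scan, SortedSet<byte[]> columns) {
              long reqMin = scan.getTimeRange().getMin();
              long reqMax = scan.getTimeRange().getMax();  // TimeRange max is exclusive
              // Seek only if [fileMinTs, fileMaxTs] can overlap [reqMin, reqMax).
              return !(fileMaxTs < reqMin || fileMinTs >= reqMax);
            }
          }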

          HBase Review Board added a comment -

          Message from: "Pranav Khaitan" <pranavkhaitan@facebook.com>

          On 2010-07-12 13:09:10, Ryan Rawson wrote:

          > there are a number of indenting issues; HBase uses a 2-space base and a 4-space hanging indent standard. Also, we put a space before the opening parenthesis, so it is "if (" not "if(". The interior of the parentheses is not to be spaced, e.g. "(foo)" not "( foo )". I am also seeing a large amount of trailing whitespace; you should be able to set your editor to remove that automatically. It shows up as red blocks on the code review.

          >

          > It also appears that this patch includes pieces from other bloom related patches... those code snippets should be removed from this patch.

          Thanks Ryan, for pointing out the formatting suggestions. I will fix those and upload the code. Meanwhile, I wanted to get your views on the following points.

          On 2010-07-12 13:09:10, Ryan Rawson wrote:

          > trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java, line 733

          > <http://review.hbase.org/r/257/diff/4/?file=2330#file2330line733>

          >

          > indentation, looks like there might be a tab here

          >

          Fixed

          On 2010-07-12 13:09:10, Ryan Rawson wrote:

          > trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java, line 933

          > <http://review.hbase.org/r/257/diff/4/?file=2328#file2328line933>

          >

          > This doesnt appear to be germane to the issue at hand. It shouldn't appear in this patch.

          This method is required for this issue to check whether a delete is a Column or Family delete. It is used in TimeRangeTracker. If you suggest not adding this method, should I inline this code at the place where it is used?

          On 2010-07-12 13:09:10, Ryan Rawson wrote:

          > trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java, line 425

          > <http://review.hbase.org/r/257/diff/4/?file=2332#file2332line425>

          >

          > is this idiomatic? I dont think I've seen this particular pattern before?

          This is a utility method in Writables and it does exactly the task we want to get done here. There is an alternate method getWritable which also returns an object of type Writable. Do you think it would be better to use that?

          On 2010-07-12 13:09:10, Ryan Rawson wrote:

          > trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java, line 136

          > <http://review.hbase.org/r/257/diff/4/?file=2333#file2333line136>

          >

          > This doesnt seem germane to the issue at hand... Can we not do this here?

          I did this to pass TimeRange information to shouldSeek(). An alternative possibility is to pass TimeRange as an additional parameter. We had a discussion about this in our team here and thought that since Scan contains all the information, it is a neater way of passing it than adding more arguments. That way, if we add any more filtering techniques at this level in the future, we won't have to change the interface.

          Do you suggest making it two different functions: shouldSeek(row, columns) and shouldSeek(timeRange) ?

          Alternately, it can be combined into one function like shouldSeek(isGet, row, columns, timeRange)

          • Pranav

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.hbase.org/r/257/#review357
          -----------------------------------------------------------

          HBase Review Board added a comment -

          Message from: "Ryan Rawson" <ryanobjc@gmail.com>

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.hbase.org/r/257/#review357
          -----------------------------------------------------------

          there are a number of indenting issues; HBase uses a 2-space base and a 4-space hanging indent standard. Also, we put a space before the opening parenthesis, so it is "if (" not "if(". The interior of the parentheses is not to be spaced, e.g. "(foo)" not "( foo )". I am also seeing a large amount of trailing whitespace; you should be able to set your editor to remove that automatically. It shows up as red blocks on the code review.

          It also appears that this patch includes pieces from other bloom related patches... those code snippets should be removed from this patch.

          trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java
          <http://review.hbase.org/r/257/#comment1483>

          This doesnt appear to be germane to the issue at hand. It shouldn't appear in this patch.

          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java
          <http://review.hbase.org/r/257/#comment1489>

          indentation, looks like there might be a tab here

          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
          <http://review.hbase.org/r/257/#comment1492>

          is this idiomatic? I dont think I've seen this particular pattern before?

          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
          <http://review.hbase.org/r/257/#comment1496>

          This doesnt seem germane to the issue at hand... Can we not do this here?

          • Ryan
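
          To illustrate the indentation and spacing conventions described in this review (an illustrative snippet, not code from the patch):

          public class StyleExample {
            private long lastSeen;                  // 2-space base indent

            void record(long timestamp) {
              if (timestamp > this.lastSeen) {      // "if (" with a space; "(foo)", not "( foo )"
                this.lastSeen = newValue(timestamp,
                    this.lastSeen);                 // 4-space hanging indent on the wrapped line
              }
            }

            private long newValue(long a, long b) {
              return Math.max(a, b);
            }
          }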
          HBase Review Board added a comment -

          Message from: "Pranav Khaitan" <pranavkhaitan@facebook.com>

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.hbase.org/r/257/
          -----------------------------------------------------------

          (Updated 2010-07-09 15:28:53.720013)

          Review request for hbase, Nicolas, Jonathan Gray, Karthik Ranganathan, and Kannan Muthukkaruppan.

          Summary
          -------

          Every memstore and store file will have a minimum and maximum timestamp associated with it. If the range of timestamps we are searching for doesn't overlap with the range for a particular file, we can skip searching it and save time.

          Would significantly improve the performance for timestamp range queries. Particularly useful when most of the reads are for recent entries and the older files can be safely skipped.

          Addresses HBASE-2265 JIRA.

          This diff also fixes some minor bugs, such as KeyValueHeap throwing an uncaught exception when the scanner set was empty.

          Internal review done by Jonathan and Kannan.

          This addresses bug HBASE-2265.
          http://issues.apache.org/jira/browse/HBASE-2265

          Diffs (updated)


          trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/TimeRangeTracker.java PRE-CREATION
          trunk/src/test/java/org/apache/hadoop/hbase/client/TestMultipleTimestamps.java PRE-CREATION
          trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMemStore.java 960082
          trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 959782
          trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 959782

          Diff: http://review.hbase.org/r/257/diff

          Testing
          -------

          All existing JUnit tests run successfully. More JUnit tests for Memstore, StoreFile and Store added to test correctness with multiple timestamps.

          Conducted a test to measure the extra time required to keep track of min and max timestamps while writing KeyValues. The comparison was done by entering 1 Million KeyValues into memstore ten times with and without timestamp tracking and then taking the average time for each of them. WAL was disabled and no flushing was done during this test to minimize overheads. The average time taken for entering 1M KeyValues into memstore without keeping track of timestamp was 13.44 seconds while the average time when keeping track of timestamps was 13.45 seconds. This shows that no significant overhead has been added while keeping track of timestamps.

          Thanks,

          Pranav

          HBase Review Board added a comment -

          Message from: "Ryan Rawson" <ryanobjc@gmail.com>

          On 2010-07-07 13:58:43, Ryan Rawson wrote:

          > trunk/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueSkipListSet.java, line 55

          > <http://review.hbase.org/r/257/diff/2/?file=2159#file2159line55>

          >

          > I think this information should be maintained in MemStore not inside this data structure. We might get rid of this data structure type and change to another one day. This makes it too hard to do that.

          Pranav Khaitan wrote:

          When we are flushing the memstore to a storefile, we are passing an object of KeyValueSkipListSet. This variable goes through several functions before reaching Store. If we don't have TimeRangeTracker inside KeyValueSkipListSet, we will have to change all flush related functions to take an extra argument as input. This way, in future, if we decide to send another piece of information, we will have to add more arguments. Having TimeRangeTracker inside KeyValueSkipListSet lets us pass the information without changing all flush related functions. Would it still be better to pass TimeRangeTracker as an additional argument?

          this totally makes sense, the only issue is that historically we have KeyValueSkipListSet because we couldnt use SkipListSet with the particular implementation of incrementColumnValue we had. Now that the implementation of ICV is changing (in an unrelated JIRA), we no longer need a specialized SkipListSet and we could use the standard one instead.

          We have the StoreFlusherImpl inside Store which exists to capture this kind of metadata and carry it along, so doing the other thing might not be too painful or bogus, what do you think?

          • Ryan

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.hbase.org/r/257/#review314
          -----------------------------------------------------------
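
          A hypothetical sketch of the two plumbing options discussed in the exchange above; the class and method names are stand-ins, not the committed HBase code:

          import java.util.SortedSet;

          import org.apache.hadoop.hbase.KeyValue;

          class FlushPlumbingSketch {
            // Stand-in for TimeRangeTracker: records the min/max timestamp seen.
            static class TsTracker {
              long min = Long.MAX_VALUE;
              long max = Long.MIN_VALUE;
              void include(long ts) {
                if (ts < min) min = ts;
                if (ts > max) max = ts;
              }
            }

            // Option A (the patch as posted): the sorted KeyValue set carries the
            // tracker, so the existing flush call chain needs no extra arguments.
            static class TrackedKvSet {             // stand-in for KeyValueSkipListSet
              final TsTracker tracker = new TsTracker();
              void add(KeyValue kv) {
                tracker.include(kv.getTimestamp());
                // ... insert into the underlying skip list ...
              }
            }

            // Option B (Ryan's suggestion): MemStore owns the tracker and the flush
            // path (e.g. StoreFlusherImpl) carries it alongside the snapshot.
            static class FlushSnapshot {
              final SortedSet<KeyValue> kvs;
              final TsTracker tracker;
              FlushSnapshot(SortedSet<KeyValue> kvs, TsTracker tracker) {
                this.kvs = kvs;
                this.tracker = tracker;
              }
            }
          }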

          HBase Review Board added a comment -

          Message from: "Pranav Khaitan" <pranavkhaitan@facebook.com>

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.hbase.org/r/257/
          -----------------------------------------------------------

          (Updated 2010-07-07 14:22:41.249992)

          Review request for hbase, Nicolas, Jonathan Gray, Karthik Ranganathan, and Kannan Muthukkaruppan.

          Changes
          -------

          Incorporated Ryan's feedback regarding formatting.

          Summary
          -------

          Every memstore and store file will have a minimum and maximum timestamp associated with it. If the range of timestamps we are searching for doesn't overlap with the range for a particular file, we can skip searching it and save time.

          Would significantly improve the performance for timestamp range queries. Particularly useful when most of the reads are for recent entries and the older files can be safely skipped.

          Addresses HBASE-2265 JIRA.

          This diff also fixes some minor bugs, such as KeyValueHeap throwing an uncaught exception when the scanner set was empty.

          Internal review done by Jonathan and Kannan.

          This addresses bug HBASE-2265.
          http://issues.apache.org/jira/browse/HBASE-2265

          Diffs (updated)


          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueSkipListSet.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/TimeRangeTracker.java PRE-CREATION
          trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMemStore.java 960082
          trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 959782
          trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 959782

          Diff: http://review.hbase.org/r/257/diff

          Testing
          -------

          All existing JUnit tests run successfully. More JUnit tests for Memstore, StoreFile and Store added to test correctness with multiple timestamps.

          Conducted a test to measure the extra time required to keep track of min and max timestamps while writing KeyValues. The comparison was done by entering 1 Million KeyValues into memstore ten times with and without timestamp tracking and then taking the average time for each of them. WAL was disabled and no flushing was done during this test to minimize overheads. The average time taken for entering 1M KeyValues into memstore without keeping track of timestamp was 13.44 seconds while the average time when keeping track of timestamps was 13.45 seconds. This shows that no significant overhead has been added while keeping track of timestamps.

          Thanks,

          Pranav

          HBase Review Board added a comment -

          Message from: "Pranav Khaitan" <pranavkhaitan@facebook.com>

          On 2010-07-07 13:58:43, Ryan Rawson wrote:

          > trunk/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueSkipListSet.java, line 55

          > <http://review.hbase.org/r/257/diff/2/?file=2159#file2159line55>

          >

          > I think this information should be maintained in MemStore not inside this data structure. We might get rid of this data structure type and change to another one day. This makes it too hard to do that.

          When we flush the memstore to a store file, we pass a KeyValueSkipListSet object, and it goes through several functions before reaching Store. If the TimeRangeTracker is not kept inside KeyValueSkipListSet, every flush-related function would have to take an extra argument, and any further piece of information we want to pass later would mean adding still more arguments. Keeping the TimeRangeTracker inside KeyValueSkipListSet lets us pass the information along without changing all the flush-related functions. Would it still be better to pass the TimeRangeTracker as an additional argument?
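          A compilable toy contrasting the two options being weighed here (class and method names are simplified stand-ins, not the real flush call chain):

              public class TrackerPlumbing {
                static class Tracker {                               // stand-in for TimeRangeTracker
                  long min = Long.MAX_VALUE, max = Long.MIN_VALUE;
                  void include(long ts) { min = Math.min(min, ts); max = Math.max(max, ts); }
                }

                // Option A: the tracker rides along inside the set, so flush signatures stay unchanged.
                static class TrackedSet extends java.util.concurrent.ConcurrentSkipListSet<Long> {
                  final Tracker tracker = new Tracker();
                  @Override public boolean add(Long ts) { tracker.include(ts); return super.add(ts); }
                }
                static void flushA(TrackedSet snapshot) {
                  System.out.println(snapshot.tracker.min + ".." + snapshot.tracker.max);
                }

                // Option B: the tracker is threaded through every flush-related call as an extra argument.
                static void flushB(java.util.concurrent.ConcurrentSkipListSet<Long> snapshot, Tracker tracker) {
                  System.out.println(tracker.min + ".." + tracker.max);
                }

                public static void main(String[] args) {
                  TrackedSet set = new TrackedSet();
                  set.add(5L); set.add(2L); set.add(9L);
                  flushA(set);                  // 2..9 without touching any signatures
                  flushB(set, set.tracker);     // same information, but the signature had to change
                }
              }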

          • Pranav

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.hbase.org/r/257/#review314
          -----------------------------------------------------------

          HBase Review Board added a comment -

          Message from: "Ryan Rawson" <ryanobjc@gmail.com>

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.hbase.org/r/257/#review314
          -----------------------------------------------------------

          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java
          <http://review.hbase.org/r/257/#comment1377>

          we use one true brace style; please move this and all the others up one line

          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java
          <http://review.hbase.org/r/257/#comment1378>

          spacing is like so:
          "if (this.heap != null) {"

          thanks!

          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueSkipListSet.java
          <http://review.hbase.org/r/257/#comment1379>

          I think this information should be maintained in MemStore not inside this data structure. We might get rid of this data structure type and change to another one day. This makes it too hard to do that.

          • Ryan
          HBase Review Board added a comment -

          Message from: "Pranav Khaitan" <pranavkhaitan@facebook.com>

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.hbase.org/r/257/
          -----------------------------------------------------------

          (Updated 2010-07-07 13:53:44.275551)

          Review request for hbase, Nicolas, Jonathan Gray, Karthik Ranganathan, and Kannan Muthukkaruppan.

          Changes
          -------

          Kannan's suggestion to do the inexpensive comparison first sounds good; modified the code to include that. Now we simply pass the scan object to each filter and leave it up to the filter to extract whatever information it needs.

          Summary
          -------

          Every memstore and store file will have a minimum and maximum timestamp associated with it. If the range of timestamps we are searching for doesn't overlap with the range for a particular file, we can skip searching it and save time.

          Would significantly improve the performance for timestamp range queries. Particularly useful when most of the reads are for recent entries and the older files can be safely skipped.

          Addresses HBASE-2265 JIRA.

          This diff includes fixing some minor bugs like KeyValueHeap used to throw an uncaught exception when size of scanner set was zero.

          Internal review done by Jonathan and Kannan.

          This addresses bug HBASE-2265.
          http://issues.apache.org/jira/browse/HBASE-2265

          Diffs (updated)


          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueSkipListSet.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/TimeRangeTracker.java PRE-CREATION
          trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMemStore.java 960082
          trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 959782
          trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 959782

          Diff: http://review.hbase.org/r/257/diff

          Testing
          -------

          All existing JUnit tests run successfully. More JUnit tests for Memstore, StoreFile and Store added to test correctness with multiple timestamps.

          Conducted a test to measure the extra time required to keep track of min and max timestamps while writing KeyValues. The comparison was done by entering 1 Million KeyValues into memstore ten times with and without timestamp tracking and then taking the average time for each of them. WAL was disabled and no flushing was done during this test to minimize overheads. The average time taken for entering 1M KeyValues into memstore without keeping track of timestamp was 13.44 seconds while the average time when keeping track of timestamps was 13.45 seconds. This shows that no significant overhead has been added while keeping track of timestamps.

          Thanks,

          Pranav

          HBase Review Board added a comment -

          Message from: "Kannan Muthukkaruppan" <kannan@facebook.com>

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.hbase.org/r/257/#review312
          -----------------------------------------------------------

          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
          <http://review.hbase.org/r/257/#comment1375>

          Could we hoist the cheaper check first?

          Since isGetScan() has to do a byte comparison of start/endRow it would be better to do this only if bloom filters are actually in use.

          So change the second part of the expression to something like:

          ( (this.bloomFilter == null)
            || (!scan.isGetScan())
            || passesBloomFilter(...))

          Or, you could just pass the Scan to passesBloomFilter() instead of scan.getStartRow().

          And there we already check for this.bloomFilter == null first.

          Then you could add the check for "scan.isGetScan".

          And then the rest of the function.
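          The gist of that reordering as a self-contained toy; the real code lives in StoreFile, and BloomLike plus the helper below are invented purely for illustration:

              import java.util.Arrays;

              public class CheckOrdering {
                interface BloomLike { boolean mightContain(byte[] row); }

                // Cheapest check first: if no bloom filter is in use, the start/stop byte
                // comparison and the bloom probe never run.
                static boolean shouldSeek(BloomLike bloom, byte[] startRow, byte[] stopRow) {
                  return bloom == null
                      || !Arrays.equals(startRow, stopRow)    // not a point Get: bloom can't rule the file out
                      || bloom.mightContain(startRow);
                }

                public static void main(String[] args) {
                  byte[] r = { 1 };
                  System.out.println(shouldSeek(null, r, r));                    // true: no bloom in use
                  System.out.println(shouldSeek(row -> false, r, new byte[0]));  // true: a range scan
                  System.out.println(shouldSeek(row -> false, r, r));            // false: bloom rules the file out
                }
              }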

          • Kannan
          HBase Review Board added a comment -

          Message from: "Pranav Khaitan" <pranavkhaitan@facebook.com>

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.hbase.org/r/257/
          -----------------------------------------------------------

          (Updated 2010-07-06 13:57:07.200883)

          Review request for hbase, Nicolas, Jonathan Gray, Karthik Ranganathan, and Kannan Muthukkaruppan.

          Summary (updated)
          -------

          Every memstore and store file will have a minimum and maximum timestamp associated with it. If the range of timestamps we are searching for doesn't overlap with the range for a particular file, we can skip searching it and save time.

          Would significantly improve the performance for timestamp range queries. Particularly useful when most of the reads are for recent entries and the older files can be safely skipped.

          Addresses HBASE-2265 JIRA.

          This diff includes fixing some minor bugs like KeyValueHeap used to throw an uncaught exception when size of scanner set was zero.

          Internal review done by Jonathan and Kannan.

          This addresses bug HBASE-2265.
          http://issues.apache.org/jira/browse/HBASE-2265

          Diffs


          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueSkipListSet.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/TimeRangeTracker.java PRE-CREATION
          trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMemStore.java 960082
          trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 959782
          trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 959782

          Diff: http://review.hbase.org/r/257/diff

          Testing
          -------

          All existing JUnit tests run successfully. More JUnit tests for Memstore, StoreFile and Store added to test correctness with multiple timestamps.

          Conducted a test to measure the extra time required to keep track of min and max timestamps while writing KeyValues. The comparison was done by entering 1 Million KeyValues into memstore ten times with and without timestamp tracking and then taking the average time for each of them. WAL was disabled and no flushing was done during this test to minimize overheads. The average time taken for entering 1M KeyValues into memstore without keeping track of timestamp was 13.44 seconds while the average time when keeping track of timestamps was 13.45 seconds. This shows that no significant overhead has been added while keeping track of timestamps.

          Thanks,

          Pranav

          HBase Review Board added a comment -

          Message from: "Pranav Khaitan" <pranavkhaitan@facebook.com>

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.hbase.org/r/257/#review305
          -----------------------------------------------------------

          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
          <http://review.hbase.org/r/257/#comment1356>

          Yes, you are right that it is safe only for a Get. If you look at line 159 (just one line above), there was an isGet condition before, i.e. we skip a file only if the scan is of type Get. I have now moved this condition into shouldSeek() (line 916 of StoreFile.java).

          • Pranav
          HBase Review Board added a comment -

          Message from: "Kannan Muthukkaruppan" <kannan@facebook.com>

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.hbase.org/r/257/#review302
          -----------------------------------------------------------

          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
          <http://review.hbase.org/r/257/#comment1351>

          I am not clear on this scan.getStartRow() stuff. Note: this is not new with your change; you have only moved the getStartRow() call into shouldSeek().

          The question applies more to the bloom filter logic, I think. It doesn't seem safe to decide whether to seek into a file based only on the startRow(). That only seems safe for a Get (which is implemented as a scan with the same row as both start and stop row).

          • Kannan
          HBase Review Board added a comment -

          Message from: "Pranav Khaitan" <pranavkhaitan@facebook.com>

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.hbase.org/r/257/
          -----------------------------------------------------------

          (Updated 2010-07-05 12:04:54.536554)

          Review request for hbase, Jonathan Gray, Karthik Ranganathan, and Kannan Muthukkaruppan.

          Summary
          -------

          Every memstore and store file will have a minimum and maximum timestamp associated with it. If the range of timestamps we are searching for doesn't overlap with the range for a particular file, we can skip searching it and save time.

          Would significantly improve the performance for timestamp range queries. Particularly useful when most of the reads are for recent entries and the older files can be safely skipped.

          Addresses HBASE-2265 JIRA.

          This diff includes fixing some minor bugs like KeyValueHeap used to throw an uncaught exception when size of scanner set was zero.

          This addresses bug HBASE-2265.
          http://issues.apache.org/jira/browse/HBASE-2265

          Diffs


          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueSkipListSet.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 959782
          trunk/src/main/java/org/apache/hadoop/hbase/regionserver/TimeRangeTracker.java PRE-CREATION
          trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMemStore.java 960082
          trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 959782
          trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 959782

          Diff: http://review.hbase.org/r/257/diff

          Testing
          -------

          All existing JUnit tests run successfully. More JUnit tests for Memstore, StoreFile and Store added to test correctness with multiple timestamps.

          Conducted a test to measure the extra time required to keep track of min and max timestamps while writing KeyValues. The comparison was done by entering 1 Million KeyValues into memstore ten times with and without timestamp tracking and then taking the average time for each of them. WAL was disabled and no flushing was done during this test to minimize overheads. The average time taken for entering 1M KeyValues into memstore without keeping track of timestamp was 13.44 seconds while the average time when keeping track of timestamps was 13.45 seconds. This shows that no significant overhead has been added while keeping track of timestamps.

          Thanks,

          Pranav

          Pranav Khaitan added a comment -

          Resolved!

          Diff submitted for review at http://review.hbase.org/r/257/

          Jonathan Gray added a comment -

          Assigning to Pranav who is going to look into this.

          Kannan Muthukkaruppan added a comment -

          Todd: Jonathan and I discussed essentially the same scheme offline. Basically, the extra metadata per HFile serves as a hint to optimize the common case of timestamps arriving in order, without sacrificing correctness if they come out of order.

          You wrote <<< It may actually be sufficient to just store the max timestamp and not the min. I haven't really thought of a great use for min. >>> For queries of the form "get all versions > T", the max timestamp would suffice. But for queries that look up a specific timestamp, or columns in a time range T1..T2, having both the min and max timestamps would be better.
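          A minimal sketch of the file-skip test this enables for a T1..T2 query, assuming each file carries its min and max timestamp as metadata (field and method names are illustrative):

              public class TimeRangeSkip {
                static final class FileRange {
                  final long minTs, maxTs;                 // per-file metadata
                  FileRange(long minTs, long maxTs) { this.minTs = minTs; this.maxTs = maxTs; }
                }

                // A file must be read only when its [minTs, maxTs] intersects the query's [t1, t2].
                static boolean mustRead(FileRange f, long t1, long t2) {
                  return f.maxTs >= t1 && f.minTs <= t2;
                }

                public static void main(String[] args) {
                  FileRange older = new FileRange(100, 200);
                  FileRange recent = new FileRange(500, 900);
                  long t1 = 600, t2 = 700;
                  System.out.println(mustRead(older, t1, t2));   // false: safely skipped
                  System.out.println(mustRead(recent, t1, t2));  // true: must be searched
                }
              }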

          Jonathan Gray added a comment -

          Any query that specifies a TimeRange can potentially take advantage of this by only opening files which overlap with this min/max.

          I'm not sure we're all talking about the same thing, but I don't think it makes sense to use these min/max stamps for breaking the duplicate-version tie. I think the timestamp of the flush should be used (the most recently flushed file is the latest). Otherwise, if people do wonky things with stamps, a random KV in a completely different row could affect which version shows up as latest.
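          A sketch of that tie-break using a per-file flush sequence rather than the cell stamps themselves (flushSeq is an assumed field, for illustration only):

              import java.util.Comparator;

              public class DupeVersionTieBreak {
                static final class Cell {
                  final long ts;        // the KV's own timestamp
                  final long flushSeq;  // assumed: increases with each flush, so newer files have larger values
                  Cell(long ts, long flushSeq) { this.ts = ts; this.flushSeq = flushSeq; }
                }

                // Newer timestamp wins; on a timestamp tie, the more recently flushed file wins.
                static final Comparator<Cell> NEWEST_FIRST =
                    Comparator.comparingLong((Cell c) -> c.ts).reversed()
                        .thenComparing(Comparator.comparingLong((Cell c) -> c.flushSeq).reversed());

                public static void main(String[] args) {
                  Cell fromOlderFile = new Cell(3, 7);
                  Cell fromNewerFile = new Cell(3, 9);
                  System.out.println(NEWEST_FIRST.compare(fromNewerFile, fromOlderFile) < 0); // true
                }
              }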

          Todd Lipcon added a comment -

          > With a big spread of timestamps and keys, we wouldn't get much of an optimization

          Exactly. If users are writing out of order, they cannot take advantage of the optimization of culling older storage. As you mentioned, bloom filters help here. For users who are writing in order, the performance should be identical to today's. I think this is exactly what we want.

          > for a complete column family get, we'll have to touch every file, every time. This is because you are never sure if the next file contains another key/value for the result. A bloom filter would help here

          Yep, and this is exactly what I would expect. Why should a column family get not touch all of the files?

          > However, during a compaction, this information is collapsed, and we end up with the duplicate key/values sitting next to each other. We might be able to cause/create an invariant that during compaction the 'newer' one comes first

          It's probably worth getting consensus, but I think it would be acceptable behavior to only retain the keyval from the newest storage when the timestamps are equal. That is, if I write A:ts=1, B:ts=2, C:ts=3, D:ts=3, E:ts=3, and want to retain "latest 3", I'd end up getting writes A, B, and E.

          > Generally the ideal solution would involve no change to the KeyValue serialization format

          I agree, and I think this can be done using only the existing metadata fields without any change per-keyvalue.
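          A toy run of that retention rule (keep up to three versions; a later write at the same timestamp simply replaces the earlier one, which is the newest-storage-wins behavior described above):

              import java.util.ArrayList;
              import java.util.Collections;
              import java.util.List;
              import java.util.TreeMap;

              public class VersionRetention {
                public static void main(String[] args) {
                  // Arrival order: value -> timestamp; later writes come from newer storage.
                  String[][] writes = { {"A","1"}, {"B","2"}, {"C","3"}, {"D","3"}, {"E","3"} };
                  int maxVersions = 3;

                  TreeMap<Long, String> byTs = new TreeMap<Long, String>();
                  for (String[] w : writes) byTs.put(Long.parseLong(w[1]), w[0]);   // ties: last write wins

                  List<String> kept = new ArrayList<String>(byTs.descendingMap().values());
                  kept = kept.subList(0, Math.min(maxVersions, kept.size()));       // newest timestamps only
                  Collections.reverse(kept);
                  System.out.println(kept);   // [A, B, E]
                }
              }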

          ryan rawson added a comment -

          I'm not sure this will help make gets better, there are 2 get cases:

          • get a single column for a row. In this case, if timestamps are written out of order, we don't know which HFile to start with. Let's say we start with the 'newest' one and it has TS[1]; does the fact that an older file has start < TS[1] < end mean we should consult that file? I suppose if end < TS[1] (i.e. the timestamp we already got is newer than anything in that file), we'd know there is nothing newer and could conclusively rule that file out. If TS[1] were < the beginning of a file, we'd have to consider the file. With a big spread of timestamps and keys, we wouldn't get much of an optimization.
          • for a complete column family get, we'll have to touch every file, every time. This is because you are never sure if the next file contains another key/value for the result. A bloom filter would help here.

          As for the scan, we already know which files are 'newer'. However, during a compaction, this information is collapsed, and we end up with the duplicate key/values sitting next to each other. We might be able to establish an invariant that during compaction the 'newer' one comes first. The compaction might be able to help straighten this out, since I think we do minor compactions 'in order', with older files first. Seems like a tricky bit.

          Generally the ideal solution would involve no change to the KeyValue serialization format (since changing it could require rewriting store files).
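          A small sketch of the rule-out check from the first bullet above (the per-file max timestamp is the proposed metadata; everything else is illustrative):

              public class GetShortCircuit {
                // If everything in a file is strictly older than the entry we already found,
                // that file cannot contribute anything newer and can be skipped.
                static boolean canSkip(long fileMaxTs, long alreadyFoundTs) {
                  return fileMaxTs < alreadyFoundTs;
                }

                public static void main(String[] args) {
                  long foundTs = 1000;
                  System.out.println(canSkip(900, foundTs));   // true: older file ruled out
                  System.out.println(canSkip(1500, foundTs));  // false: the file may hold something newer
                }
              }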

          Todd Lipcon added a comment -

          It may actually be sufficient to just store the max timestamp and not the min. I haven't really thought of a great use for min.


            People

            • Assignee:
              Pranav Khaitan
              Reporter:
              Todd Lipcon
            • Votes:
              0
              Watchers:
              4
