HBase / HBASE-2794

Utilize ROWCOL bloom filter if multiple columns within same family are requested in a Get

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.92.0
    • Component/s: Performance
    • Labels:
      None
    • Release Note:
      Resolving (the patch has been committed).

      Description

      Noticed the following snippet in StoreFile.java (Scanner.shouldSeek()):

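              switch(bloomFilterType) {
                case ROW:
                  // ROW bloom: probe with the row key alone.
                  key = row;
                  break;
                case ROWCOL:
                  // ROWCOL bloom: probe with row+column, but only for a single column.
                  if (columns.size() == 1) {
                    byte[] col = columns.first();
                    key = Bytes.add(row, col);
                    break;
                  }
                  // Zero or several columns: the bloom is not consulted at all.
                  //$FALL-THROUGH$
                default:
                  // No usable bloom key: assume the file may match and seek.
                  return true;
              }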
      

      If columns.size() > 1, we currently don't take advantage of the bloom filter. We should optimize this to check the bloom for each of the columns and, if none of them are present in the bloom, avoid opening the file.
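A minimal sketch of the proposed check, written against a hypothetical `RowColBloom` interface rather than HBase's real bloom filter API, whose exact signature is not shown in this issue. It loops over the requested columns, builds one row+column key per iteration (the concatenation `Bytes.add(row, col)` performs in the snippet above; the hypothetical `concat` helper replaces it so there is no HBase dependency), and returns `false` (skip the file) only when every key is definitely absent. An empty column set keeps the original conservative behaviour of seeking.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

public class MultiColumnBloomCheck {

  /** Hypothetical ROWCOL bloom: may return false positives, never false negatives. */
  interface RowColBloom {
    boolean mightContain(byte[] rowColKey);
  }

  /** Hypothetical helper: row key followed by column qualifier (simple concatenation). */
  static byte[] concat(byte[] row, byte[] col) {
    byte[] key = Arrays.copyOf(row, row.length + col.length);
    System.arraycopy(col, 0, key, row.length, col.length);
    return key;
  }

  /**
   * Returns true if the store file may hold any requested column of the row (so it
   * must be opened), false if the bloom rules out every requested column.
   */
  static boolean shouldSeek(RowColBloom bloom, byte[] row, Collection<byte[]> columns) {
    if (columns.isEmpty()) {
      return true; // no explicit columns: the ROWCOL bloom cannot answer, so seek
    }
    for (byte[] col : columns) {
      // One key per column, built inside the loop; no intermediate key list.
      if (bloom.mightContain(concat(row, col))) {
        return true; // a single possible hit is enough to require the seek
      }
    }
    return false; // every requested column is definitely absent from this file
  }

  static byte[] b(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // An exact set stands in for the bloom so the outcome is deterministic.
    Set<ByteBuffer> present = new HashSet<>();
    present.add(ByteBuffer.wrap(concat(b("row1"), b("colB"))));
    RowColBloom exact = key -> present.contains(ByteBuffer.wrap(key));

    System.out.println(shouldSeek(exact, b("row1"), Arrays.asList(b("colA"), b("colB")))); // true
    System.out.println(shouldSeek(exact, b("row1"), Arrays.asList(b("colA"), b("colC")))); // false
  }
}
```

Because the loop returns on the first possible hit, the bloom only saves work when every probe comes back negative; each extra column is one more chance for a false positive.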

        Activity

        Kannan Muthukkaruppan added a comment -

        Perhaps a simple starter task for someone interested.

        Kris Jirapinyo added a comment -

        First stab at it. Comments welcome.

        ryan rawson added a comment -

        can you also upload it to review.hbase.org for easy reviewing, thanks

        HBase Review Board added a comment -

        Message from: "Kris Jirapinyo" <kjirapinyo@attensity.com>

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        http://review.hbase.org/r/296/
        -----------------------------------------------------------

        Review request for hbase.

        Summary
        -------

        HBASE-2794 Enable bloom filter checks for multiple columns in same column family

        This addresses bug HBASE-2794.
        http://issues.apache.org/jira/browse/HBASE-2794

        Diffs


        /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 962748
        /trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 962748

        Diff: http://review.hbase.org/r/296/diff

        Testing
        -------

        Ran and passed org.apache.hadoop.hbase.regionserver.TestStoreFile multiple times. Ran and passed all tests when building.

        Thanks,

        Kris

        Kris Jirapinyo added a comment -

        Submitted to review.hbase.org. So the process is actually to get reviewed there before uploading the patch here?

        HBase Review Board added a comment -

        Message from: "Nicolas" <nspiegelberg@facebook.com>

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        http://review.hbase.org/r/296/#review350
        -----------------------------------------------------------

        /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
        <http://review.hbase.org/r/296/#comment1468>

        have you done any tests to see when the number of bloom checks takes significant time compared to just getting the block? For example, if you have 100 columns to lookup, do bloom filters really buy you anything, or shouldn't you just switch to a Row-level bloom anyways? Also, with a default 1% error rate, you're looking at ~100% false positive with 100 columns. Maybe max.columns = sqrt(1/error.rate)

        /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
        <http://review.hbase.org/r/296/#comment1463>

        probably should pre-allocate the ArrayList() size so we only deal with one heap element.

        • Nicolas
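The compounding in the first comment can be made concrete. Assuming each `contains()` probe is an independent false positive with probability p, OR-ing n probes yields a false-positive probability of 1 - (1 - p)^n. The linear estimate n·p is close for small n·p but overstates for large n: with p = 0.01 and n = 100 the exact value is about 63% rather than 100%, which is still far too high for the bloom to help. The `maxColumns` method implements the suggested sqrt(1/error.rate) cap (10 columns at 1%). Class and method names are illustrative only.

```java
public class BloomFalsePositives {

  /** Probability that at least one of n independent probes is a false positive. */
  static double orFalsePositiveRate(int n, double p) {
    return 1.0 - Math.pow(1.0 - p, n);
  }

  /** Linear approximation n * p; only accurate while n * p is small. */
  static double linearEstimate(int n, double p) {
    return n * p;
  }

  /** Column cap suggested in review: max.columns = sqrt(1 / error.rate). */
  static int maxColumns(double errorRate) {
    // Small epsilon absorbs floating-point noise right at a perfect square.
    return (int) Math.floor(Math.sqrt(1.0 / errorRate) + 1e-9);
  }

  public static void main(String[] args) {
    double p = 0.01; // io.hfile.bloom.error.rate default cited in this thread
    for (int n : new int[] {1, 5, 10, 100}) {
      System.out.printf("n=%3d  exact=%.4f  linear=%.2f%n",
          n, orFalsePositiveRate(n, p), linearEstimate(n, p));
    }
    System.out.println("maxColumns(0.01) = " + maxColumns(p));
  }
}
```

At the proposed cap of 10 columns the combined false-positive rate is roughly 9.6%, which lines up with the 10% threshold discussed later on IRC.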
        HBase Review Board added a comment -

        Message from: "Kris Jirapinyo" <kjirapinyo@attensity.com>

        On 2010-07-12 10:17:25, Nicolas wrote:

        > /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java, line 860
        > <http://review.hbase.org/r/296/diff/1/?file=2378#file2378line860>
        >
        > probably should pre-allocate the ArrayList() size so we only deal with one heap element.

        Good idea.

        On 2010-07-12 10:17:25, Nicolas wrote:

        > /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java, line 857
        > <http://review.hbase.org/r/296/diff/1/?file=2378#file2378line857>
        >
        > have you done any tests to see when the number of bloom checks takes significant time compared to just getting the block? For example, if you have 100 columns to lookup, do bloom filters really buy you anything, or shouldn't you just switch to a Row-level bloom anyways? Also, with a default 1% error rate, you're looking at ~100% false positive with 100 columns. Maybe max.columns = sqrt(1/error.rate)

        I have not, but would running on just the test data be sufficient to tell the true savings, since the tests just run on mock data? I don't really have a dev cluster with real data that I can test this on, so perhaps you or someone could help out in that regard.

        • Kris

        HBase Review Board added a comment -

        Message from: "Nicolas" <nspiegelberg@facebook.com>

        On 2010-07-12 10:17:25, Nicolas wrote:

        > /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java, line 857
        > <http://review.hbase.org/r/296/diff/1/?file=2378#file2378line857>
        >
        > have you done any tests to see when the number of bloom checks takes significant time compared to just getting the block? For example, if you have 100 columns to lookup, do bloom filters really buy you anything, or shouldn't you just switch to a Row-level bloom anyways? Also, with a default 1% error rate, you're looking at ~100% false positive with 100 columns. Maybe max.columns = sqrt(1/error.rate)

        Kris Jirapinyo wrote:

        I have not, but would running on just the test data be sufficient to tell the true savings, since the tests just run on mock data? I don't really have a dev cluster with real data that I can test this on, so perhaps you or someone could help out in that regard.

        BTW: Thanks for the work. I don't think running on test data would be sufficient because you want to compare the speed of accessing a large bloom filter (which should have random access, aka L1 cache misses) with the cost of getting an HFile block from disk (with OS block cache miss). If you can't set up a large cluster, one strategy might be to use 10ms as the disk seek baseline and use testBloomPerf() in TestByteBloomFilter.java to estimate BloomFilter latency. Ryan Rawson did some tests on using blooms with small KV entries. He might be able to give you some numbers on when blooms do not take up too much memory (hopefully, some number like KV.length > 1KB). You can then use the fact that HFiles are ~64MB to estimate a good entry sample size (I just picked 10M entries in current testBloomPerf() from thin air as a big number). Sounds a little complicated at first, but this strategy would probably take less time [and be more interesting] than trying to set up a genuine cluster.

        • Nicolas
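The suggestion above is to time bloom lookups in isolation and compare them with a ~10ms disk-seek baseline. The sketch below is a stand-alone harness in that spirit, under clear assumptions: `ToyBloom` is a simplified Bloom filter written for this example (a `BitSet` plus double hashing over a 64-bit hash), not HBase's `ByteBloomFilter`, and it does not reproduce `testBloomPerf()`. Sizing uses the textbook formulas m/n = -ln(p) / (ln 2)^2 bits per item and k = (m/n) · ln 2 hash functions. Absolute timings depend heavily on the JVM, CPU caches and filter size, so treat the output as a rough indication only.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

public class ToyBloomBench {

  /** Simplified Bloom filter for timing experiments only (not HBase's implementation). */
  static final class ToyBloom {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    ToyBloom(int expectedItems, double errorRate) {
      double ln2 = Math.log(2);
      double bitsPerItem = -Math.log(errorRate) / (ln2 * ln2); // ~9.6 for 1%
      this.numBits = (int) Math.ceil(expectedItems * bitsPerItem);
      this.numHashes = Math.max(1, (int) Math.round(bitsPerItem * ln2)); // 7 for 1%
      this.bits = new BitSet(numBits);
    }

    /** 64-bit FNV-1a followed by a MurmurHash3-style finalizer for better mixing. */
    private static long hash64(byte[] key) {
      long h = 0xcbf29ce484222325L;
      for (byte b : key) {
        h ^= (b & 0xff);
        h *= 0x100000001b3L;
      }
      h ^= h >>> 33;
      h *= 0xff51afd7ed558ccdL;
      h ^= h >>> 33;
      h *= 0xc4ceb9fe1a85ec53L;
      h ^= h >>> 33;
      return h;
    }

    /** i-th probe position via double hashing: (h1 + i * h2) mod numBits. */
    private int index(long h, int i) {
      int h1 = (int) h;
      int h2 = (int) (h >>> 32);
      return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(byte[] key) {
      long h = hash64(key);
      for (int i = 0; i < numHashes; i++) bits.set(index(h, i));
    }

    boolean contains(byte[] key) {
      long h = hash64(key);
      for (int i = 0; i < numHashes; i++) {
        if (!bits.get(index(h, i))) return false;
      }
      return true;
    }
  }

  /** Published result so the JIT cannot discard the timed loop as dead code. */
  static volatile int sink;

  static byte[][] keys(int from, int to) {
    byte[][] out = new byte[to - from][];
    for (int i = from; i < to; i++) {
      out[i - from] = ("row-" + i).getBytes(StandardCharsets.UTF_8);
    }
    return out;
  }

  /** Average nanoseconds per contains() call over the given probe keys. */
  static double nanosPerLookup(ToyBloom bloom, byte[][] probes) {
    int hits = 0;
    long start = System.nanoTime();
    for (byte[] k : probes) {
      if (bloom.contains(k)) hits++;
    }
    long elapsed = System.nanoTime() - start;
    sink = hits;
    return (double) elapsed / probes.length;
  }

  public static void main(String[] args) {
    int n = 1_000_000;
    ToyBloom bloom = new ToyBloom(n, 0.01);
    for (byte[] k : keys(0, n)) bloom.add(k);
    byte[][] absent = keys(n, 2 * n); // never inserted: every hit is a false positive
    nanosPerLookup(bloom, absent);    // warm-up pass for the JIT
    double ns = nanosPerLookup(bloom, absent);
    System.out.printf("%.1f ns/lookup; 100 lookups = %.4f ms vs ~10 ms per disk seek%n",
        ns, ns * 100 / 1e6);
  }
}
```

Multiplying the per-lookup time by the number of columns in a Get and comparing it with the seek baseline gives the ratio the threshold discussion needs; varying `n` changes how cache-unfriendly the filter is.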

        HBase Review Board added a comment -

        Message from: "Kannan Muthukkaruppan" <kannan@facebook.com>

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        http://review.hbase.org/r/296/#review361
        -----------------------------------------------------------

        /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
        <http://review.hbase.org/r/296/#comment1497>

        can't this loop be over "columns" itself? And then inside the loop, you prepare one key at a time using Bytes.add(row, col). That way, you can avoid the keyList data structure completely.

        • Kannan
        HBase Review Board added a comment -

        Message from: "Kris Jirapinyo" <kjirapinyo@attensity.com>

        On 2010-07-12 13:14:32, Kannan Muthukkaruppan wrote:

        > /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java, line 880
        > <http://review.hbase.org/r/296/diff/1/?file=2378#file2378line880>
        >
        > can't this loop be over "columns" itself? And then inside the loop, you prepare one key at a time using Bytes.add(row, col). That way, you can avoid the keyList data structure completely.

        Another good idea. It will also get rid of the warning that keyList could possibly be null.

        • Kris

        Kris Jirapinyo added a comment -

        I removed the patch I uploaded. Will upload the final version when it's approved from review.hbase.org.

        ryan rawson added a comment -

        Consider a table with 12 billion rows. At 9 bits/row, we are looking
        at 13500000000 bytes of ram (base) to store the blooms in ram. That is
        12.57 GB ram to store the blooms. The memory competes with the block
        cache, thus you are losing 12.57 GB ram that could be used to cache
        blocks. If your data is in block cache, seeking is free, thus there
        is an essential trade off here.

        In my case, the 12b rows are small ones, and thus we have a lot of
        rows for the actual data size. On a different dataset, the row count
        might be smaller for the actual data size and it might be
        worthwhile. Furthermore, blooms only help Gets; they don't work on Scans.

        The key takeaway here is that (a) bloom filters are not free and
        potentially very expensive in terms of RAM, (b) bloom data competes
        with the block cache, and (c) the trade off depends on the data set
        and access patterns.

        On Mon, Jul 12, 2010 at 12:07 PM, HBase Review Board (JIRA)
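The RAM estimate in the comment above is plain arithmetic: rows × bits per row ÷ 8 gives bytes, and dividing by 2^30 gives the 12.57 figure (binary gigabytes). The sketch reproduces it and adds the idealized bits-per-item formula for a Bloom filter with an optimal number of hash functions, m/n = -ln(p) / (ln 2)^2, which gives about 9.6 bits per item at p = 1% and about 14.4 at p = 0.1%; a real implementation's overhead may differ. Class and method names are illustrative.

```java
public class BloomMemory {

  /** Bloom filter size in bytes for the given item count and bits per item. */
  static double bloomBytes(long items, double bitsPerItem) {
    return items * bitsPerItem / 8.0;
  }

  /** Idealized bits per item for false-positive rate p with an optimal hash count. */
  static double optimalBitsPerItem(double p) {
    double ln2 = Math.log(2);
    return -Math.log(p) / (ln2 * ln2);
  }

  public static void main(String[] args) {
    long rows = 12_000_000_000L;           // 12 billion rows, as in the comment
    double bytes = bloomBytes(rows, 9);    // 9 bits per row
    double gib = bytes / (1L << 30);       // binary gigabytes
    System.out.printf("%.0f bytes = %.2f GiB%n", bytes, gib);
    System.out.printf("optimal bits/item: 1%% -> %.2f, 0.1%% -> %.2f%n",
        optimalBitsPerItem(0.01), optimalBitsPerItem(0.001));
  }
}
```

Every byte computed here is memory that, as noted above, would otherwise be available to the block cache.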

        Nicolas Spiegelberg added a comment -

        IRC conversation about this...

        krispyjala: nspiegelberg: but is the test you want related to HBASE-2794 or just using bloom filter in general (e.g. when to use it and when not to)?
        [1:41pm] nspiegelberg: it's related to 2794
        [1:42pm] nspiegelberg: an easy example of why you need good measurements is the case of calling bloom.contains() for 100 row+col in a 1% false positive bloom. You are getting almost 100% false positives then, so the bloom is an obvious perf drop
        [1:43pm] krispyjala: nspiegelberg: ok i think i understand
        [1:44pm] krispyjala: nspiegelberg: but wait 100% false positive?
        [1:46pm] nspiegelberg: right, so io.hfile.bloom.error.rate == .01 by default. so 1%
        [1:46pm] krispyjala: ok
        [1:46pm] krispyjala: how does that add up to 100% for 100 lookups?
        [1:46pm] nspiegelberg: therefore, if you call bloom.contains() 5 times and OR the result, the false positive rate is 5%
        [1:49pm] nspiegelberg: krispyjala: so a simple example. call bloom.contains() 10 times = 10% error rate = (10ms/seek * 10%) + time(bloom.contains)
        [1:50pm] krispyjala: nspiegelberg: but is it really OR'ing all of them? In the code if even one column lookup returns true we return true and don't look up any other columns
        [1:51pm] nspiegelberg: right, that's the same thing as ORing them
        [1:51pm] nspiegelberg: logical OR => ||
        [1:52pm] krispyjala: nspiegelberg: but the point is we're probably not looking up 100 columns every time for that operation, even theoretically yes we do a logical OR
        [1:52pm] krispyjala: if we hit true on the 5th column, we quit the loop and return right away
        [1:53pm] nspiegelberg: the only way you win with blooms is if all bloom.contains() return false and you don't have to do the lookup
        [1:53pm] krispyjala: yes
        [1:53pm] nspiegelberg: so, you're right, we do an average of 50 lookups per false positive in this case.
        [1:54pm] nspiegelberg: I'm just saying, what is the cost of those 50 lookups? If 1ms, then every HFile seek costs 11ms with blooms enabled versus 10 ms without using them
        [1:55pm] krispyjala: but wait i thought the code was to determine whether to add the StoreScanner to the list or not...or are you saying then that the point is in the case of 100 columns we should just not even bother doing bloom multicolumn check because perhaps it's better to just load it than wasting time with the 100 lookups (potentially)
        [1:55pm] nspiegelberg: exactly
        [1:55pm] krispyjala: nspiegelberg: lol ok got it
        [1:56pm] krispyjala: but realistically, who does gets on 100 columns? I don't know the HBase internals well yet (that's why i picked the noob ticket lol)...wouldn't it be better to just do a get on the row?
        [1:57pm] nspiegelberg: never under-estimate the naivete of users
        [1:57pm] krispyjala: nspiegelberg: sigh lol, i guess that's why the bloom is off by default?
        [1:58pm] nspiegelberg: yes
        [1:58pm] nspiegelberg: so, it's obvious that you never want to run bloom code with 101 columns + 1% error rate
        [1:58pm] krispyjala: correct
        [1:59pm] nspiegelberg: so, really it's just timing testBloomPerf with various lookup counts on various size blooms
        [2:00pm] krispyjala: nspiegelberg: this talk has helped me think about how to test like you said
        [2:00pm] • St^Ack hopes the above good-stuff(tm) 'lesson' makes it back into the issue....
        [2:00pm] nspiegelberg: looks like ryan didn't give you any concrete numbers, so you might have to just start with some assumptions (like, don't use blooms if avg key < 1KB) and run with that
        [2:01pm] krispyjala: nspiegelberg: and perhaps once we kind of know where the tradeoff is, would it be wrong to limit in the code saying if there are more than say 10 column lookups might as well just return true?
        [2:01pm] krispyjala: cuz it's not worth looking up in bloom at that point
        [2:01pm] nspiegelberg: I think that's exactly what we need to do
        [2:01pm] krispyjala: whatever the threshold is
        [2:02pm] nspiegelberg: if we pretend that the cost of bloom.contains() == 0, then maybe we want to say if (column.count * error.rate > 10%) return true;
        [2:02pm] dj_ryan: well it's hard to say where the tradeoff goes
        [2:02pm] krispyjala: pastebin? lol jk
        [2:02pm] dj_ryan: but the hard number is 9 bits/item
        [2:03pm] dj_ryan: you can then calculate how much ram you are spending on blooms
        [2:03pm] dj_ryan: and decide if its worth it
        [2:03pm] nspiegelberg: the hard # for 1% error rate blooms is 9 bits/item
        [2:03pm] dj_ryan: we never implemented blooms because it seemed 12gb of ram would be better off caching
        [2:03pm] krispyjala: dj_ryan: so your suggestion the onus is on the user and not hbase code
        [2:03pm] nspiegelberg: with .1% error rate, it's ~12 bits/item
        [2:04pm] krispyjala: or should we allow customizations of the limits? then we don't need to come up with the "recommended" threshold
        [2:05pm] dj_ryan: well
        [2:05pm] dj_ryan: it is up to the user
        [2:05pm] nspiegelberg: I think the onus for figuring out whether to use blooms or not is on the user, but we should still have a 'this is too stupid' early exit
        [2:05pm] dj_ryan: i mean maybe we could put a lot of metrics to detect when a bloom filter might be useful
        [2:05pm] dj_ryan: but im not sure that's worth it
        [2:06pm] krispyjala: dj_ryan: yes, but right now they can either just turn it on or off, and with my patch they will be forced to look up all the columns if they have more than one
        [2:07pm] nspiegelberg: krispyjala: I think just running testBloomPerf on a couple different sizes will give you a good timing measurement. Unless my initial thoughts are off, you can probably just get away with saying: (column.count * error.rate > 10%) return true;
        [2:07pm] krispyjala: nspiegelberg: yeah I agree we should have the early exit strat
        [2:08pm] krispyjala: nspiegelberg: ok i will do some testing this evening on it
        [2:08pm] nspiegelberg: then, when somebody asks why you chose 10%, you can say that it obviously makes sense when below 10% and they should run some numbers for you if they want to pump it up
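The early exit the IRC discussion converges on can be sketched as follows. This is a model under stated assumptions, not the committed patch: following nspiegelberg's simplification, it treats the combined false-positive chance as additive (column count × error rate), skips the bloom when that exceeds 10%, and otherwise estimates the expected cost for a store file that does not hold the requested keys as the probe time plus the false-positive fraction of a 10ms seek. `BloomEarlyExit`, `MAX_COMBINED_ERROR` and the cost parameters are hypothetical names.

```java
public class BloomEarlyExit {

  /** Hypothetical cutoff from the discussion: skip the bloom when columns * errorRate > 10%. */
  static final double MAX_COMBINED_ERROR = 0.10;

  /** True if checking the ROWCOL bloom for this many columns is still considered worthwhile. */
  static boolean bloomWorthChecking(int columnCount, double errorRate) {
    // Small tolerance absorbs floating-point rounding at the boundary.
    return columnCount * errorRate <= MAX_COMBINED_ERROR + 1e-12;
  }

  /**
   * Expected cost (ms) for a store file that does NOT hold the requested keys:
   * every probe runs, and a false positive (approximated as columns * errorRate,
   * capped at 1) still pays for the seek.
   */
  static double expectedCostWithBloomMs(int columnCount, double errorRate,
                                        double probeCostMs, double seekMs) {
    double falsePositive = Math.min(1.0, columnCount * errorRate);
    return columnCount * probeCostMs + falsePositive * seekMs;
  }

  public static void main(String[] args) {
    double p = 0.01;       // default error rate cited above
    double seekMs = 10.0;  // disk seek baseline used in the discussion
    double probeMs = 0.0;  // "pretend that the cost of bloom.contains() == 0"
    for (int cols : new int[] {1, 10, 11, 100}) {
      System.out.printf("cols=%3d worth=%b with-bloom=%.2f ms vs without=%.2f ms%n",
          cols, bloomWorthChecking(cols, p),
          expectedCostWithBloomMs(cols, p, probeMs, seekMs), seekMs);
    }
  }
}
```

Plugging in a nonzero `probeCostMs` measured with a timing harness shows how far the break-even point moves once bloom lookups are no longer treated as free.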

        Show
        Nicolas Spiegelberg added a comment - IRC conversation about this... krispyjala: nspiegelberg: but is the test you want related to HBASE-2794 or just using bloom filter in general (e.g. when to use it and when not to)? [1:41pm] nspiegelberg: it's related to 2794 [1:42pm] nspiegelberg: an easy example of why you need good measurements is the case of calling bloom.contains() for 100 row+col in a 1% false positive bloom. You are getting almost 100% false positives then, so the bloom is an obvious perf drop [1:43pm] krispyjala: nspiegelberg: ok i think i understand [1:44pm] krispyjala: nspiegelberg: but wait 100% false positive? [1:46pm] nspiegelberg: right, so io.hfile.bloom.error.rate == .01 by default. so 1% [1:46pm] krispyjala: ok [1:46pm] krispyjala: how does that add up to 100% for 100 lookups? [1:46pm] nspiegelberg: therefore, if you call bloom.contains() 5 times and OR the result, the false positive rate is 5% [1:49pm] nspiegelberg: krispyjala: so a simple example. call bloom.contains() 10 times = 10% error rate = (10ms/seek * 10%) + time(bloom.contains) [1:50pm] krispyjala: nspiegelberg: but is it really OR'ing all of them? In the code if even one column lookup returns true we return true and don't look up any other columns [1:51pm] nspiegelberg: right, that's the same thing as ORing them [1:51pm] nspiegelberg: logical OR => || [1:52pm] krispyjala: nspiegelberg: but the point is we're probably not looking up 100 columns every time for that operation, even theoretically yes we do a logical OR [1:52pm] krispyjala: if we hit true on the 5th column, we quit the loop and return right away [1:53pm] nspiegelberg: the only way you win with blooms is if all bloom.contains() return false and you don't have to do the lookup [1:53pm] krispyjala: yes [1:53pm] nspiegelberg: so, you're right, we do an average of 50 lookups per false positive in this case. [1:54pm] nspiegelberg: I'm just saying, what is the cost of those 50 lookups? 
If 1ms, then every HFile seek costs 11ms with blooms enabled versus 10 ms without using them [1:55pm] krispyjala: but wait i thought the code was to determine whether to add the StoreScanner to the list or not...or are you saying then that the point is in the case of 100 columns we should just not even bother doing bloom multicolumn check because perhaps it's better to just load it than wasting time with the 100 lookups (potentially) [1:55pm] nspiegelberg: exactly [1:55pm] krispyjala: nspiegelberg: lol ok got it [1:56pm] krispyjala: but realistically, who does gets on 100 columns? I don't know the HBase internals well yet (that's why i picked the noob ticket lol)...wouldn't it be better to just do a get on the row? [1:57pm] nspiegelberg: never under-estimate the naivete of users [1:57pm] krispyjala: nspiegelberg: sigh lol, i guess that's why the bloom is off by default? [1:58pm] nspiegelberg: yes [1:58pm] nspiegelberg: so, it's obvious that you never want to run bloom code with 101 columns + 1% error rate [1:58pm] krispyjala: correct [1:59pm] nspiegelberg: so, really it's just timing testBloomPerf with various lookup counts on various size blooms [2:00pm] krispyjala: nspiegelberg: this talk has helped me think about how to test like you said [2:00pm] • St^Ack hopes the above good-stuff(tm) 'lesson' makes it back into the issue.... [2:00pm] nspiegelberg: looks like ryan didn't give you any concrete numbers, so you might have to just start with some assumptions (like, don't use blooms if avg key < 1KB) and run with that [2:01pm] krispyjala: nspiegelberg: and perhaps once we kind of know where the tradeoff is, would it be wrong to limit in the code saying if there are more than say 10 column lookups might as well just return true? 
[2:01pm] krispyjala: cuz it's not worth looking up in bloom at that point [2:01pm] nspiegelberg: I think that's exactly what we need to do [2:01pm] krispyjala: whatever the threshold is [2:02pm] nspiegelberg: if we pretend that the cost of bloom.contains() == 0, then maybe we want to say if (column.count * error.rate > 10%) return true; [2:02pm] dj_ryan: well it's hard to say where the tradeoff goes [2:02pm] krispyjala: pastebin? lol jk [2:02pm] dj_ryan: but the hard number is 9 bits/item [2:03pm] dj_ryan: you can then calculate how much ram you are spending on blooms [2:03pm] dj_ryan: and decide if its worth it [2:03pm] nspiegelberg: the hard # for 1% error rate blooms is 9 bits/item [2:03pm] dj_ryan: we never implemented blooms because it seemed 12gb of ram would be better off caching [2:03pm] krispyjala: dj_ryan: so your suggestion the onus is on the user and not hbase code [2:03pm] nspiegelberg: with .1% error rate, it's ~12 bits/item [2:04pm] krispyjala: or should we allow customizations of the limits? then we don't need to come up with the "recommended" threshold [2:05pm] dj_ryan: well [2:05pm] dj_ryan: it is up to the user [2:05pm] nspiegelberg: I think the onus for figuring out whether to use blooms or not is on the user, but we should still have a 'this is too stupid' early exit [2:05pm] dj_ryan: i mean maybe we could put a lot of metrics to detect when a bloom filter might be useful [2:05pm] dj_ryan: but im not sure that's worth it [2:06pm] krispyjala: dj_ryan: yes, but right now they can either just turn it on or off, and with my patch they will be forced to look up all the columns if they have more than one [2:07pm] nspiegelberg: krispyjala: I think just running testBloomPerf on a couple different sizes will give you a goo timing measurement. 
Unless my initial thoughts are off, you can probably just get away with saying: (column.count * error.rate > 10%) return true; [2:07pm] krispyjala: nspiegelberg: yeah I agree we should have the early exit strat [2:08pm] krispyjala: nspiegelberg: ok i will do some testing this evening on it [2:08pm] nspiegelberg: then, when somebody asks why you chose 10%, you can say that it obviously makes sense when below 10% and they should run some numbers for you if they want to pump it up
        HBase Review Board added a comment -

        Message from: "Kris Jirapinyo" <kjirapinyo@attensity.com>

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        http://review.hbase.org/r/296/
        -----------------------------------------------------------

        (Updated 2010-07-12 19:48:43.373418)

        Review request for hbase.

        Changes
        -------

        Implemented Kannan's suggestion, thereby removing keyList.

        Summary
        -------

        HBASE-2794 Enable bloom filter checks for multiple columns in same column family

        This addresses bug HBASE-2794.
        http://issues.apache.org/jira/browse/HBASE-2794

        Diffs (updated)


        /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 962748
        /trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 962748

        Diff: http://review.hbase.org/r/296/diff

        Testing
        -------

        Ran and passed org.apache.hadoop.hbase.regionserver.TestStoreFile multiple times. Ran and passed all tests when building.

        Thanks,

        Kris

        HBase Review Board added a comment -

        Message from: "Kannan Muthukkaruppan" <kannan@facebook.com>

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        http://review.hbase.org/r/296/#review384
        -----------------------------------------------------------

        One inlined comment. Otherwise, the patch and the test look good.

        /trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java
        <http://review.hbase.org/r/296/#comment1622>

        Once Pranav's patch for HBASE-2265 lands, the shouldSeek() API will take a "Scan" as the first argument instead of the row, so you might need to rebase the test on top of that patch.

        • Kannan
        HBase Review Board added a comment -

        Message from: "Kris Jirapinyo" <kjirapinyo@attensity.com>

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        http://review.hbase.org/r/296/
        -----------------------------------------------------------

        (Updated 2010-07-13 16:32:18.729301)

        Review request for hbase.

        Changes
        -------

        Added changes to code after HBASE-2265 was committed.

        Also incorporated Nicolas's suggestion to skip the bloom lookup when columns.size * error.rate > 10%.

        Changed the BloomFilter interface to add getErrorRate(); ByteBloomFilter now also stores errorRate.
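A sketch of the multi-column check described in this revision, under stated assumptions: `RowColBloom` is a hypothetical stand-in for the patched `BloomFilter` interface (only a membership test plus `getErrorRate()`), `SetBackedBloom` is a toy with no false positives, and the row/column key is a plain concatenation like the `Bytes.add(row, col)` call in the original `shouldSeek()` snippet. The file is skipped only when the bloom rules out every requested column.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class MultiColumnShouldSeek {
    static final double MAX_COMBINED_ERROR = 0.10;

    /** row + column concatenation, mirroring Bytes.add(row, col) in the original snippet. */
    static byte[] rowColKey(byte[] row, byte[] col) {
        byte[] key = Arrays.copyOf(row, row.length + col.length);
        System.arraycopy(col, 0, key, row.length, col.length);
        return key;
    }

    /** Returns false only when the bloom proves no requested column is in the file. */
    static boolean shouldSeek(RowColBloom bloom, byte[] row, List<byte[]> columns) {
        if (bloom == null || columns.isEmpty()) {
            return true; // no bloom, or whole-family read: the file cannot be ruled out
        }
        if (columns.size() * bloom.getErrorRate() > MAX_COMBINED_ERROR) {
            return true; // early exit: too many lookups for the bloom to pay off
        }
        for (byte[] col : columns) {
            if (bloom.mightContain(rowColKey(row, col))) {
                return true; // at least one column may be present
            }
        }
        return false; // every requested column is definitely absent
    }

    public static void main(String[] args) {
        byte[] row = "r1".getBytes(StandardCharsets.UTF_8);
        SetBackedBloom bloom = new SetBackedBloom(0.01);
        bloom.add(rowColKey(row, "c2".getBytes(StandardCharsets.UTF_8)));
        List<byte[]> cols = Arrays.asList(
            "c1".getBytes(StandardCharsets.UTF_8), "c3".getBytes(StandardCharsets.UTF_8));
        System.out.println("seek file? " + shouldSeek(bloom, row, cols)); // prints "seek file? false"
    }
}

/** Hypothetical stand-in for the patched BloomFilter interface. */
interface RowColBloom {
    boolean mightContain(byte[] key);
    double getErrorRate();
}

/** Toy exact-set "bloom" with no false positives, for demonstration only. */
final class SetBackedBloom implements RowColBloom {
    private final Set<String> keys = new HashSet<>();
    private final double errorRate;

    SetBackedBloom(double errorRate) {
        this.errorRate = errorRate;
    }

    // ISO-8859-1 maps each byte to one char, so byte[] keys round-trip exactly.
    void add(byte[] key) {
        keys.add(new String(key, StandardCharsets.ISO_8859_1));
    }

    @Override
    public boolean mightContain(byte[] key) {
        return keys.contains(new String(key, StandardCharsets.ISO_8859_1));
    }

    @Override
    public double getErrorRate() {
        return errorRate;
    }
}
```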

        Summary
        -------

        HBASE-2794 Enable bloom filter checks for multiple columns in same column family

        This addresses bug HBASE-2794.
        http://issues.apache.org/jira/browse/HBASE-2794

        Diffs (updated)


        /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 963862
        /trunk/src/main/java/org/apache/hadoop/hbase/util/BloomFilter.java 963873
        /trunk/src/main/java/org/apache/hadoop/hbase/util/ByteBloomFilter.java 963873
        /trunk/src/main/java/org/apache/hadoop/hbase/util/DynamicByteBloomFilter.java 963873
        /trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 963873

        Diff: http://review.hbase.org/r/296/diff

        Testing
        -------

        Ran and passed org.apache.hadoop.hbase.regionserver.TestStoreFile multiple times. Ran and passed all tests when building.

        Thanks,

        Kris

        Nicolas Spiegelberg added a comment -

        Talked with Kris about setting proper exit conditions.

        #1 : Exit if our combined error rate (columns.size * error.rate) > 10%. This is an arbitrary number; it could easily be made configurable if someone needs it.
        #2 : Exit if it would take > 1ms to run the bloom check. This ensures that blooms are beneficial for performance even if they aren't needed 90% of the time.

        I wonder if it would be good to give the user an option of not running a bloom check if there is only 1 HFile in the StoreFile, but that's for another JIRA.
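The two exit conditions above could be combined roughly as follows. The 10% and 1 ms limits come from the comment; `perLookupNanos` is a hypothetical measured input, and `measureNanosPerLookup` is a deliberately crude, hypothetical timing helper (no warm-up, no statistics), so a proper benchmark would be needed before trusting its numbers.

```java
import java.util.function.Predicate;

/**
 * Sketch of the two exit conditions. The limits come from the comment;
 * perLookupNanos is a hypothetical input the caller would measure.
 */
final class BloomExitConditions {
    static final double MAX_COMBINED_ERROR = 0.10;   // condition #1
    static final long MAX_CHECK_NANOS = 1_000_000L;  // condition #2: 1 ms

    /** Returns true if the caller should skip the bloom and simply seek the file. */
    static boolean skipBloom(int columnCount, double errorRate, long perLookupNanos) {
        boolean tooInaccurate = columnCount * errorRate > MAX_COMBINED_ERROR;
        boolean tooSlow = (long) columnCount * perLookupNanos > MAX_CHECK_NANOS;
        return tooInaccurate || tooSlow;
    }

    private static volatile boolean sink; // volatile write keeps the lookups observable

    /** Crude average cost of one lookup; ignores JIT warm-up and cache effects. */
    static long measureNanosPerLookup(Predicate<byte[]> contains, byte[] key, int iterations) {
        if (iterations <= 0) {
            throw new IllegalArgumentException("iterations must be positive");
        }
        boolean acc = false;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            acc ^= contains.test(key);
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;
        return elapsed / iterations;
    }

    public static void main(String[] args) {
        long perLookup = measureNanosPerLookup(k -> k.length > 3, new byte[8], 100_000);
        System.out.println("per lookup ns: " + perLookup
            + ", skip for 5 cols at 1%: " + skipBloom(5, 0.01, perLookup));
    }
}
```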

        HBase Review Board added a comment -

        Message from: "Nicolas" <nspiegelberg@facebook.com>

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        http://review.hbase.org/r/296/#review397
        -----------------------------------------------------------

        Looking good! Waiting for performance test numbers on StoreFile.shouldSeek(). I think we want to early exit if shouldSeek() would take > 1ms or something sensible.

        /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
        <http://review.hbase.org/r/296/#comment1703>

        red = using tabs instead of spaces or trailing spaces. quick fix might be nice (or is this auto-handled by svn, Stack?)

        /trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java
        <http://review.hbase.org/r/296/#comment1702>

        could you add test header comments so we know all the cases you're trying to test?

        • Nicolas
        HBase Review Board added a comment -

        Message from: "Jonathan Gray" <jgray@apache.org>

        On 2010-07-13 18:09:13, Nicolas wrote:
        > /trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java, line 958
        > <http://review.hbase.org/r/296/diff/3/?file=2723#file2723line958>
        >
        > red = using tabs instead of spaces or trailing spaces. quick fix might be nice (or is this auto-handled by svn, Stack?)

        None of this is auto-handled by svn. You need to set up Eclipse (or whatever you use) to use 2 spaces instead of tabs. In Eclipse, I have my code cleanup set to remove whitespace, and I run that periodically.

        • Jonathan

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        http://review.hbase.org/r/296/#review397
        -----------------------------------------------------------

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/2084/
        -----------------------------------------------------------

        Review request for hbase.

        Summary
        -------

        Previously we only used row-column Bloom filters for scans that only requested one column. We have seen production queries that request up to 200 columns, and with say ~6 store files per store (region / column family combination) this might have resulted in 1200 block read operations in the worst case. With this diff we will be avoiding seeks on store files that we know don't contain the row/column of interest when using an ExplicitColumnTracker. The performance should remain the same for column range queries.
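The worst-case arithmetic in this summary (200 columns x 6 store files = 1,200 reads) and the effect of consulting a ROWCOL bloom per column can be sketched as below. The names are hypothetical, and the blooms are modelled as exact sets, so this shows the best case with no false positives, with each column assumed to live in exactly one store file.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Without ROWCOL blooms an explicit-column query may seek every store file
 * once per requested column. With blooms (modelled here as exact sets, a
 * simplification), a file is only sought for columns it may contain.
 */
final class ExplicitColumnSeekCount {

    static int worstCaseSeeks(int columns, int storeFiles) {
        return columns * storeFiles;
    }

    static int bloomFilteredSeeks(List<String> requested, List<Set<String>> fileBlooms) {
        int seeks = 0;
        for (String column : requested) {
            for (Set<String> bloom : fileBlooms) {
                if (bloom.contains(column)) {
                    seeks++; // bloom says "maybe": this file must be sought
                }
            }
        }
        return seeks;
    }

    public static void main(String[] args) {
        int columns = 200;
        int files = 6;
        List<String> requested = new ArrayList<>();
        List<Set<String>> blooms = new ArrayList<>();
        for (int f = 0; f < files; f++) {
            blooms.add(new HashSet<>());
        }
        for (int c = 0; c < columns; c++) {
            String name = "c" + c;
            requested.add(name);
            blooms.get(c % files).add(name); // each column lives in exactly one file
        }
        System.out.println(worstCaseSeeks(columns, files));        // 1200
        System.out.println(bloomFilteredSeeks(requested, blooms)); // 200
    }
}
```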

        This addresses bug HBASE-2794.
        https://issues.apache.org/jira/browse/HBASE-2794

        Diffs


        src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 08d3ba4
        src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java ac2348e
        src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java 4aa72de
        src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 68cdac5
        src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java fd9e7ef
        src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java 9d9895c
        src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java 6cdada7
        src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 7cbdb98
        src/main/java/org/apache/hadoop/hbase/regionserver/AbstractKeyValueScanner.java PRE-CREATION
        src/main/java/org/apache/hadoop/hbase/KeyValue.java 585c4a8
        src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java f5173c4
        src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java a3d778e
        src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java 32f88fb
        src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java a5d13f7
        src/test/java/org/apache/hadoop/hbase/regionserver/TestMultiColumnScanner.java baee696
        src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWithBloomError.java PRE-CREATION

        Diff: https://reviews.apache.org/r/2084/diff

        Testing
        -------

        Existing unit tests. A new unit test (TestScanWithBloomError). Load testing using HBaseTest.

        Thanks,

        Mikhail

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/2084/#review2130
        -----------------------------------------------------------

        nice work mikhail! i will let someone else give the +1 though

        src/main/java/org/apache/hadoop/hbase/KeyValue.java
        <https://reviews.apache.org/r/2084/#comment4946>

        method doesn't actually take a KeyValue... this is to create the last KV on the row and column for the KeyValue this is called on?
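For readers following the question about the method that creates "the last KV on the row and column", here is a toy model of why such a fake key is useful. It assumes HBase's ordering of row and column ascending, then timestamp descending; the `Cell` class and `lastOnRowCol` helper are hypothetical and ignore the type byte and other fields a real `KeyValue` carries. Seeking a scanner to this fake key skips every remaining version of that column without landing past the next column.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Toy model: cells sort by row and column ascending, then timestamp
 * descending (newest first). A fake cell with timestamp Long.MIN_VALUE sorts
 * after every real cell of the same row/column and before the next column.
 */
final class LastOnRowColDemo {

    static final class Cell {
        final String row;
        final String column;
        final long timestamp;

        Cell(String row, String column, long timestamp) {
            this.row = row;
            this.column = column;
            this.timestamp = timestamp;
        }

        @Override
        public String toString() {
            return row + "/" + column + "/" + timestamp;
        }
    }

    static final Comparator<Cell> ORDER = Comparator
        .comparing((Cell c) -> c.row)
        .thenComparing((Cell c) -> c.column)
        .thenComparing((Cell a, Cell b) -> Long.compare(b.timestamp, a.timestamp));

    /** Hypothetical helper: a fake cell sorting after all real cells of c's row/column. */
    static Cell lastOnRowCol(Cell c) {
        return new Cell(c.row, c.column, Long.MIN_VALUE);
    }

    public static void main(String[] args) {
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell("r1", "b", 7));
        cells.add(new Cell("r1", "a", 1));
        cells.add(new Cell("r1", "a", 5));
        cells.add(lastOnRowCol(new Cell("r1", "a", 5)));
        cells.sort(ORDER);
        System.out.println(cells); // [r1/a/5, r1/a/1, r1/a/-9223372036854775808, r1/b/7]
    }
}
```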

        src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
        <https://reviews.apache.org/r/2084/#comment4947>

        got it. maybe add a comment on this method to explain this usage

        src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWithBloomError.java
        <https://reviews.apache.org/r/2084/#comment4948>

        license

        • Jonathan

        On 2011-09-28 16:03:52, Mikhail Bautin wrote:

        -----------------------------------------------------------

        This is an automatically generated e-mail. To reply, visit:

        https://reviews.apache.org/r/2084/

        -----------------------------------------------------------

        (Updated 2011-09-28 16:03:52)

        Review request for hbase.

        Summary

        -------

        Previously we only used row-column Bloom filters for scans that only requested one column. We have seen production queries that request up to 200 columns, and with say ~6 store files per store (region / column family combination) this might have resulted in 1200 block read operations in the worst case. With this diff we will be avoiding seeks on store files that we know don't contain the row/column of interest when using an ExplicitColumnTracker. The performance should remain the same for column range queries.

        This addresses bug HBASE-2794.

        https://issues.apache.org/jira/browse/HBASE-2794

        Diffs

        -----

        src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 08d3ba4

        src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java ac2348e

        src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java 4aa72de

        src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 68cdac5

        src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java fd9e7ef

        src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java 9d9895c

        src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java 6cdada7

        src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 7cbdb98

        src/main/java/org/apache/hadoop/hbase/regionserver/AbstractKeyValueScanner.java PRE-CREATION

        src/main/java/org/apache/hadoop/hbase/KeyValue.java 585c4a8

        src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java f5173c4

        src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java a3d778e

        src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java 32f88fb

        src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java a5d13f7

        src/test/java/org/apache/hadoop/hbase/regionserver/TestMultiColumnScanner.java baee696

        src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWithBloomError.java PRE-CREATION

        Diff: https://reviews.apache.org/r/2084/diff

        Testing

        -------

        Existing unit tests. A new unit test (TestScanWithBloomError). Load testing using HBaseTest.

        Thanks,

        Mikhail

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/2084/#review2137
        -----------------------------------------------------------

        This is an important feature.

        Since the boolean parameter, forward, correlates so closely with reseek, can we give it a better name?
        I was thinking about either reseek or forwardOnly.

        • Ted

        Ted Yu added a comment -

        I got the following errors from test suite:

        Failed tests:   testWorkerAbort(org.apache.hadoop.hbase.master.TestDistributedLogSplitting): expected:<1> but was:<0>
        
        Tests in error:
          testMergeTool(org.apache.hadoop.hbase.util.TestMergeTool): String index out of range: -1
          testBasicRollingRestart(org.apache.hadoop.hbase.master.TestRollingRestart): test timed out after 300000 milliseconds
        

        They passed individually.

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/2084/#review2161
        -----------------------------------------------------------

        src/main/java/org/apache/hadoop/hbase/KeyValue.java
        <https://reviews.apache.org/r/2084/#comment5035>

        I was implying that "this" is also a method argument when I wrote this comment. I will edit this to make it clearer.

        src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
        <https://reviews.apache.org/r/2084/#comment5036>

        Yes, I will modify the javadoc of this method.

        • Mikhail

        On 2011-09-28 16:03:52, Mikhail Bautin wrote:

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/2084/
        -----------------------------------------------------------

        (Updated 2011-09-28 16:03:52)

        Review request for hbase.

        Summary
        -------

        Previously we used row-column Bloom filters only for scans that requested a single column. We have seen production queries that request up to 200 columns, and with, say, ~6 store files per store (region/column family combination) this could have resulted in 1200 block read operations (200 columns × 6 store files) in the worst case. With this diff we avoid seeks on store files that we know do not contain the row/column of interest when using an ExplicitColumnTracker. Performance should remain the same for column range queries.

        This addresses bug HBASE-2794.
        https://issues.apache.org/jira/browse/HBASE-2794

        Diffs
        -----

        src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 08d3ba4
        src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java ac2348e
        src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java 4aa72de
        src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 68cdac5
        src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java fd9e7ef
        src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java 9d9895c
        src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java 6cdada7
        src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 7cbdb98
        src/main/java/org/apache/hadoop/hbase/regionserver/AbstractKeyValueScanner.java PRE-CREATION
        src/main/java/org/apache/hadoop/hbase/KeyValue.java 585c4a8
        src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java f5173c4
        src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java a3d778e
        src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java 32f88fb
        src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java a5d13f7
        src/test/java/org/apache/hadoop/hbase/regionserver/TestMultiColumnScanner.java baee696
        src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWithBloomError.java PRE-CREATION

        Diff: https://reviews.apache.org/r/2084/diff

        Testing
        -------

        Existing unit tests. A new unit test (TestScanWithBloomError). Load testing using HBaseTest.

        Thanks,

        Mikhail
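The 1200 figure in the review-request summary is the worst case of 200 requested columns times ~6 store files. The sketch below (plain Java, not HBase code) reproduces that product and contrasts it with a rough estimate for per-column ROWCOL Bloom checks. It assumes, purely for illustration, that each requested column lives in exactly one store file and that every Bloom filter has the same hypothetical 1% false-positive rate; neither assumption comes from the review.

```java
/**
 * Back-of-the-envelope seek counts for a Get/scan that names explicit columns.
 * Illustrative assumptions (not measurements): each requested column is stored in
 * exactly one store file, and every ROWCOL Bloom filter has the same false-positive rate.
 */
class SeekEstimate {

    /** Without a usable ROWCOL Bloom check, every column may touch every store file. */
    static long worstCaseSeeks(int columns, int storeFiles) {
        return (long) columns * storeFiles;
    }

    /**
     * With a per-column Bloom check: one necessary seek per column, plus the expected
     * false-positive seeks on the (storeFiles - 1) files that lack that column.
     */
    static double expectedSeeksWithBloom(int columns, int storeFiles, double falsePositiveRate) {
        return columns + (double) columns * (storeFiles - 1) * falsePositiveRate;
    }

    public static void main(String[] args) {
        int columns = 200;    // "up to 200 columns" (from the summary)
        int storeFiles = 6;   // "~6 store files per store" (from the summary)
        double fpRate = 0.01; // hypothetical false-positive rate
        System.out.println("worst case seeks: " + worstCaseSeeks(columns, storeFiles));
        System.out.println("estimated seeks with Bloom checks: "
                + expectedSeeksWithBloom(columns, storeFiles, fpRate));
    }
}
```

Under these assumptions the estimate is about 210 seeks instead of 1200; a higher false-positive rate, or columns whose versions are spread across several store files, would raise it.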

        jiraposter@reviews.apache.org added a comment -

        On 2011-09-28 17:42:46, Ted Yu wrote:

        > This is an important feature.

        >

        > Since the boolean parameter, forward, correlates so closely with reseek, can we give it a better name?

        > I was thinking about either reseek or forwardOnly.

        We have a few diffs in the pipeline that depend on this one. Can we rename the boolean flag after we commit those diffs?
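The naming question is easier to follow with the two seek modes side by side. In the toy scanner below, `seek` may reposition anywhere, while `reseek` only advances from the current position, so a `forward` flag amounts to a caller promise that the target is not behind the current key. The `ToyScanner` class and its `requestSeek(int, boolean)` signature are hypothetical illustrations, not HBase's KeyValueScanner; only the seek/reseek distinction and the `forward` name come from this thread.

```java
/** Toy scanner over sorted, distinct int keys; illustrates seek vs. reseek only. */
class ToyScanner {
    private final int[] keys; // sorted ascending
    private int pos;          // index of the current key; keys.length means exhausted

    ToyScanner(int... sortedKeys) {
        this.keys = sortedKeys.clone();
    }

    /** Positions at the first key >= target by binary search over the whole array. */
    boolean seek(int target) {
        int lo = 0, hi = keys.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (keys[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        pos = lo;
        return pos < keys.length;
    }

    /** Advances from the current position to the first key >= target; never moves backward. */
    boolean reseek(int target) {
        while (pos < keys.length && keys[pos] < target) {
            pos++;
        }
        return pos < keys.length;
    }

    /** forward == true: the caller guarantees target is not behind the current key, so reseek suffices. */
    boolean requestSeek(int target, boolean forward) {
        return forward ? reseek(target) : seek(target);
    }

    int current() {
        return keys[pos];
    }

    public static void main(String[] args) {
        ToyScanner s = new ToyScanner(10, 20, 30, 40);
        s.requestSeek(25, false);
        System.out.println(s.current()); // 30
        s.requestSeek(35, true);
        System.out.println(s.current()); // 40
        s.requestSeek(15, true);         // forward-only: stays at 40
        System.out.println(s.current()); // 40
        s.requestSeek(15, false);        // a full seek may move backward
        System.out.println(s.current()); // 20
    }
}
```

The third call shows why the flag's meaning matters: passing `forward = true` for a target behind the current position silently leaves the scanner on a later key.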

        • Mikhail

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/2084/#review2137
        -----------------------------------------------------------

        jiraposter@reviews.apache.org added a comment -

        On 2011-09-28 17:42:46, Ted Yu wrote:

        > This is an important feature.

        >

        > Since the boolean parameter, forward, correlates so closely with reseek, can we give it a better name?

        > I was thinking about either reseek or forwardOnly.

        Mikhail Bautin wrote:

        We have a few diffs in the pipeline that depend on this one. Can we rename the boolean flag after we commit those diffs?

        I am fine with the current name of forward.

        • Ted

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/2084/#review2137
        -----------------------------------------------------------

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/2084/
        -----------------------------------------------------------

        (Updated 2011-09-29 21:05:20.334849)

        Review request for hbase.

        Changes
        -------

        Addressing Jonathan's comments.

        Summary
        -------

        Previously we used row-column Bloom filters only for scans that requested a single column. We have seen production queries that request up to 200 columns, and with, say, ~6 store files per store (region/column family combination) this could have resulted in 1200 block read operations (200 columns × 6 store files) in the worst case. With this diff we avoid seeks on store files that we know do not contain the row/column of interest when using an ExplicitColumnTracker. Performance should remain the same for column range queries.

        This addresses bug HBASE-2794.
        https://issues.apache.org/jira/browse/HBASE-2794
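The summary says the patch avoids seeks on store files whose ROWCOL Bloom filter rules out the requested row/column when an ExplicitColumnTracker is in use. Below is a minimal sketch of that idea under stated assumptions: `RowColBloom` and `ExactBloom` are hypothetical stand-ins rather than HBase classes, an exact set replaces the probabilistic filter so the result is deterministic, and the key is built by concatenating row and qualifier bytes, as `Bytes.add(row, col)` does in the snippet quoted in the issue description. It only counts seeks; it is not the committed implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ExplicitColumnSeeks {

    /** Hypothetical stand-in for a per-store-file ROWCOL Bloom filter (not an HBase API). */
    interface RowColBloom {
        /** May return true for absent keys (false positive) but never false for present keys. */
        boolean mightContain(byte[] rowColKey);
    }

    /** Exact set used in place of a real Bloom filter so the example is deterministic. */
    static final class ExactBloom implements RowColBloom {
        private final Set<String> keys = new HashSet<>();

        void add(byte[] rowColKey) {
            keys.add(Arrays.toString(rowColKey));
        }

        @Override
        public boolean mightContain(byte[] rowColKey) {
            return keys.contains(Arrays.toString(rowColKey));
        }
    }

    /** Row bytes followed by qualifier bytes, like Bytes.add(row, col) in the issue description. */
    static byte[] rowColKey(byte[] row, byte[] col) {
        byte[] key = Arrays.copyOf(row, row.length + col.length);
        System.arraycopy(col, 0, key, row.length, col.length);
        return key;
    }

    /** Counts store-file seeks for one row when each explicit column is checked against each filter. */
    static int seeksNeeded(byte[] row, List<byte[]> columns, List<? extends RowColBloom> files) {
        int seeks = 0;
        for (byte[] col : columns) {
            byte[] key = rowColKey(row, col);
            for (RowColBloom bloom : files) {
                if (bloom.mightContain(key)) {
                    seeks++; // seek only the files that might hold this row/column
                }
            }
        }
        return seeks;
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<ExactBloom> files = new ArrayList<>();
        for (int i = 0; i < 6; i++) {
            files.add(new ExactBloom());
        }
        byte[] row = bytes("row1");
        files.get(0).add(rowColKey(row, bytes("c0"))); // c0 lives only in file 0
        files.get(3).add(rowColKey(row, bytes("c1"))); // c1 lives only in file 3; c2 is absent
        List<byte[]> columns = Arrays.asList(bytes("c0"), bytes("c1"), bytes("c2"));
        System.out.println("seeks with Bloom checks: " + seeksNeeded(row, columns, files));
        System.out.println("seeks without: " + columns.size() * files.size());
    }
}
```

With three requested columns over six store files this counts 2 seeks instead of 18. Plain concatenation can make different row/qualifier pairs produce the same bytes (for example `ab`+`c` and `a`+`bc`); with a Bloom filter that only adds false positives (extra seeks), never false negatives.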

        Diffs (updated)
        -----

        src/main/java/org/apache/hadoop/hbase/KeyValue.java 585c4a8
        src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java f5173c4
        src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java a3d778e
        src/main/java/org/apache/hadoop/hbase/regionserver/AbstractKeyValueScanner.java PRE-CREATION
        src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 7cbdb98
        src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java 9d9895c
        src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java 6cdada7
        src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java 4aa72de
        src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 68cdac5
        src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java fd9e7ef
        src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 08d3ba4
        src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java ac2348e
        src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java 32f88fb
        src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java a5d13f7
        src/test/java/org/apache/hadoop/hbase/regionserver/TestMultiColumnScanner.java baee696
        src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWithBloomError.java PRE-CREATION

        Diff: https://reviews.apache.org/r/2084/diff

        Testing
        -------

        Existing unit tests. A new unit test (TestScanWithBloomError). Load testing using HBaseTest.

        Thanks,

        Mikhail

        Ted Yu added a comment -

        TestServerCustomProtocol#testRowRange failed during the full test suite run but passed when run standalone.

        Ted Yu added a comment -

        Integrated into the 0.92 branch and TRUNK.

        Thanks for the patch Mikhail.

        Thanks for the review Jonathan.

        jiraposter@reviews.apache.org added a comment -

        -----------------------------------------------------------
        This is an automatically generated e-mail. To reply, visit:
        https://reviews.apache.org/r/2084/#review2226
        -----------------------------------------------------------

        Ship it!

        I'm +0 on committing this. I tried reviewing it, but I don't know this code well. The added unit test is nicely intrusive and the asserts look right. What about Nicolas's performance concerns? How are they addressed by this patch? I'm running a build of the patch, and if it passes I'm +1 on commit.

        src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
        <https://reviews.apache.org/r/2084/#comment5175>

        Interesting method name. We should use this pattern everywhere we have to do this.

        src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java
        <https://reviews.apache.org/r/2084/#comment5176>

        Should we get rid of this javadoc if it is an override? (Let us know; we can do it on commit.)

        • Michael

        stack added a comment -

        These failed after running the full suite but seem unrelated:

        Failed tests:   testOnlineChangeTableSchema(org.apache.hadoop.hbase.client.TestAdmin)
          testForceSplitMultiFamily(org.apache.hadoop.hbase.client.TestAdmin): expected:<2> but was:<1>
        
        Tests in error:
          testEnableDisableAddColumnDeleteColumn(org.apache.hadoop.hbase.client.TestAdmin): org.apache.hadoop.hbase.TableNotEnabledException: testMasterAdmin
        
        Mikhail Bautin added a comment -

        @Michael: I am observing a different set of spuriously failing tests, also seemingly unrelated.

        2011-09-29_20_41_15 | tests: 1015, fail: 0, err: 0, skip: 21, time: 6027.3
        2011-09-29_23_09_51 | tests: 1012, fail: 0, err: 0, skip: 21, time: 5328.0
        2011-09-30_01_44_42 | tests: 1015, fail: 0, err: 0, skip: 21, time: 6338.4
        2011-09-30_04_28_29 | tests: 1015, fail: 0, err: 0, skip: 21, time: 6079.2
        2011-09-30_07_00_24 | tests: 1015, fail: 1, err: 0, skip: 21, time: 6656.2, failed: Admin
        2011-09-30_09_41_53 | tests: 1015, fail: 0, err: 0, skip: 21, time: 5900.8
        2011-09-30_12_10_25 | tests: 1004, fail: 1, err: 0, skip: 21, time: 5397.7, failed: DistributedLogSplitting

        (Patch applied on top of http://svn.apache.org/repos/asf/hbase/trunk@1176613)

        Hudson added a comment -

        Integrated in HBase-0.92 #34 (See https://builds.apache.org/job/HBase-0.92/34/)
        HBASE-2794 Utilize ROWCOL bloom filter if multiple columns within same family
        are requested in a Get (Mikhail Bautin)

        tedyu :
        Files :

        • /hbase/branches/0.92/CHANGES.txt
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/KeyValue.java
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/AbstractKeyValueScanner.java
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
        • /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java
        • /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java
        • /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/TestMultiColumnScanner.java
        • /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWithBloomError.java
        Hudson added a comment -

        Integrated in HBase-TRUNK #2274 (See https://builds.apache.org/job/HBase-TRUNK/2274/)
        HBASE-2794 Utilize ROWCOL bloom filter if multiple columns within same family
        are requested in a Get (Mikhail Bautin)

        tedyu :
        Files :

        • /hbase/trunk/CHANGES.txt
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/AbstractKeyValueScanner.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMultiColumnScanner.java
        • /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWithBloomError.java

          People

          • Assignee:
            Mikhail Bautin
            Reporter:
            Kannan Muthukkaruppan
          • Votes:
            0
            Watchers:
            2
