HBASE-5268

Add delete column prefix delete marker

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Client, regionserver
    • Labels:
      None

      Description

      This is another part missing in the "wide row challenge".
      Currently, entire families of a row can be deleted, as can individual columns or versions.
      There is no facility to mark multiple columns for deletion by column prefix.

      It turns out this can be achieved with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon; just working on some tests now.

      1. 5268.txt
        12 kB
        Lars Hofhansl
      2. 5268-proof.txt
        19 kB
        Lars Hofhansl
      3. 5268-v2.txt
        14 kB
        Lars Hofhansl
      4. 5268-v3.txt
        14 kB
        Lars Hofhansl
      5. 5268-v4.txt
        15 kB
        Lars Hofhansl
      6. 5268-v5.txt
        17 kB
        Lars Hofhansl

        Activity

        stack stack added a comment -

        Give an example of what you mean please Lars (I think I know what you are on about....)

        lhofhansl Lars Hofhansl added a comment - - edited

        @Stack... Here's an example.
        Say you have columns X100, X101, X102, ..., X199, and X200, X201, X202, ..., X299, X300, ..., in a row R, all in the same column family.
        (I'd have this if I modeled rows inside HBase rows by using a prefix to identify the inner rows.)

        Now I want to be able to delete all columns that start with X1. Without a prefix delete marker I'd have to place 100 column delete markers. With a prefix marker I can just place one marker for the X1 prefix.
        If the prefix is empty, this would degenerate into the same behavior as a family marker (albeit less efficient).
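The marker-count arithmetic in this example can be checked with a small self-contained sketch (plain Java, no HBase types; the class and method names are mine, not HBase's):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class PrefixDeleteExample {
    // Columns X100..X299 in one family, as in the example above.
    static List<String> columns() {
        return IntStream.range(100, 300).mapToObj(i -> "X" + i)
                .collect(Collectors.toList());
    }

    // Without a prefix marker, each matching column needs its own delete marker.
    static long individualMarkersFor(String prefix) {
        return columns().stream().filter(c -> c.startsWith(prefix)).count();
    }

    public static void main(String[] args) {
        System.out.println(individualMarkersFor("X1")); // 100 markers vs. a single prefix marker
        System.out.println(individualMarkersFor(""));   // empty prefix covers all 200 (family-marker case)
    }
}
```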

        lhofhansl Lars Hofhansl added a comment - - edited

        Here's a patch. The bulk of it is tests.

        During testing with delete marker types I found one strange scenario:
        Say you

        1. put columns 123, 1234, 12345
        2. then delete with prefix 123
        3. then put column 123 again
        4. now delete 123 with a normal column marker

        What happens is that the ScanDeleteTracker sees the normal column delete marker first, then the new put for column 123. It concludes that it is done with all versions of column 123 and thus seeks ahead to the next column. In the process the prefix marker for 123 is also skipped, and hence 1234 and 12345 are no longer marked as deleted.

        This only happens in exactly this scenario.

        I cannot fix this without de-optimizing column delete markers or adding complicated logic to always sort prefix delete markers before all columns they affect, regardless of the timestamp.

        I added this scenario as a unit test.
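The scan order that produces this anomaly can be modeled outside HBase with a rough sketch (the KV record and its ordering are my simplification of the real KeyValue comparator: qualifier ascending, then timestamp descending):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PrefixMarkerAnomaly {
    // Simplified KV: qualifier, timestamp, and kind (put / delete marker type).
    record KV(String qual, long ts, String kind) {}

    // Rough model of HBase ordering: qualifier ascending, timestamp descending.
    static List<KV> scanOrder() {
        List<KV> kvs = new ArrayList<>(List.of(
                new KV("123", 1, "put"), new KV("1234", 1, "put"),
                new KV("12345", 1, "put"),          // step 1: puts
                new KV("123", 2, "deletePrefix"),   // step 2: prefix marker
                new KV("123", 3, "put"),            // step 3: re-put 123
                new KV("123", 4, "deleteColumn"))); // step 4: column marker
        kvs.sort(Comparator.comparing(KV::qual)
                .thenComparing(Comparator.comparingLong(KV::ts).reversed()));
        return kvs;
    }

    public static void main(String[] args) {
        // The column marker (ts 4) is seen first; a tracker that seeks to the
        // next column once done with 123 never visits the prefix marker at
        // ts 2, leaving 1234 and 12345 visible again.
        scanOrder().forEach(kv ->
                System.out.println(kv.kind() + " " + kv.qual() + "@" + kv.ts()));
    }
}
```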

        lhofhansl Lars Hofhansl added a comment - - edited

        When I muck with KeyValue.KeyComparator to always sort prefix delete markers before all columns with the same prefix regardless of TS, I see the opposite scenario: in the setup above the column delete marker is ignored, and hence the newly put version of column 123 is visible again. This is because the column marker is considered less specific and hence won't override the column prefix marker, even though it is newer. Arghh

        So ScanDeleteTracker would need to be completely refactored in order to keep prefix markers separate from column and version markers.
        The question is: Is that worth the risk, added code entropy, and effort, or can we document that prefix delete markers should not be used together with column delete markers (where the column marker's column is identical to the prefix marker's column prefix, which does not make much sense anyway)?

        lhofhansl Lars Hofhansl added a comment - - edited

        I'll work on a fully correct patch today, and then we can decide whether it is worth the extra complexity. I find this first patch compelling, because (1) it is very small and easy to verify and (2) it keeps column prefix delete markers sorted (by timestamp) with the KVs they affect.
        In the bigger patch, neither will be true anymore.

        lhofhansl Lars Hofhansl added a comment -

        This version has no anomalies. And it actually isn't so bad. The only part I do not like is the change to the sorting in KeyValue.KeyComparator.

        Please let me know what you think.

        lhofhansl Lars Hofhansl added a comment -

        Sorting prefix markers first has another problem.
        Placing a prefix marker for 123 at T and then another for 1234 at T+1 will ignore the second one, even though it has a newer timestamp. To fix that, ScanDeleteTracker would need to keep track of delete markers that are prefixes of each other, which would increase memory requirements.

        So I am back to the first approach. It is simpler, does not muck with the sort order, has lower memory requirements, and is easier to grok.
        Delete.deleteColumnsByPrefix now advises against mixing this with deleteColumn for the same qualifiers.

        I added more tests and propose this for commit.

        zhihyu@ebaysf.com Ted Yu added a comment -

        Delete.deleteColumnsByPrefix now advises against mixing this with deleteColumn for the same qualifiers.

        Shall we poll user@ and dev@ for the above constraint?

        tlipcon Todd Lipcon added a comment -

        This seems a little complicated to me... if this is meant for the purpose of building higher-level transactional capabilities, maybe we should start a branch for that exploration? I'm wary that this introduces a lot of complexity around optimizing the read path, for the sake of optimizing a delete path (which in my experience is rarely used)

        lhofhansl Lars Hofhansl added a comment -

        @Todd: Are you talking about v2 or v3? I agree v2 is too complicated.
        v3 just adds this to the read path (in ScanDeleteTracker):

        +        // check whether the delete marker is a prefix
        +        if (deleteType == KeyValue.Type.DeleteColumnPrefix.getCode()) {
        +          if (Bytes.compareTo(deleteBuffer, deleteOffset, deleteLength, buffer,
        +              qualifierOffset, deleteLength) == 0) {
        +            if (timestamp <= deleteTimestamp) {
        +              return DeleteResult.COLUMN_DELETED;
        +            }
        +          } else {
        +            // past the prefix marker
        +            deleteBuffer = null;
        +          }
        

        The rest is indentation (which makes the change to ScanDeleteTracker look big), tests, and two more methods on Delete.java.
        If prefix markers are not used there is no performance impact.

        @Ted: Sure. Although if we cannot live with this constraint, this becomes too complicated to be useful.
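In isolation, the prefix test in the diff above amounts to the following self-contained sketch (plain Java standing in for the Bytes.compareTo call; the helper name is mine, not HBase's):

```java
import java.util.Arrays;

public class PrefixDeleteCheck {
    // A cell is covered by a prefix delete marker iff its qualifier starts
    // with the marker's prefix and its timestamp is not newer than the marker's.
    static boolean coveredByPrefixMarker(byte[] prefix, byte[] qualifier,
                                         long cellTs, long deleteTs) {
        if (qualifier.length < prefix.length) {
            return false; // guard for qualifiers shorter than the prefix
        }
        return Arrays.equals(prefix,
                Arrays.copyOfRange(qualifier, 0, prefix.length))
                && cellTs <= deleteTs;
    }

    public static void main(String[] args) {
        byte[] p = "123".getBytes();
        System.out.println(coveredByPrefixMarker(p, "12345".getBytes(), 5, 10)); // true
        System.out.println(coveredByPrefixMarker(p, "12".getBytes(), 5, 10));    // false: too short
        System.out.println(coveredByPrefixMarker(p, "123".getBytes(), 11, 10));  // false: newer put
    }
}
```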

        zhihyu@ebaysf.com Ted Yu added a comment -

        I think the idea in patch v3 is useful in handling wide rows.

        I suggest making the selection of column prefix generic (in place of Bytes.compareTo call).
        e.g. user may want to use a regex to select the columns to be deleted.
        How hard is it to accommodate the above case?

        lhofhansl Lars Hofhansl added a comment -

        I think that would be difficult, because we could not sort the delete markers correctly. Delete by prefix works because I can sort those at the right place w.r.t. the cells that they affect (delete by suffix, for example, would not work).

        I think what you have in mind would be the work of some kind of "delete filter" that needs to scan all the columns.

        The goal I want to reach is that we have the same flexibility for columns that we have for row-keys, so that key schema in HBase is not limited to the row-key.

        lhofhansl Lars Hofhansl added a comment -

        One more test, and a guard for short column identifiers following longer prefix delete markers.

        hadoopqa Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12511736/5268-v4.txt
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        -1 javadoc. The javadoc tool appears to have generated -145 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 84 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.replication.TestReplication

        Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/849//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/849//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/849//console

        This message is automatically generated.

        lhofhansl Lars Hofhansl added a comment -

        Test run looks good.
        @Todd are you -1 on this? (I'd understand if you were).

        zhihyu@ebaysf.com Ted Yu added a comment -

        I don't seem to get a poll from dev@hbase mailing list about this feature.

        tlipcon Todd Lipcon added a comment -

        I don't know this code path very well at all... can you clarify for me where a "delete prefix" marker is inserted? Is it at the top of the row, or before the columns starting with that prefix?

        eg I have the following row:
        cola_1
        cola_2
        colb_1
        colb_2
        colc_1

        and I insert a "deletePrefix(colb_)"
        Does it sort before cola_1 or does it sort before colb_1?

        lhofhansl Lars Hofhansl added a comment -

        @Ted: I'll send a message to dev@ momentarily.

        @Todd: I've changed this code a lot (HBASE-4071 and HBASE-4536)...
        It is sorted along with the columns. Assuming all KVs have the same timestamp it would sort like this:
        cola_1
        cola_2
        colb_ delete marker
        colb_1
        colb_2
        colc_1

        If there are multiple versions of a column the delete marker would sort according to the TS as well (the exact same ordering rules used for column delete markers).
        This provides for the minimum amount of time a scanner would need to hold on to the delete marker during a scan.
        (In v2 I tried to change the ordering w.r.t. TS', but that turned out to be a rather nasty change.)
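With equal timestamps this placement falls out of plain byte ordering, since the bare prefix sorts before any qualifier that extends it; a quick sketch (a simplification that ignores timestamps and marker types):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class MarkerSortDemo {
    static List<String> sortedKeys() {
        // "colb_" stands in for the prefix delete marker's qualifier.
        List<String> keys = new ArrayList<>(List.of(
                "cola_1", "cola_2", "colb_1", "colb_2", "colc_1", "colb_"));
        Collections.sort(keys); // plain lexicographic/byte order
        return keys;
    }

    public static void main(String[] args) {
        // The marker lands immediately before the columns it covers:
        // [cola_1, cola_2, colb_, colb_1, colb_2, colc_1]
        System.out.println(sortedKeys());
    }
}
```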

        lhofhansl Lars Hofhansl added a comment -

        Slight correction of wording: the above sort order holds independently of the columns' TS'.

        tlipcon Todd Lipcon added a comment -

        I can't see how that's correct. Let's say instead of just two columns with colb_* we had a few thousand, such that it spanned many blocks. Then we requested a get with only the column colb_2000. The optimizations that Mikhail described at the HUG last week would assumedly want to seek directly to that block, but with the delete marker somewhere "in the middle" we'd be forced to scan forward from the beginning of the whole row looking for potential deletions somewhere along the way. Right?

        I think the delete prefix markers would have to go at the very top of the row, alongside the "delete family" markers (since we already force a top-of-row seek to look for delete-family, assuming lack of a deletion bloom).

        lhofhansl Lars Hofhansl added a comment -

        Hmm... Good point. Isn't that the same for a column delete with many versions that span many blocks, though? In that case the ScanDeleteTracker would need to hold on to all prefix markers.
        I.e. there might be a prefix marker for 123 at T, then one for 1234 at T+A, and another for 12345 at T+A+B. They would need to be stored and checked every time based on the TS of the KVs they affect.

        I'll try the Get part.

        lhofhansl Lars Hofhansl added a comment -

        Added a test that inserts 100,000 KVs, places a prefix marker at the beginning, and then Gets KV #90000; it is correctly marked as deleted.
        Do I need to enable Mikhail's optimizations or are they on by default?

        tlipcon Todd Lipcon added a comment -

        Good point. Isn't that the same for a column delete with many versions that span many blocks, though?

        Yes, but my guess is that we currently seek to the beginning of the column as well as to the specific timestamp, in the case that a timestamp is requested. Mikhail would know more (I'm just going based on his presentation last week). The distinction here is that seeking to the beginning of a column is "well defined" whereas seeking to the beginning of the prefix is practically impossible (I suppose you could seek to O(length of column key) spots, but hardly reasonable)
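Todd's O(length) observation can be made concrete: to find any marker whose prefix covers a requested column without scanning the whole row, you would have to probe one seek target per possible prefix length (a hypothetical illustration, not HBase code):

```java
import java.util.ArrayList;
import java.util.List;

public class PrefixSeekTargets {
    // All prefixes of the requested column name; each one is a position where
    // a covering prefix delete marker could sort, so each would need a probe.
    static List<String> candidateSeeks(String column) {
        List<String> targets = new ArrayList<>();
        for (int len = 1; len <= column.length(); len++) {
            targets.add(column.substring(0, len));
        }
        return targets;
    }

    public static void main(String[] args) {
        // "colb_2000" -> 9 candidate seek spots: c, co, col, ..., colb_2000
        System.out.println(candidateSeeks("colb_2000").size());
    }
}
```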

        lhofhansl Lars Hofhansl added a comment -

        Oh, and in this case all columns are named 12300001, 12300002, ..., 12399999. The prefix marker set was 123. The column requested by the Get was 12390000.

        tlipcon Todd Lipcon added a comment -

        What if you'd also inserted 12200000 through 12299999? As is, your prefix marker is probably hitting the top of the row, instead of falling in the middle.

        lhofhansl Lars Hofhansl added a comment -

        Ah yes, now it's failing. And there I thought I knew that code.
        OK, so these need to be top-of-row. Not sure now; it makes the code more complicated than is warranted for this feature.

        lhofhansl Lars Hofhansl added a comment -

        Leaving open for now.

        mikhail Mikhail Bautin added a comment -

        @Lars, Todd: Yes, the column prefix delete will break lazy seek optimization. The lazy seek optimization is only enabled for multi-column queries, though, so the bug would not be seen when only reading one column or the whole row.

        More generally, our various types of deletes get in the way of "point queries", e.g. reading a particular column or a particular timestamp range, because we have to check for deletes first. As Todd pointed out, with column prefix deletes we would have to search the entire row for multi-column gets. We could trigger these optimizations using another "Column Prefix Delete" Bloom filter, but that might be overkill. It might be easier to detect and delete just the required columns on the write path.

        lhofhansl Lars Hofhansl added a comment -

        Patch that includes the get test, for reference.

        zhihyu@ebaysf.com Ted Yu added a comment -

        Would a summary of the discussion here be eligible to be included in the HBase book?

        lhofhansl Lars Hofhansl added a comment -

        "Column Prefix Delete" Bloom filter is overkill. I'll think about this a bit more, but probably mark as "won't fix".

        lhofhansl Lars Hofhansl added a comment -

        @Ted: Definitely. Deletes are documented here: http://hbase.apache.org/book.html#version.delete
        Maybe a small section about how the lazy seek interacts with them (although that would be a more technical discussion).

        zhihyu@ebaysf.com Ted Yu added a comment -

        I expect HBase users to have an understanding of HBase internals.
        Plus, the very exposure of this detail may give rise to the next breakthrough (in performance optimization).

        lhofhansl Lars Hofhansl added a comment -

        Just did a quick test to sort these to the top of the row. The Get test works fine then. Would need to fix up some other parts of the code.
        Could use a (prefix) trie to store the column prefix delete markers in ScanDeleteTracker. That too seems overkill, though.

        mcorgan Matt Corgan added a comment -

        This is probably way too complicated for a short term solution, but during flush/compaction could you repeat delete markers at the beginning of each block for as long as they are relevant?

        lhofhansl Lars Hofhansl added a comment -

        I think the top-of-row is good enough. We have to seek there anyway because of potential family delete markers (although I think Mikhail plans to - or has already - optimize this with delete marker bloom filters).

        lhofhansl Lars Hofhansl added a comment -

        Just to prove to myself that it can be done... Here's a patch that sorts all column prefix delete markers top-of-row.
        ScanDeleteTracker keeps a map with all delete prefixes and checks all of them, removing them once we have scanned past the prefix.

        I have no intention to commit this. This is just a proof of concept. It is not efficient (should use a trie of bytes for the prefixes).

        Generally I am less excited about this now than I was earlier. A lot of work to optimize a path that is used less frequently than the Put path.

        lhofhansl Lars Hofhansl added a comment -

        I wrote a blog entry about this here: http://hadoop-hbase.blogspot.com/2012/01/scanning-in-hbase.html
        Maybe we can take part of that into the HBase book.

        stack stack added a comment -

        I added a footnote to the book in the delete section pointing at your blog, Lars.


          People

          • Assignee:
            Unassigned
            Reporter:
            lhofhansl Lars Hofhansl
          • Votes:
            0
            Watchers:
            3
