HBASE-7221

[experiment] RowKey utility class for rowkey construction

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: util
    • Labels: None

      Description

      A common question in the dist-lists is how to construct rowkeys, particularly composite keys. Put/Get/Scan specifies byte[] as the rowkey, but it's up to you to sensibly populate that byte-array, and that's where things tend to go off the rails.

      This RowKey utility class isn't meant to add functionality to Put/Get/Scan, but rather to make it simpler for folks to construct those byte arrays. Example:

         RowKey key = RowKey.create(RowKey.SIZEOF_MD5_HASH + RowKey.SIZEOF_LONG);
         key.addHash(a);
         key.add(b);
         byte bytes[] = key.getBytes();
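Since the proposed class never shipped (the ticket was resolved Won't Fix), the idea above can be sketched with plain JDK classes. The buildKey helper and SIZEOF_* constants below are illustrative stand-ins, not part of any patch; they produce the same [ md5(a) | b ] layout as the example:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class CompositeKeyDemo {
    static final int SIZEOF_MD5_HASH = 16; // MD5 digest length in bytes
    static final int SIZEOF_LONG = 8;

    // Build [ md5(a) | b ]: a fixed-width composite key like the example above.
    static byte[] buildKey(byte[] a, long b) {
        try {
            ByteBuffer key = ByteBuffer.allocate(SIZEOF_MD5_HASH + SIZEOF_LONG);
            key.put(MessageDigest.getInstance("MD5").digest(a));
            key.putLong(b);
            return key.array();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is guaranteed by the JDK
        }
    }

    public static void main(String[] args) {
        byte[] key = buildKey("user123".getBytes(StandardCharsets.UTF_8), 42L);
        System.out.println(key.length);                       // 24
        System.out.println(ByteBuffer.wrap(key).getLong(16)); // 42
    }
}
```

Because every element has a fixed width, the long can be read back by offset alone, which is exactly the property the utility class was meant to make hard to get wrong.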
      
      1. hbase-server_hbase_7221_v6.patch
        34 kB
        Doug Meil
      2. hbase-server_hbase_7221_v5.patch
        31 kB
        Doug Meil
      3. hbase-common_hbase_7221_v4.patch
        31 kB
        Doug Meil
      4. hbase-common_hbase_7221_v3.patch
        15 kB
        Doug Meil
      5. hbase-common_hbase_7221_2.patch
        8 kB
        Doug Meil
      6. HBASE_7221.patch
        8 kB
        Doug Meil

        Issue Links

          Activity

          Doug Meil added a comment -

          Marking as "won't fix" per conference call on the HBase client on 3-5-2013 with folks on this ticket. This ticket will be linked from a new Jira to be created on serialized types and rowkey utility construction and schemas.

          Doug Meil added a comment -

          Thanks Nick, I appreciate the detailed feedback. I concur that getting some more opinions on this as you suggested (Client component) would be beneficial.

          Nick Dimiduk added a comment -

          I must say that the class names in HBASE-7692 are similar to those in HBASE-7221, which isn’t entirely surprising because HBASE-7221 is the ticket that started this whole rowkey construction conversation in the first place.

          Indeed they are similar, but on why I cannot remark. In 7692 I simply used what Orderly provides. Personally, I think RowKey should not be part of the name in either API; these are not just for rowkeys but for anywhere the existing Client API expects a byte[].

          Personally, I wish the user didn't have to think about any of this. HBase should ship with its own type-system, have convenient constructors for those types based on Java types, and be done with it. Forcing the user to think about this as being used for a rowkey vs a column qualifier vs a value is unnecessary cognitive overhead that we should avoid. When I want to use a constant like 5 in a SQL expression, I just type 5 and the system handles coercing it into the appropriate system type; that's it. Likewise, I should be able to write Get g = new Get(5).addColumn("c1", 12);. But I digress.

          But I think that 7692 is mixing terms and is harder to use and understand. 7221 refers to a RowKey as the “whole key” (e.g., 7221’s FixedLengthRowKey) which is consistent with that usage in HBase, whereas a part of a RowKey in 7221 is called a RowKeyElement. To contrast 7692's classnames, is BigDecimalRowKey the whole thing? Or a part of the rowkey?

          Agreed. 7692 is designed to solve a different problem than this ticket. It just happens to also include the feature described here. Both patches use the term "RowKey" in their API, to their detriment. This confusion is why I don't like RowKey used in these APIs. In Orderly, any of the *RowKey types can be used to create byte[]s for use in any context. So yes, BigDecimalRowKey produces a byte[] so it can be used as a stand-alone rowkey or as part of a compound rowkey, as desired. The StructRowKey, roughly analogous to this ticket's FixedLengthRowKey, is just another way to produce a byte[].

          The Orderly library has the added benefit of providing both fixed-length and variable-length encodings for the applicable types. It also includes support for specifying serialization order, a necessary consideration when implementing an HBase schema, something this ticket's latest patch cannot provide because of its dependency on Bytes.

          The hashing, while it can be added to 7692, was designed in from the get-go with 7221 because that’s the way we recommend folks to build keys. Lars/Ian, as you pointed out earlier in this ticket there is a reason that you found the 7221 approach familiar even in the first approach - because it’s similar to what you did internally.

          7692 can easily add support for hashing. Further, that support can be mixed with variable-length components. Otherwise, what we have here is a stylistic difference – the builder pattern vs. the format-string approach. This is a matter of taste, upon which the 'Client' component owners should comment.

          Personally, I think the 7221 approach is easier to understand and use, and still has safety-nets built in for length testing on setters.

          Again, this is a difference of opinion between you and me, consistent with my initial comments in this ticket about using the format-string style instead of builders. The "safety-net" is an implementation detail of any fixed-length implementation; it is provided by the implementations attached to both tickets.

          Doug Meil added a comment -

          I must say that the class names in HBASE-7692 are similar to those in HBASE-7221, which isn’t entirely surprising because HBASE-7221 is the ticket that started this whole rowkey construction conversation in the first place.

          StructRowKey (7692) is basically a RowKeySchema (7221). And 7692 also has similar-sounding class names: BigDecimalRowKey, IntWritableRowKey, LongRowKey, LongWritableRowKey.

          But I think that 7692 is mixing terms and is harder to use and understand. 7221 refers to a RowKey as the “whole key” (e.g., 7221’s FixedLengthRowKey) which is consistent with that usage in HBase, whereas a part of a RowKey in 7221 is called a RowKeyElement. To contrast 7692's classnames, is BigDecimalRowKey the whole thing? Or a part of the rowkey?

          The hashing, while it can be added to 7692, was designed in from the get-go with 7221 because that’s the way we recommend folks to build keys. Lars/Ian, as you pointed out earlier in this ticket there is a reason that you found the 7221 approach familiar even in the first approach - because it’s similar to what you did internally.

          Personally, I think the 7221 approach is easier to understand and use, and still has safety-nets built in for length testing on setters.

          RowKeySchema schema = new RowKeySchema.Builder()
           .add(RowKeySchema.MD5_HASH)
           .add(RowKeySchema.INT)
           .add(RowKeySchema.LONG)
           .add(RowKeySchema.BYTE)
           .add(new RowKeyBytesElement(bytesVal.length)) 
           .build();
             
          FixedLengthRowKey rowkey = schema.createFixedLengthRowKey();
          rowkey.setInt(0, hashVal);  // this will hash the int because the schema definition says so.
          rowkey.setInt(1, intVal);
          rowkey.setLong(2, longVal);
          rowkey.setByte(3, byteVal);
          rowkey.setBytes(4, bytesVal);
          
          byte bytes[] = rowkey.getBytes();
          

          I would like to point out that I like that 7692 already has variable-length key support (e.g., if folks use URLs inside of keys). Supporting that with the RowKeySchema approach would require a VariableLengthRowKey class (another patch).
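The variable-length case mentioned above can be illustrated. Appending variable-length bytes into a composite key makes the pieces ambiguous to read back, so some framing is needed; a length prefix is one simple approach. The class and helper names here are hypothetical, not from either patch:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class VariableLengthKeyDemo {
    // Length-prefixed encoding: [ len(url) | url | ts ].
    static byte[] buildKey(String url, long ts) {
        byte[] u = url.getBytes(StandardCharsets.UTF_8);
        ByteBuffer key = ByteBuffer.allocate(4 + u.length + 8);
        key.putInt(u.length); // prefix so the element can be read back unambiguously
        key.put(u);
        key.putLong(ts);
        return key.array();
    }

    // Read the variable-length element back out using the prefix.
    static String readUrl(byte[] key) {
        ByteBuffer b = ByteBuffer.wrap(key);
        byte[] u = new byte[b.getInt()];
        b.get(u);
        return new String(u, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] key = buildKey("http://example.com/a", 42L);
        System.out.println(readUrl(key)); // http://example.com/a
    }
}
```

Note the trade-off: a length prefix makes keys sort by element length before content, so it is not order-preserving; order-preserving variable-length encodings (the approach the Orderly library takes) typically rely on terminator bytes and escaping instead.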

          Doug Meil added a comment -

          I'm out of the office today; I will respond on 2-28-2013.

          Nick Dimiduk added a comment -

          I believe this ticket is redundant to, not compatible with, HBASE-7692. They tread similar ground but with different intention. Please correct me if I'm wrong, but this ticket seeks to add a convenience for building byte[] values from component pieces and make it easy to read the pieces back out again. It only mentions ordering of the serialized representation by way of Lars Hofhansl's comment; it otherwise does not address the issue. HBASE-7692 directly targets the ordered serialization problem. It also provides an implementation for building byte[]s from component pieces, which is built on top of the ordered serialization implementation. The result is a composite byte[] that maintains sort order with respect to the components.

          By my read, RowKeySchema in this ticket looks roughly equivalent to StructRowKey added in 7692. Both encapsulate an ordered sequence of serializable types. FixedLengthRowKey in this ticket handles reading and writing values to a byte[] according to the RowKeySchema. In 7692, these operations are encapsulated in the serialize and deserialize methods on StructRowKey, which in turn delegate to the component *RowKey implementations. This ticket's RowKeyElement class appears to use a fixed-length data encoding implicitly because it does not rely on the value in question to produce an encoded length. In 7692, this concern is delegated to the *RowKey implementations. This ticket provides explicit support for hashing byte[] values using either of two provided algorithms. 7692 does no hashing for the user as it stands; this feature could be added if desired.

          The major difference is in this ticket's RowKeyDataConverter. It does the work serializing and deserializing values. It does so using the existing util.Bytes. This is where HBASE-7692 aims to provide an entirely different feature: a serialization format that preserves order consistent with the natural representation. It does so via the rest of the *RowKey implementations.

          Yes, a RowKeyDataConverter could be implemented that made use of the serializers in HBASE-7692. It would make decisions for the user regarding how to represent the data, including whether to use a fixed- or variable-width encoding format (features not provided by this ticket).

          There is one other key feature that is omitted from both implementations under discussion. Neither implementation goes the extra mile of serializing schema details into the representations they produce (see also: Avro). I think this is an extremely useful (necessary) feature for a long-term serialization format. Without this, any change to decisions we make here will require rewriting data stored in a previous format. I've not investigated how/if this can be done while maintaining the order-preserving nature of the serialization strategy. It may be that the two features are mutually exclusive by some necessity of one or the other.

          Lars Hofhansl added a comment -

          Doug Meil Sounds reasonable to me, as long as these two patches are indeed complementary and can work together.
          stack, Ted Yu, any comments?

          Doug Meil added a comment -

          Lars, no problem on the discussion. It made the patch better.

          I think this and HBASE-7692 can be complementary because another RowKeyDataConverter implementation can be added using Orderly.

          Nick Dimiduk added a comment -

          Orderly, via HBASE-7692, does provide a similar builder interface. However, as I mentioned in my example, it doesn't implement any hashing strategies for the user. It also provides for specifying the order of serialized results.

          Lars Hofhansl added a comment -

          Fair enough.
          How does that relate to Nick's work on HBASE-7692? Do you see these as complementary?

          We had way too much discussion for a feature like this, and I apologize. It just seems important to get this right. Having some kind of standard key-building facility is essential for HBase (IMHO), and will lay the foundation for features layered on top of HBase (such as 2ndary indexes, etc).

          Doug Meil added a comment -

          Lars? Is that reasonable? If somebody wants to use the reverse-timestamp pattern (e.g., Long.MAX_VALUE - val) with this utility, there is nothing stopping them. It just won't do it automatically (just like everybody else has to do now). I think this patch is ready, but it can still be extended in the future, e.g.,

          
          

          builder.add(RowKeySchema.INT, descending)
          ... but I'd rather not add half-implemented placeholders at this point.
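The reverse-timestamp pattern referred to above takes only a few lines to show. This sketch uses plain ByteBuffer rather than either patch's API; the encode helper is an illustrative stand-in for Bytes.toBytes(long):

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class ReverseTimestampDemo {
    // Big-endian encoding of a long, like Bytes.toBytes(long).
    static byte[] encode(long v) {
        return ByteBuffer.allocate(8).putLong(v).array();
    }

    public static void main(String[] args) {
        long older = 1000L, newer = 2000L;
        // Reverse-timestamp: subtract from Long.MAX_VALUE so newer rows sort first.
        byte[] newerKey = encode(Long.MAX_VALUE - newer);
        byte[] olderKey = encode(Long.MAX_VALUE - older);
        System.out.println(Arrays.compareUnsigned(newerKey, olderKey) < 0); // true
    }
}
```

Because Long.MAX_VALUE - ts decreases as ts grows, a plain ascending scan returns the most recent rows first, which is the point of the pattern.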

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12570515/hbase-server_hbase_7221_v6.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 lineLengths. The patch introduces lines longer than 100

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4497//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4497//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4497//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4497//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4497//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4497//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4497//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4497//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4497//console

          This message is automatically generated.

          Doug Meil added a comment -

          I think configurable sort order is a good idea, but I think that is another ticket. This proposal addresses the concerns within the scope of what it was trying to achieve. Is that reasonable?

          Lars Hofhansl added a comment -

          What I was trying to say is that in the builder we should be able to indicate whether a key part is sorting ascending or descending, even though this is not used by the default converter. We can default it to ascending.

          I think without this it is not future proof. Does this make sense?
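One way a later converter could honor such a descending flag is bit inversion: inverting every byte of an ascending encoding reverses its unsigned lexicographic order. A hypothetical sketch (not from either patch), assuming non-negative values — the sign bit would need separate handling first, as in any order-preserving encoding:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class DescendingElementDemo {
    // Hypothetical "descending" encoding: invert every bit of the
    // ascending big-endian encoding, reversing unsigned byte order.
    static byte[] encodeDescending(long v) {
        byte[] b = ByteBuffer.allocate(8).putLong(v).array();
        for (int i = 0; i < b.length; i++) {
            b[i] = (byte) ~b[i];
        }
        return b;
    }

    public static void main(String[] args) {
        byte[] small = encodeDescending(1L);
        byte[] big = encodeDescending(2L);
        // The larger value now sorts first under unsigned byte comparison.
        System.out.println(Arrays.compareUnsigned(big, small) < 0); // true
    }
}
```

Defaulting the flag to ascending, as Lars suggests, keeps the default converter unchanged while leaving room for a converter that actually honors it.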

          Doug Meil added a comment -

          v6 patch has been added with Javadoc changes, plus making RowKeyDataConverter an abstract class with RowKeyBytesDataConverter as the default implementation (it's still a pluggable pattern as before, though).

          Doug Meil added a comment -

          Made RowKeyDataConverter an abstract class

          Created RowKeyBytesDataConverter as the default implementation (i.e., it uses the Bytes utility)

          Added Javadoc explanation of the implications of Bytes sort order in RowKeyBytesDataConverter.

          Added reference in RowKeySchema that RowKeyBytesDataConverter is the default strategy.

          Doug Meil added a comment -

          So +1 then with an update on the Javadoc of the default converter? That's really all that's missing.

          Lars Hofhansl added a comment -

          Oh yeah, the pluggable converters are perfect.
          The only part I wanted to point out is that there should be an indication about asc/desc sort order, so that this is generally useful.

          Doug Meil added a comment -

          Arguably, one thing that is missing is Javadoc in the default implementation describing the -1/0/1 ordering issue with Bytes. This might not affect most people, but it affects some. But if somebody wants to use Lily for data conversion, that's their call - and the design supports plugging in their own converter.

          Doug Meil added a comment -

          Lars, I believe the pluggable RowKeyDataConverter on RowKeySchema addresses that. The default implementation uses Bytes, however this is a pluggable strategy and the user has full control over this. Bytes isn't burned into the FixedLengthRowKey implementation like before.

          Lars Hofhansl added a comment -

          Sorry for being a party-pooper, but I still think that this causes more harm than good if it does not include a sorting encoder - one that encodes UTF-8 strings, ints, longs, and floats into a byte[] that sorts lexicographically according to the datatypes.
          One can plug it in, but at least there would need to be an indication somewhere of whether you want to sort ascending or descending.

          Doug Meil added a comment -

          Hey folks, can I ask you to give the v5 patch a review? Thanks!

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12569548/hbase-server_hbase_7221_v5.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          -1 lineLengths. The patch introduces lines longer than 100

          -1 core tests. The patch failed these unit tests:
          org.apache.hadoop.hbase.TestCheckTestClasses

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4443//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4443//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4443//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4443//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4443//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4443//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4443//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4443//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4443//console

          This message is automatically generated.

          Doug Meil added a comment -

          Ok, sorry about the previous confusion. Summary of changes from the previous reviews:

          Changed package - org.apache.hbase.schema (formerly in util). There are more classes and I thought adding all of it to util would be too much.

          Handles multiple hash-functions (now Murmur as well as MD5)

          Strategy pattern for data conversion (i.e., the -1/0/1 sorting issue with Bytes utility). If you want your own, you can subclass and plug in on RowKeySchema.

          Builder pattern on RowKeySchema.

          Changed RowKey to FixedLengthRowKey (this is still stateful because it has both write and read support).

          ALSO: moved from hbase-common to hbase-server. This was a necessity because MurmurHash is in hbase-server.
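          [Editor's illustration] The builder-style construction described above can be sketched in a minimal, self-contained form. The constants and method names (create, addHash, add, getBytes) mirror the example in the issue description, but everything below is invented for illustration and is not the patch's actual code:

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical stand-in for the patch's fixed-length rowkey builder.
// Fixed-width elements are declared up front via the total key length,
// then filled positionally; no delimiters are needed between elements.
public class FixedLengthKeySketch {
    static final int SIZEOF_MD5_HASH = 16; // MD5 digest length in bytes
    static final int SIZEOF_LONG = 8;

    private final ByteBuffer buf;

    private FixedLengthKeySketch(int length) {
        this.buf = ByteBuffer.allocate(length);
    }

    static FixedLengthKeySketch create(int length) {
        return new FixedLengthKeySketch(length);
    }

    // Hash the value so keys of any original size occupy a fixed 16 bytes.
    FixedLengthKeySketch addHash(byte[] value) {
        try {
            buf.put(MessageDigest.getInstance("MD5").digest(value));
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError("MD5 is a mandatory JDK algorithm", e);
        }
        return this;
    }

    FixedLengthKeySketch add(long value) {
        buf.putLong(value); // big-endian, 8 bytes
        return this;
    }

    byte[] getBytes() {
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] rowkey = FixedLengthKeySketch
            .create(SIZEOF_MD5_HASH + SIZEOF_LONG)
            .addHash("user42".getBytes())
            .add(1234L)
            .getBytes();
        System.out.println(rowkey.length); // 24
    }
}
```

          Note this sketch only covers the write side; the stateful read support discussed in the thread (reading elements back out of a key) would add symmetric getters over the same fixed offsets.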

          Doug Meil added a comment -

          The actual issue is that this patch is in "common" but MurmurHash, which still exists, is in "server". If this patch is going to use MurmurHash, it needs to be in "server".

          Doug Meil added a comment -

          See previous comment about MurmurHash.

          Doug Meil added a comment -

          Hold it, there's one issue: MurmurHash in org.apache.hadoop.hbase.util disappeared. I need to fix.

          Doug Meil added a comment -

          Attaching v4 patch with requested refactoring and some other things.

          Changed package - org.apache.hbase.schema (formerly in util). There are more classes and I thought adding all of it to util would be too much.

          Handles multiple hash-functions (now Murmur as well as MD5)

          Strategy pattern for data conversion (i.e., the -1/0/1 sorting issue with Bytes utility). If you want your own, you can subclass and plug in on RowKeySchema.

          Builder pattern on RowKeySchema.

          Changed RowKey to FixedLengthRowKey (this is still stateful because it has both write and read support).

          Doug Meil added a comment -

          Thanks Nick! Agree on the -1 on the nits, I'll fix that. I appreciate the revocation of the -1 on the approach.

          Nick Dimiduk added a comment -

          The obvious stateless parser would model Java's existing Regex APIs: "compile" your format string and then use a Parser to consume the byte[]. There may be a more clever approach but, as you say, no one has volunteered any ideas. To the point both you and Lars made, a stateful implementation is likely faster, but then I have to assume the presence of wisdom in the C-wielding database implementers of old who chose the stateless approach for such things.

          I believe as you do that this kind of functionality should be packaged with HBase. Until I have opportunity to produce an alternate patch for consideration, I'll revoke my -1 from the approach of your implementation. However, I maintain the -1 regarding the nits I pointed out.

          Doug Meil added a comment -

          Hi folks. The issue with the stateless approach is that nobody has a proposal that will work with both writing and reading the key. And given the high intelligence of the folks already commenting on this ticket, the probability of such an approach existing is slim, because if it were obvious I think somebody would have thought of it by now. I don't see how this can be anything but a stateful object.

          Doug Meil added a comment -

          This parsing approach...

           byte[] key = RowKey.format("%16x%4d%8d", hashVal, intVal, longVal);
          

          ... seems a lot less understandable to me than the proposal. It also doesn't address reading components back, which is why the RowKey (aka FixedLengthKey/ComponentKey) needs to have state. I don't think it's enough just to have a builder pattern, people need some way of reading and processing the key. It's not just about the writes.

          Lars Hofhansl added a comment -

          A consideration here also will be performance. Parsing out the format string on each creation will be slow. I do like the stateless approach.

          Nick Dimiduk added a comment -

          I very much agree HBase should provide a canonical tool for building things like compound byte-arrays. Overall, I don't like the idea of a stateful utility class, so -1.

          This would be much more elegant (and less bug-prone) via a static method and an approach similar to the format-string style. The major downside is that you'd need a mini-language for representing these formats. The example in your class header, under this approach, would be something like byte[] key = RowKey.format("%16x%4d%8d", hashVal, intVal, longVal);. With a Formatter implemented, you could have an accessor static method, call it split, that behaved similarly, i.e., byte[][] splits = RowKey.split("%16x%4d%8d", key);. The key idea here is that the format and split strings are of identical form. Then the consumer can decide how to re-interpret the splits. Or maybe you get fancy with out-parameters or something, but that doesn't seem idiomatic.
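          [Editor's illustration] A rough, hypothetical sketch of this format/split symmetry (not code from any patch). Here the "%<width><letter>" mini-language is simplified so that the trailing letter is only documentation, widths alone drive packing and splitting, and elements are passed as pre-encoded byte[]s rather than primitives:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: the same format string both packs elements into a
// key and slices the key back into its elements.
public class RowKeyFormatSketch {
    private static final Pattern SPEC = Pattern.compile("%(\\d+)[a-z]");

    private static List<Integer> widths(String fmt) {
        List<Integer> w = new ArrayList<>();
        Matcher m = SPEC.matcher(fmt);
        while (m.find()) {
            w.add(Integer.parseInt(m.group(1)));
        }
        return w;
    }

    // Pack pre-encoded element bytes into one key.
    static byte[] format(String fmt, byte[]... parts) {
        List<Integer> w = widths(fmt);
        byte[] key = new byte[w.stream().mapToInt(Integer::intValue).sum()];
        int off = 0;
        for (int i = 0; i < parts.length; i++) {
            if (parts[i].length != w.get(i)) {
                throw new IllegalArgumentException(
                    "element " + i + " must be exactly " + w.get(i) + " bytes");
            }
            System.arraycopy(parts[i], 0, key, off, w.get(i));
            off += w.get(i);
        }
        return key;
    }

    // The symmetric accessor: the identical format string splits the key.
    static byte[][] split(String fmt, byte[] key) {
        List<Integer> w = widths(fmt);
        byte[][] out = new byte[w.size()][];
        int off = 0;
        for (int i = 0; i < w.size(); i++) {
            out[i] = Arrays.copyOfRange(key, off, off + w.get(i));
            off += w.get(i);
        }
        return out;
    }
}
```

          As the thread notes, a production version would "compile" the format once (as Pattern does) rather than re-parse it per key, and would need to decide how typed values are encoded into each element's bytes.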

          general nit: your line-lengths are well past 100 characters in some cases.

           * A stateful utility class for creating rowkeys for HBase tables, particularly composite keys.  RowKey
           * creates fixed length keys without the need for delimeters in between key elements (i.e., parts of the 
           * rowkey), which is a best practice in HBase.  A RowKey instance is instantiated from an associated RowKeySchema, 
          

          nit: s/delimeters/delimiters

          nit: RowKeySchema has superfluous imports.

          nit: TestRowKey has superfluous imports.

              boolean passed = true;
              try {
                RowKey rowkey = schema.createRowKey();
                rowkey.setHash(1, intVal);  // trying to set 'int' on an element that is sized for a hash.
              } catch (Exception e) {
          	    // we are expecting a sizing exception because we are setting a hash onto an element
          	    // sized for an int.
          	    passed = false;
              }
              if (passed) {
                Assert.fail("Test did not fail!");
               }
          

          nit: forego this passed business and put your assert right inline.

          Doug Meil added a comment -

          @Enis.

          re: "Good utility class."

          Thanks!

          re: "I think there is nothing specific to row keys, we might as well just call it CompositeKey. "

          Per one of the feedback items earlier I was going to rename RowKey to FixedLengthRowKey and resubmit. CompositeKey would work too. But in this version I hadn't designed support yet for variable length keys (e.g., strings) and I figured that would be handled in another class called VariableLengthRowKey.

          Thoughts on this?

          Lars Hofhansl added a comment -

          Good point. Whatever we do should possibly integrate into Hive (and hence into Impala eventually).

          Enis Soztutar added a comment -

          We also have built encoders for primitive types in our Phoenix (SQL on HBase) project - to be open sourced soon hopefully. I'm sure we could donate those before the release of Phoenix (James Taylor).

          That is great to hear.

          So to be clear, we are all agreeing that we need two facilities: one to encode primitive values into byte[]'s suitable for sorting, and another one to build composite keys. Right?

          Yes, but I see them as a single API that we, as the HBase project, expose (or recommend) to users as the type system. Actually, I would also like Hive and Phoenix (and other SQL-over-HBase) libs to share the same SQL types-to-byte[] model, but that is another discussion I guess.

          Lars Hofhansl added a comment -

          Sorry, have been quiet here. Orderly and Lily have code for this. We could either lift the code from there (license permitting) or include one of these libraries (again license permitting).

          We also have built encoders for primitive types in our Phoenix (SQL on HBase) project - to be open sourced soon hopefully. I'm sure we could donate those before the release of Phoenix (James Taylor).

          So to be clear, we are all agreeing that we need two facilities: one to encode primitive values into byte[]'s suitable for sorting, and another one to build composite keys. Right?

          That would clearly be a good step for HBase. (1) it would introduce a "standard" way to encode primitive types and (2) it would help with the recurring task of building composite keys (something each and every user of HBase has to do eventually).
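          [Editor's illustration] The first facility - encoding primitives into byte[]s suitable for sorting - can be illustrated with the standard sign-bit-flip technique used by Orderly-style encoders. A minimal sketch, assuming longs and ascending order (this is not code from the patch or from Bytes):

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class SortableLongSketch {
    // Plain big-endian two's-complement (what Bytes.toBytes(long) produces):
    // negatives begin with 0xFF bytes and therefore sort AFTER positives
    // under unsigned lexicographic comparison -- the recurring "-1/0/1
    // ordering issue" in this thread.
    static byte[] plain(long v) {
        return ByteBuffer.allocate(8).putLong(v).array();
    }

    // Flipping the sign bit maps Long.MIN_VALUE..Long.MAX_VALUE onto
    // 0..2^64-1, so unsigned byte order matches numeric order (ascending).
    // For descending order, additionally invert all the remaining bits.
    static byte[] ascending(long v) {
        return ByteBuffer.allocate(8).putLong(v ^ Long.MIN_VALUE).array();
    }

    public static void main(String[] args) {
        // Positive result: plain encoding orders -1 after 1 (wrong).
        System.out.println(Arrays.compareUnsigned(plain(-1L), plain(1L)) > 0);
        // Negative result: flipped encoding orders -1 before 1 (correct).
        System.out.println(Arrays.compareUnsigned(ascending(-1L), ascending(1L)) < 0);
    }
}
```

          (Arrays.compareUnsigned(byte[], byte[]) requires Java 9+; older code would compare with Bytes.compareTo, which applies the same unsigned lexicographic order that HBase uses for rowkeys.)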

          Enis Soztutar added a comment -

          Good utility class. I think there is nothing specific to row keys here, so we might as well just call it CompositeKey. There might be use cases where you compose your column names as well.

          Nick Dimiduk added a comment -

          1) LarsH was generally ok with this approach as long as it used a proposed new method in Bytes that would retain sort order negative-to-positive (which Bytes currently doesn't support). This would be in a different Jira.

          I was just speaking with the original author of the Orderly library about implementing an OrderedBytes utility class for Java primitives, based on his implementation. Have you created this related ticket?

          Doug Meil added a comment -

          Summary:

          1) LarsH was generally ok with this approach as long as it used a proposed new method in Bytes that would retain sort order negative-to-positive (which Bytes currently doesn't support). This would be in a different Jira.

          2) The class RowKey would be changed to something like FixedLengthRowKey. RowKeySchema is ok.

          Email chain for posterity...

          From: lars hofhansl <lhofhansl@yahoo.com>
          Reply-To: lars hofhansl <lhofhansl@yahoo.com>
          Date: Tuesday, December 11, 2012 2:53 PM
          To: Doug Meil <doug.meil@explorysmedical.com>, Andrew Johnson <andrew.johnson@explorys.com>, Ian Varley <ivarley@salesforce.com>, Elliott Clark <eclark84@gmail.com>, "stack@duboce.net" <stack@duboce.net>
          Subject: Re: hbase-7221?

          Bytes.toBytes can't change. It needs to be efficient. We need a new method.
          The RowKeyBuilder (or whatever we'll call it) should use that. It won't be useful (IMHO) until we have that method finished.

          – Lars

          From: Doug Meil <doug.meil@explorysmedical.com>
          To: lars hofhansl <lhofhansl@yahoo.com>; Andrew Johnson <andrew.johnson@explorys.com>; Ian Varley <ivarley@salesforce.com>; Elliott Clark <eclark84@gmail.com>; "stack@duboce.net" <stack@duboce.net>
          Sent: Tuesday, December 11, 2012 11:40 AM
          Subject: Re: hbase-7221?

          Hey Lars, is that the plan? Separate ticket for the new Bytes.toBytes method?

          Do you want to update the Jira for 7221 for the agreement or do you want me to?

          From: Doug Meil <doug.meil@explorysmedical.com>
          Date: Friday, December 7, 2012 4:13 PM
          To: lars hofhansl <lhofhansl@yahoo.com>, Andrew Johnson <andrew.johnson@explorys.com>, Ian Varley <ivarley@salesforce.com>, Elliott Clark <eclark84@gmail.com>, "stack@duboce.net" <stack@duboce.net>
          Subject: Re: hbase-7221?

          Works for me. And also some serious JavaDoc is needed around the existing Bytes.toBytes methods documenting the issue so people aren't surprised. These are items in a different Jira though, right?

          And then RowKey uses the new toBytes method.

          We should carry a summary of this conversation back into the Jira for 7221.

          Thanks, Lars!

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12555763/hbase-common_hbase_7221_v3.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          -1 javadoc. The javadoc tool appears to have generated 104 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 27 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3510//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3510//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3510//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3510//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3510//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3510//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3510//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3510//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3510//console

          This message is automatically generated.

          Doug Meil added a comment -

          So what now? Seems like there is general agreement that something like this would be a good idea. And we all agree that there are plenty of edge cases that this doesn't cover.

          One thing to mention re: alternate encodings... I think this pattern is extensible and RowKeySchema can be augmented for different encoding strategies, while still keeping the ease of use that exists in the RowKey class.

          As for class names, Elliott isn't crazy about the name RowKey. FixedLengthRowKey?

          As for variable-length keys (e.g., for people that still insist on using Strings in keys), that's not a pattern that this class supports. I think you'd have to use delimiters between fields in that case, but that seems like it could be supported in a subsequent patch (e.g., VariableLengthRowKey) in a different ticket.

          Thanks everybody for the review effort!
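For reference, the delimiter approach mentioned in this comment can be sketched in a few lines. This is an illustrative sketch only (the `DelimitedKey` class and the `|` delimiter are my choices, not anything from the attached patches), and the key caveat is that the delimiter byte must never appear inside a field value:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: variable-length composite keys using a delimiter.
// The delimiter byte must never occur inside a field value.
public class DelimitedKey {
  private static final byte DELIMITER = (byte) '|';

  // Join string fields into a single rowkey byte array.
  public static byte[] build(String... fields) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < fields.length; i++) {
      if (i > 0) sb.append((char) DELIMITER);
      sb.append(fields[i]);
    }
    return sb.toString().getBytes(StandardCharsets.UTF_8);
  }

  // Split a rowkey back into its fields (the bi-directional part).
  public static List<String> parse(byte[] rowkey) {
    List<String> fields = new ArrayList<>();
    int start = 0;
    for (int i = 0; i <= rowkey.length; i++) {
      if (i == rowkey.length || rowkey[i] == DELIMITER) {
        fields.add(new String(rowkey, start, i - start, StandardCharsets.UTF_8));
        start = i + 1;
      }
    }
    return fields;
  }
}
```

Note that if sort order matters, the delimiter byte should be smaller than any byte that can appear in the data (e.g., 0x00); with a large delimiter byte, a longer field can sort before a shorter field that shares its prefix.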

          Doug Meil added a comment -

          Thanks Ian!

          re: "why are we "blessing" int, long & MD5 hash?"

          For this I have to refer to my previous comment on HBase making anything possible but not much "easy" in terms of rowkey construction. I think those datatypes represent the most commonly used key-element types. And it makes them easy to use (i.e., it will do the encoding/decoding for you).

          But you can always use setBytes if you want to do something custom (and getBytes for that position too).

          re: "Anyway, what you have seems straightforward enough that adding it might point some people in the right direction, without getting too fancy."

          Yep, that was the intent. You can always drop down to doing it 100% yourself, but this handles a lot of cases.

          Ian Varley added a comment -

          Cool! Looks good, we do something similar internally (as I'm sure do most shops). Definitely lots of room for making this experience more natural for beginners, good on you Doug.

          My first thought, like Lars's, was: why are we "blessing" int, long & MD5 hash? As opposed to setting bytes only, and having this class just help with the arranging part? Sure, you can always just use the "setBytes/getBytes" methods and ignore the other stuff, but I feel like adding specific types to the list is a slippery slope (though I have no data to back that feeling up).

          Questions it raises for me:

          • As Lars mentioned, do you want to use standard ints, or binary comparable like Lily does (where the sign's at the end)?
          • What about Date objects? They'll be really common in row keys, of course. But, they're also easy to change into a long.
          • What about ways to indicate that something should be reverse ordered (descending), via bit inversion?

          If you stripped this down to just getByte(s)/setByte(s), would it still be useful? Seems like that's got the lion's share of the pattern there. Maybe then a subclass that adds common encodings (doing it as a subclass maybe makes it more obvious that this is just one set of encodings, and anybody else can do likewise).

          Anyway, what you have seems straightforward enough that adding it might point some people in the right direction, without getting too fancy.

          Also, seems like this should go in the forthcoming hbase-client module, y? Elliott, what's the timeline for that?
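The reverse-ordering-via-bit-inversion idea raised in this comment could look like the sketch below. The helper names are hypothetical, not from the patch: the usual approach is to first flip the sign bit so signed longs compare correctly as unsigned bytes, then invert all bits so larger values sort first.

```java
import java.nio.ByteBuffer;

// Illustrative sketch: encoding a long so rowkeys sort in DESCENDING order.
public class DescendingLong {
  // Ascending, byte-comparable form: flip the sign bit so negative longs
  // sort before positive ones under unsigned byte comparison.
  static long ascendingBits(long v) {
    return v ^ Long.MIN_VALUE;
  }

  // Descending form: invert the ascending bits so larger values sort first.
  public static byte[] encodeLongDescending(long v) {
    return ByteBuffer.allocate(Long.BYTES).putLong(~ascendingBits(v)).array();
  }

  // Unsigned lexicographic comparison, i.e. how HBase compares rowkeys.
  public static int compare(byte[] a, byte[] b) {
    for (int i = 0; i < Math.min(a.length, b.length); i++) {
      int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
      if (diff != 0) return diff;
    }
    return a.length - b.length;
  }
}
```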

          Doug Meil added a comment -

          Additionally, in terms of additional encodings RowKeySchema can be added with different options down the line.

          Doug Meil added a comment -

          re: "Interface looks good (so does this idea, generally)."

          Thanks!

          re: "At the same time everybody using HBase is building something like this, so maybe HBase should ship with a reasonable set of default encodings."

          This is precisely why I'm pushing this. Key construction is a common question on the dist-list, and HBase makes anything possible - but not much easy - in this critical area.

          re: "RowKey"

          Change this class to FixedLengthRowKey?

          There are a lot of situations that are handled by the v3 proposal, and I also need to stress that this can not only create byte-arrays for keys, but also read them back (that was an addition in the v3 patch). You can pass in a rowkey when processing a table and then pick out the parts if needed (not something that you can do with a builder).

          re: other encodings

          As for exotic coding, RowKey (referring to v3 name) does allow for a setBytes(position, byte[]) for you to do the encoding yourself if you want. I was picking the most common datatypes used in key-construction, but this does not represent an exhaustive list.

          There are plenty of advanced cases that this doesn't handle, such as variable-length keys. I think a builder is probably better for that pattern.

          Lars Hofhansl added a comment -

          Interface looks good (so does this idea, generally).

          But we are now (kind of) prescribing a way in which HBase should do its encoding.
          It seems this is best handled by an external library (orderly or lily do this).

          Encoding numbers into correctly sorted byte[] is not entirely trivial, and neither is separating variable-length parts of the key; different users will have different needs. Are strings UTF-8? What about special sorting for languages?
          What about floats? They need to be encoded to sort correctly even considering their exponents (again, there are libraries out there doing this already).

          At the same time everybody using HBase is building something like this, so maybe HBase should ship with a reasonable set of default encodings.
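The float point in this comment can be made concrete with the standard IEEE-754 trick (roughly what libraries like Orderly do; the class and helper names below are illustrative, not from any patch here): flip all bits of a negative float, and only the sign bit of a non-negative one, so the resulting bytes compare in numeric order.

```java
import java.nio.ByteBuffer;

// Illustrative sketch: byte-comparable encoding of a float.
public class SortableFloat {
  public static byte[] encode(float f) {
    int bits = Float.floatToIntBits(f);
    // Negative floats: invert every bit (reverses their relative order and
    // moves them below positives). Non-negative: just flip the sign bit.
    bits = (bits < 0) ? ~bits : (bits ^ Integer.MIN_VALUE);
    return ByteBuffer.allocate(Integer.BYTES).putInt(bits).array();
  }

  // Unsigned lexicographic comparison, as HBase compares rowkeys.
  public static int compare(byte[] a, byte[] b) {
    for (int i = 0; i < a.length && i < b.length; i++) {
      int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
      if (diff != 0) return diff;
    }
    return a.length - b.length;
  }
}
```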

          Elliott Clark added a comment -

          I still am against naming anything RowKey. It just invites confusion. I don't think there's any plan to support passing these objects as a row key, so in my opinion they shouldn't be named that way. When a user searches on google/bing/duck duck go for hbase row key, they should get the documentation about how to structure a row key, not some class that's named RowKey but isn't actually the type passed in as a row key.

          Why not a more general builder style?

          // Could just use magic numbers to represent replace. Not sure how I feel about that
          enum HashPosition {
            PREPEND,
            APPEND,
            REPLACE
          }
          enum Order {
            ASCENDING,  // Smaller numbers first
            DESCENDING  // Larger numbers first
          }
          class CompositeRowKeyBuilder {
            public CompositeRowKeyBuilder();
            public CompositeRowKeyBuilder(int numFields); // throw exception if the number of fields is off when build is called.
            public CompositeRowKeyBuilder(int numFields, int expectedBytes); // Same as above but allows for a size hint to the ByteBuffer.
            private ByteBuffer bb = ByteBuffer.allocate(expectedBytes);
            public CompositeRowKeyBuilder addHash(Hash hash, HashPosition position);
            public CompositeRowKeyBuilder add(String s);
            public CompositeRowKeyBuilder add(String s, int length); // We should be trying to encourage fixed-length keys if possible
            public CompositeRowKeyBuilder add(int i);
            public CompositeRowKeyBuilder add(int i, Order o);
            public CompositeRowKeyBuilder add(long l);
            public CompositeRowKeyBuilder add(long l, Order o);
            public CompositeRowKeyBuilder add(double d); // Use something like Orderly's formatting, allowing the sorting of double and float
            public CompositeRowKeyBuilder add(double d, Order o);
            public CompositeRowKeyBuilder add(float f);
            public CompositeRowKeyBuilder add(float f, Order o);
            public CompositeRowKeyBuilder add(byte[] bytes);
            public byte[] build(); // yes, I know this breaks the builder pattern a little bit. But I think it's worth it.
          }
          class ExampleUsage {
            public static void main(String[] args) {
              CompositeRowKeyBuilder builder = new CompositeRowKeyBuilder();
              // rk should = MURMUR_HASH("TestString".getBytes() + 100.toBytes) + "TestString".getBytes() + 100.getBytes
              byte[] rk = builder.addHash(Hash.MURMUR, HashPosition.PREPEND).add(100).add("TestString").build();

              // rkTwo = "MyOtherTestString".reverse().getBytes()
              byte[] rkTwo = builder.addHash(Hash.REVERSE, HashPosition.REPLACE).add("MyOtherTestString").build();
            }
          }

          Thoughts?

          Lars Hofhansl added a comment -

          I think this will just paper over the interesting issues.
          A composite rowkey needs to be designed carefully for correct sorting.
          -1 should sort before +1, etc. Just encoding/decoding with Bytes won't work.
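A minimal illustration of this sorting problem and the usual fix (class and method names here are illustrative, not from any attached patch): the raw big-endian bytes of -1 are 0xFFFFFFFF, which sort after those of +1 under HBase's unsigned byte comparison, and XOR-ing with the sign bit repairs the order.

```java
import java.nio.ByteBuffer;

// Illustrative sketch: why raw int bytes don't sort correctly, and the fix.
public class SortableInt {
  // Plain big-endian encoding, as Bytes.toBytes(int) produces.
  public static byte[] rawBytes(int v) {
    return ByteBuffer.allocate(Integer.BYTES).putInt(v).array();
  }

  // Flip the sign bit so negative ints sort before positive ones.
  public static byte[] sortableBytes(int v) {
    return rawBytes(v ^ Integer.MIN_VALUE);
  }

  // Unsigned lexicographic comparison, as HBase compares rowkeys.
  public static int compare(byte[] a, byte[] b) {
    for (int i = 0; i < a.length && i < b.length; i++) {
      int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
      if (diff != 0) return diff;
    }
    return a.length - b.length;
  }
}
```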

          Doug Meil added a comment -

          Check out the v3 version of the patch with the RowKeySchema and RowKey. Thanks! (it was uploaded earlier today, but I thought I'd state that explicitly)

          Doug Meil added a comment -

          Ok, I think this one's a winner.

          There is a RowKeySchema and a RowKey. This creates fixed-length keys without delimiters (generally considered to be a best practice), and enforces the defined lengths when the elements are set. It's also bi-directional, so that you can pass in a byte-array (i.e., rowkey) from a table and then read the key elements back.

          Creation example...

              int elements[] = {RowKeySchema.SIZEOF_MD5_HASH, RowKeySchema.SIZEOF_INT, RowKeySchema.SIZEOF_LONG};
              RowKeySchema schema = new RowKeySchema(elements);

              RowKey rowkey = schema.createRowKey();
              rowkey.setHash(0, hashVal);
              rowkey.setInt(1, intVal);
              rowkey.setLong(2, longVal);

              byte bytes[] = rowkey.getBytes();
          
          
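The bi-directional (read-back) side of this fixed-length design can also be illustrated independently of the patch's API; all names below are hypothetical, and a plain ByteBuffer stands in for RowKey/RowKeySchema:

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative sketch: fixed-length composite key (16-byte MD5 + int + long),
// written and then read back with a ByteBuffer. Not the patch's API.
public class FixedLengthKey {
  static final int SIZEOF_MD5 = 16;

  public static byte[] build(byte[] hashedPart, int intVal, long longVal) {
    return ByteBuffer.allocate(SIZEOF_MD5 + Integer.BYTES + Long.BYTES)
        .put(md5(hashedPart))
        .putInt(intVal)
        .putLong(longVal)
        .array();
  }

  // Read the trailing fields back out of an existing rowkey, using the
  // schema's fixed offsets.
  public static int readInt(byte[] rowkey) {
    return ByteBuffer.wrap(rowkey, SIZEOF_MD5, Integer.BYTES).getInt();
  }

  public static long readLong(byte[] rowkey) {
    return ByteBuffer.wrap(rowkey, SIZEOF_MD5 + Integer.BYTES, Long.BYTES).getLong();
  }

  static byte[] md5(byte[] input) {
    try {
      return MessageDigest.getInstance("MD5").digest(input);
    } catch (NoSuchAlgorithmException e) {
      throw new RuntimeException(e);
    }
  }
}
```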
          Doug Meil added a comment -

          I am overhauling this with some new ideas. Stay tuned.

          Doug Meil added a comment -

          One more thought on size: then again, I could do what ArrayList does with its overloaded constructor - use that size initially, and then auto-size if needed. But you could still define the exact size if you wanted for performance purposes. That's probably the nicest possible approach.

          Doug Meil added a comment -

          re: "Builder"

          Yeah, I really wasn't going for a builder pattern. Elliott had a concern about the name "RowKey" (I must admit I'm still partial to it because there isn't a class with that name anywhere in the codebase).

          I wasn't really aiming for a builder pattern in the first place because I didn't want to force people to destroy and re-create the RowKey/Builder for each rowkey they create - that's why the reset method is there. The only thing that would have to get reset is the backing byte array.

          re: "fixed size"

          I wanted any particular instance to have a fixed size so that the backing byte-array didn't have to resize like an ArrayList (and wind up burning a lot of byte-arrays in the process). So it's "easier" to create rowkeys than without the utility, but not without some required thought.

          If your table had multiple length keys, there's nothing wrong with creating 2 different instances, one for each length.

          That's where I was coming from.

          re: "formatting"

          I'll fix that. Doh!

          Thanks!

          stack added a comment -

          Hey boss... fix the formatting. It's all over the place – see the patch. You have tabs in there? If you look at other code you'll see it uses spaces for tabs, and two spaces at that.

          Builder is a good name but you don't seem to follow the general builder pattern... i.e. get a 'builder', then do things against it, and at the end call 'build' to return the result. That could confuse. (You have some of it w/ your static to create an instance...)

          You keep growing the backing array... just let the ArrayIndexOutOfBoundsException happen if they try to add off the end?

          Why does the key have to be of fixed size?

          Good stuff.

          Doug Meil added a comment -

          Moved from server to common, renamed class name to RowKeyBuilder. (7221_2.patch)

          Doug Meil added a comment -

          Gotcha, I'll re-submit with those changes.

          As an aside, that "Keying" class I cited above should move to common too. That's clearly a client utility and shouldn't be in server.

          Elliott Clark added a comment -

          Common's much better than server, so if you're against examples then common seems the best.

          RowKeyBuilder seems better. Thanks

          Doug Meil added a comment -

          re: "classpath"

          Ahhh... good point. hbase-common, then?

          re: name.

          RowKeyBuilder?

          Elliott Clark added a comment -

          So coming soon (HBASE-7012) most users won't have anything from hbase-server on their client classpath. With that in mind, it seems like hbase-examples will actually have more visibility for people just starting out with HBase. (hadoop-examples seems like a great corollary here; more starting users read that code than hadoop-common.)

          Also, should this be renamed to something like CompositeRowKey so that the name isn't confused with a row key that is used inside hbase-server?

          Doug Meil added a comment -

          I do think, though, that some code examples using this would be a good idea in "examples", but this class belongs in the util package like the other class. I've heard requests from some folks for more "cookbook" examples.

          Doug Meil added a comment -

          Elliott, the reason I'd lobby against "examples" is that this is such a common question and so easy to screw up unless you really know what you're doing. The people that used HBase back when it was 0.20 were adventurers; as the release gets closer to 1.0, it just needs to make the right thing happen.

          Additionally, there are other key-utilities in the util package already...

          http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/Keying.html

          Doug Meil added a comment -

          Ramkrishna, regarding addHash, there is addHash(int), addHash(long), addHash(byte[]), and addHash(String). So the API already supports adding the "common" datatypes, plus a catch-all for a byte[]. What did you feel needed to be added?

          Personally, for the common "key" datatypes, I'd rather not force people to call Bytes.toBytes on them as I think that's kind of clunky.

          Elliott Clark added a comment -

          Seems like this would be better in the hbase-examples module.

          ramkrishna.s.vasudevan added a comment -
          addHash(s.getBytes());
          

          Can we make this 'addHash(Bytes.toBytes(s))' instead?

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12554921/HBASE_7221.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

          -1 javadoc. The javadoc tool appears to have generated 98 warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          -1 findbugs. The patch appears to introduce 27 new Findbugs (version 1.3.9) warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed unit tests in .

          Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3406//testReport/
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3406//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3406//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3406//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3406//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3406//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3406//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
          Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3406//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3406//console

          This message is automatically generated.


            People

            • Assignee:
              Doug Meil
              Reporter:
              Doug Meil
            • Votes:
              0
              Watchers:
              15
