Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.99.0, 2.0.0
    • Component/s: documentation
    • Labels: None
    • Hadoop Flags: Reviewed
    1. HBASE-11682-1.patch
      3 kB
      Misty Stanley-Jones
    2. HBASE-11682.patch
      2 kB
      Misty Stanley-Jones
    3. HBASE-11682.patch
      4 kB
      Misty Stanley-Jones
    4. HBASE-11682.patch
      6 kB
      Misty Stanley-Jones
    5. HBASE-11682.patch
      6 kB
      Misty Stanley-Jones
    6. HBASE-11682.patch
      6 kB
      Misty Stanley-Jones

      Activity

      Misty Stanley-Jones added a comment -

      This is my understanding of hotspotting. I didn't find mention of it in the Ref Guide anywhere even though it is an important issue.

      Nick Dimiduk added a comment -

      HBase also attempts to store rows near each other in the same region, on the same region server.

      This sentence doesn't help much. A region is a contiguous sequence of rows that are physically hosted as a unit. Rows on region boundaries are lexicographically near each other but are part of different regions, so there are no guarantees about them being hosted on the same region server.

      However, poorly designed row keys can lead to <firstterm>hotspotting</firstterm>.

      This is where schema/rowkey design and access patterns go hand-in-hand.

      Hotspotting occurs when nearly all the rows being written to HBase are written to the same region, because their row keys are contiguous or very similar.

      I'd say "Hotspotting occurs when too much client traffic is directed at a single region. This can be from reads, writes, or both. The traffic overwhelms the single machine responsible for hosting that region, causing performance degradation and potentially leading to region unavailability. This can also have adverse effects on other regions hosted by the same region server as that host is unable to service the requested load."

      but in the bigger picture, data is being written to multiple regions across the cluster ...

      Again, not limited to writes.

      One technique is to salt the row keys

      Is the term "salt" explained?

      However, using totally random row keys would remove any benefit of HBase's row-sorting algorithm and cause very poor performance, as each get or scan would need to query all regions.

      You're assuming a sequential access pattern here. Random rowkeys can be okay for random read access patterns, in that load is spread all over the cluster. I've seen other issues around poor blockcache performance from completely random access patterns, but that's a slight tangent.

      Misty Stanley-Jones added a comment -

      Thanks for the feedback, I took it on board and created a new patch. I also added a couple of links to articles about using a salt with row keys.

      Hadoop QA added a comment -

      -1 overall. Here are the results of testing the latest attachment
      http://issues.apache.org/jira/secure/attachment/12660036/HBASE-11682.patch
      against trunk revision .
      ATTACHMENT ID: 12660036

      +1 @author. The patch does not contain any @author tags.

      +0 tests included. The patch appears to be a documentation patch that doesn't require tests.

      +1 javac. The applied patch does not increase the total number of javac compiler warnings.

      +1 javac. The applied patch does not increase the total number of javac compiler warnings.

      +1 javadoc. The javadoc tool did not generate any warning messages.

      +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

      +1 release audit. The applied patch does not increase the total number of release audit warnings.

      +1 lineLengths. The patch does not introduce lines longer than 100

      +1 site. The mvn site goal succeeds with this patch.

      -1 core tests. The patch failed these unit tests:
      org.apache.hadoop.hbase.regionserver.TestRegionReplicas
      org.apache.hadoop.hbase.master.TestRestartCluster
      org.apache.hadoop.hbase.client.TestReplicasClient
      org.apache.hadoop.hbase.master.TestMasterOperationsForRegionReplicas

      -1 core zombie tests. There are 1 zombie test(s): at org.apache.hadoop.hbase.client.TestHCM.testClusterStatus(TestHCM.java:250)

      Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/10309//testReport/
      Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10309//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
      Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10309//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
      Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10309//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
      Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10309//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
      Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10309//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
      Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10309//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
      Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10309//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
      Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10309//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
      Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10309//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
      Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/10309//console

      This message is automatically generated.

      Hadoop QA added a comment -

      -1 overall. Here are the results of testing the latest attachment
      http://issues.apache.org/jira/secure/attachment/12660043/HBASE-11682-1.patch
      against trunk revision .
      ATTACHMENT ID: 12660043

      +1 @author. The patch does not contain any @author tags.

      +0 tests included. The patch appears to be a documentation patch that doesn't require tests.

      +1 javac. The applied patch does not increase the total number of javac compiler warnings.

      +1 javac. The applied patch does not increase the total number of javac compiler warnings.

      +1 javadoc. The javadoc tool did not generate any warning messages.

      +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

      +1 release audit. The applied patch does not increase the total number of release audit warnings.

      -1 lineLengths. The patch introduces the following lines longer than 100:
      + xlink:href="http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/"
      + >http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/</link>.
      + xlink:href="https://communities.intel.com/community/itpeernetwork/datastack/blog/2013/11/10/discussion-on-designing-hbase-tables"
      + >https://communities.intel.com/community/itpeernetwork/datastack/blog/2013/11/10/discussion-on-designing-hbase-tables</link>.

      +1 site. The mvn site goal succeeds with this patch.

      -1 core tests. The patch failed these unit tests:
      org.apache.hadoop.hbase.regionserver.TestRegionReplicas
      org.apache.hadoop.hbase.master.TestRestartCluster
      org.apache.hadoop.hbase.client.TestReplicasClient
      org.apache.hadoop.hbase.master.TestMasterOperationsForRegionReplicas

      Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/10312//testReport/
      Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10312//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
      Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10312//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
      Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10312//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
      Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10312//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
      Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10312//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
      Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10312//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
      Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10312//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
      Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10312//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
      Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/10312//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
      Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/10312//console

      This message is automatically generated.

      Jonathan Hsieh added a comment -

      Nice addition. Personally, I don't really like the sematext definition of salting; it conflates salting [1] with hashing [2], which are two separate things.

      Salting adds random data to the start of a rowkey. This means that, depending on the 'salt factor', you could end up writing to n different row keys (and ideally n different regions). When reading, you would generally want to read all n rows and coalesce the values. This is helpful if you have individual hot keys. It is often a bad smell because it is a trick used to mitigate having the date as the first part of a row key, but it does have valid use cases (ex: the rowkey is a name and you have a handful of individual celebrities - obama, bieber, gaga - that need to have their load spread). This preserves ordering but multiplies the number of reads required wrt the # of writes.

      Hashing applies a one-way function to the rowkey such that a particular row will always get the same 'random' value prepended. The original row gets mapped to a single row. This is good for when you have clusters of related keys that in aggregate form a hotspot. (Example: the rowkey is a name and you have way too many joe's, john's, jon's, jonah's, jonathan's, and jonathon's all on the same region – using a hash would spread all the j names around.) This throws out the ability to effectively take advantage of the row ordering properties.

      Another trick is to take numeric or fixed-length values and store them with the least significant digit (i.e. the one that changes the most) first (little-endian order). This effectively randomizes row key names but also sacrifices row ordering properties.

      [1] http://en.wikipedia.org/wiki/Salt_(cryptography)
      [2] http://en.wikipedia.org/wiki/Hash_function
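
      The three techniques above could be sketched as follows (a minimal Python illustration only; the 4-way bucket set, the md5 choice, and all function names are assumptions for the example, not HBase APIs):

```python
import hashlib
import random

SALT_BUCKETS = "abcd"  # a 4-way salt, as in the celebrities example


def salt_key(rowkey: str) -> str:
    """Salting: prepend *random* data, so writes to one logical row are
    striped over up to n physical keys; a reader must later check all n."""
    return random.choice(SALT_BUCKETS) + "-" + rowkey


def hash_key(rowkey: str) -> str:
    """Hashing: prepend a *deterministic* one-way value, so a given row
    always maps to the same single physical key."""
    bucket = int(hashlib.md5(rowkey.encode()).hexdigest(), 16) % len(SALT_BUCKETS)
    return SALT_BUCKETS[bucket] + "-" + rowkey


def little_endian_key(numeric_key: str) -> str:
    """The digit-reversal trick: store a fixed-length numeric key with its
    fastest-changing digit first, spreading sequential keys apart."""
    return numeric_key[::-1]
```

      The key behavioral difference: calling salt_key twice on the same rowkey can yield different physical keys, while hash_key is stable, which is exactly why one stripes a single hot row and the other only disperses clusters of distinct keys.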

      Nick Dimiduk added a comment -

      Please note that "salting" as we use the term in HBase is different from "salting" in the cryptographic sense. Our usage pattern is more accurately described as "bucketing" (I think this is the term Phoenix uses), or "binning" [0]. This horse has been beaten to death on the user and dev mailing lists, so I won't belabor the point.

      [0]: http://en.wikipedia.org/wiki/Data_binning

      Nick Dimiduk added a comment -

      However, using totally random row keys for data which is accessed sequentially would remove the benefit of HBase's row-sorting algorithm and cause very poor performance, as each get or scan would need to query all regions.

      Prefixing with a random byte prevents any meaningful use of scans; gets become your only option. This approach is indistinguishable from hashing the rowkey.

      I like the rest of the updates, +1

      Thanks a lot Misty Stanley-Jones!

      Jonathan Hsieh added a comment -

      Please note that "salting" as we use the term in HBase is different from "salting" in the cryptographic sense. Our usage pattern is more accurately described as "bucketing" (I think this is the term Phoenix uses), or "binning" [0]. This horse has been beaten to death on the user and dev mailing lists, so I won't belabor the point.

      I agree they are different, but the idea of prepending a random value (as opposed to a deterministic hash) is very similar to the crypto "salting". Instead of using "salting", how about we use the term "striping"? What I was referring to and described takes one logical row and stripes it across many rows so that write throughput can be increased. The penalty is that we lose some consistency, and also if we want a "whole" answer we must read all the rows the logical row was striped over. I've seen the pattern deployed at several customers.

      From what phoenix uses and what I think you mean, "bucketing" and "binning" are equivalent to prepending a hash.

      Nick Dimiduk added a comment -

      From what phoenix uses and what I think you mean, "bucketing" and "binning" are equivalent to prepending a hash.

      Yes, that's consistent with my understanding as well. I'd prefer not to introduce yet another term to the conversation (or add confusion with striped compactions). Even though it's not perfect, I think it best to stick with "salt".

      Jonathan Hsieh added a comment -

      Even though it's not perfect, I think it best to stick with "salt".

      I'm not sure if we are on the same page here – my main point is that "salt" != "hash". Also, "salt" != "Prepend a hash". If the term salt causes confusion, my suggestion is not to use it.

      The other mechanisms mentioned (striping/prepending a rand, and endian inversion) are valid strategies to avoid hotspotting as well.

      Nick Dimiduk added a comment -

      From the conversations on user list I've seen, "salt" tends to mean "prepend with some fixed-byte-length value, usually a modulo of the number of regionservers" – the same as your "striping". I've also seen lazy people prepend with the first N bytes of the hashed rowkey, hence my loose language in the previous comment.

      Jonathan Hsieh added a comment - edited

      More precisely, for "salt"/"stripe" I've seen people prepend some fixed byte-length rand value, where the rand value is in [0, k * (# of regionservers)] and k is some constant. I think that is what you mean, but I'm not sure. If we agree, we should have Misty define salt in the prose instead of pointing to an external definition.
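
      That fixed byte-length rand salt could be sketched like this (hypothetical helper names; the 2-byte big-endian prefix width and default k are assumptions for the example):

```python
import random


def make_salt_prefix(num_regionservers: int, k: int = 2) -> bytes:
    """Pick a rand value in [0, k * num_regionservers) and encode it as a
    fixed-width (2-byte, big-endian) prefix, so every salted key has the
    same prefix length and sorts by salt bucket first."""
    rand = random.randrange(k * num_regionservers)
    return rand.to_bytes(2, "big")


def salted_rowkey(rowkey: bytes, num_regionservers: int, k: int = 2) -> bytes:
    """Prepend the salt prefix to the original rowkey."""
    return make_salt_prefix(num_regionservers, k) + rowkey
```

      A fixed-width prefix matters because readers must be able to strip the salt off again without ambiguity.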

      Nick Dimiduk added a comment -

      I don't think we need to define how the salt bytes might be calculated. It should be enough to define the concept as prepending some value with a defined cardinality to avoid overburdening a single machine. I have no problem with pointing off to an external resource as an example of how it might be done.

      Jonathan Hsieh added a comment -

      sounds good to me. thanks for bearing with me on this.

      Nick Dimiduk added a comment -

      thanks for bearing with me on this

      I was going to say the same

      Misty Stanley-Jones added a comment -

      Thanks guys, I'll take this feedback onboard and submit something new tomorrow.

      Misty Stanley-Jones added a comment -

      Let me know how this version suits you guys. Thanks for the feedback.

      Jonathan Hsieh added a comment -
      +      <para>Salting in this sense has nothing to do with cryptography, but refers to adding random
      +        data to the start of a row key. In this case, salting refers to adding a prefix to the row
      +        key to cause it to sort differently than it otherwise would. Salting can be helpful if you
      +        have a few keys that come up over and over, along with other rows that don't fit those keys.
      +        In that case, the regions holding rows with the "hot" keys would be overloaded, compared to
      +        the other regions. Salting completely removes ordering, so is often a poorer choice than
      +        hashing. Using totally random row keys for data which is accessed sequentially would remove
      +        the benefit of HBase's row-sorting algorithm and cause very poor performance, as each get or
      +        scan would need to query all regions.</para>
      

      I don't think this salting example is correct about the ramifications. Both Nick and I agree that salting is putting some random value in front of the actual value. This means that instead of one sorted list of entries, we'd have n sorted lists of entries if the cardinality of the salt is n.

      Example: naively we have rowkeys like this:

      foo0001
      foo0002
      foo0003
      foo0004

      if we use a 4-way salt (a,b,c,d), we could end up with data resorted like this:

      a-foo0003
      b-foo0001
      c-foo0004
      d-foo0002

      Let's say we add some new values to row foo0003. It could get salted with a new salt, let's say 'c'.

      a-foo0003
      b-foo0001
      c-foo0003
      c-foo0004
      d-foo0002

To read, we could still get things back in the original order, but we'd have to have a reader starting from each salt in parallel to get the rows back in order (and likely need to do some coalescing of foo0003 to combine the a-foo0003 and c-foo0003 rows back into one). The effect in this situation is that we could be writing with 4x the throughput, since we would be on 4 different machines (assuming that the a, b, c, d prefixes are balanced onto different machines).

Nick's point of view (please correct me if I am wrong) says that you could "salt" the original row key with a one-way hash so that foo0003 would always get salted with 'a'. This would spread row keys that are lexicographically close (foo0001 and foo0002) to different machines, which could help reduce contention and increase overall throughput, but would never allow a single row to have 4x the throughput like the other approach.
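The random-salting scheme described above can be sketched in a few lines of Python (an illustration only, not HBase client code; the SALTS list and helper names are invented for this example, and a round-robin counter stands in for the random salt choice so the sketch stays deterministic):

```python
import itertools

SALTS = ["a", "b", "c", "d"]  # 4-way salt, as in the example above
_next = itertools.count()

def random_salt_key(row_key):
    # "Random" salting: each write may land under a different prefix, so
    # writes to one hot logical row can spread across all four regions.
    return SALTS[next(_next) % len(SALTS)] + "-" + row_key

def merged_scan(salted_keys):
    # Reading back in logical order needs one scanner per salt prefix,
    # with the results merged on the client side.
    return sorted(k.split("-", 1)[1] for k in salted_keys)

writes = [random_salt_key(k) for k in ["foo0001", "foo0002", "foo0003", "foo0004"]]
```

With a truly random salt, the client cannot recompute a row's full key, so a point get has to check every salt prefix.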

      +      <para>Hashing refers to applying a random one-way function to the row key, such that a
      +        particular row always gets the same arbitrary value applied. This preserves the sort order
      +        so that scans are effective, but spreads out load across a region. One example where hashing
      +        is the right strategy would be if for some reason, a large proportion of rows started with
      +        the same letter. Normally, these would all be sorted into the same region. You can apply a
      +        hash to artificially differentiate them and spread them out.</para>
      

Hashing actually totally trashes the sort order; in fact, the goal of hashing is to evenly disperse entries that are near each other lexicographically as much as possible.
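A quick sketch of Jonathan's point that hashing the whole key destroys the sort order (hypothetical Python, not HBase API; `hashed_key` is an invented helper):

```python
import hashlib

def hashed_key(row_key):
    # Hashing the entire row key is deterministic, but the output bears
    # no lexicographic relation to the input, so adjacent logical keys
    # are dispersed across the keyspace and range scans over the
    # logical keys are lost.
    return hashlib.sha1(row_key.encode("utf-8")).hexdigest()

keys = ["foo0001", "foo0002", "foo0003", "foo0004"]
hashed_in_logical_order = [hashed_key(k) for k in keys]
```

Point gets still work, because the client can recompute the hash for any known logical key.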

      Misty Stanley-Jones added a comment -

      Thanks for the extra info, working on this now.

      Misty Stanley-Jones added a comment -

      Corrected my mistakes, added illustrative examples, and otherwise tried to improve this a bit. Let me know what you think.

      Jonathan Hsieh added a comment -

      +1. I like it.

Before it gets committed, Nick Dimiduk, do you agree with the latest version? Misty went into a little more detail, but I think it captures two different techniques to deal with hotspotting and their distinct effects.

      Nick Dimiduk added a comment -

      Very well articulated example, I like it! Jonathan Hsieh you're right in that I don't think of using random data for a prefix because the nondeterminism makes gets ineffective. It is, however, a valid approach.

      +        <para>Suppose you have the following list of row keys:</para>
      

      This example assumes the table is split in a way such that f* would be in a single region but a-, b-, c-, d- are in different regions. Be explicit about the region splits, include a sentence like "assume your table is split by letter, so the rowkey prefix a is on one region, b is on a second, c on a 3rd, &c." In that topology, then all the foo rows would be in the same region, and the prefixed rows are in different regions.

      +        <title>Hashing</title>
      

For this bit, you can add something like "using a deterministic hash allows the client to reconstruct the complete rowkey and use a get operation to retrieve that row as normal." The current text alludes to this, but maybe we can come out and say it explicitly.

      For references, you could also link off to Phoenix's "Salted Tables" description http://phoenix.apache.org/salted.html
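Nick's suggestion can be sketched as follows (hypothetical Python; `full_key` and the in-memory `table` dict are invented stand-ins for a real HBase table and client):

```python
import hashlib

SALTS = "abcd"

def full_key(logical_key):
    # Deterministic prefix: the client can always recompute it, so a
    # point Get on the logical key works as normal.
    i = int(hashlib.md5(logical_key.encode("utf-8")).hexdigest(), 16) % len(SALTS)
    return SALTS[i] + "-" + logical_key

# A toy "table" mapping full row keys to values: the reader reconstructs
# the complete rowkey from the logical key alone.
table = {full_key("foo0003"): "value3"}
value = table[full_key("foo0003")]
```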

      Jonathan Hsieh added a comment -

+1 to Nick's clarifications

      Misty Stanley-Jones added a comment -

      Thanks Nick Dimiduk, how's this?

      Nick Dimiduk added a comment -

      A little nit-picky, but... (now you know what Aman went through )

      +        <para>Suppose you have the following list of row keys, and your table is split in such a way
      +          that all the rows starting with "foo" are in the same region.</para>
      

      I would say "... and your table is split such that there is one region for each letter of the alphabet – prefix 'a' is one region, prefix 'b' is another. In this table, all rows starting with 'f' are in the same region."

      That is, be explicitly clear about the region split for the example.

      +        an <link xlink:href="http://phoenix.apache.org/salted.html">article on Salted Tables</link>
      

      "an" should be "and" ?

      Misty Stanley-Jones added a comment -

      Good feedback. Give this a try.

      Nick Dimiduk added a comment -

      Misty Stanley-Jones you are a saint of patience.

      +1

      Jonathan Hsieh added a comment -

I noticed the wrong row got emphasized in the latest patch (c-foo0003 should be emphasized instead of c-foo0004); otherwise it looked good.

Also noticed on the re-read that the reversing trick should put the least significant digit as the first part of the key (e.g., 0001, 0002, 0003 would be "flipped" to 1000, 2000, and 3000 to end up on different servers).
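The reversing trick can be sketched as (hypothetical Python helper; assumes fixed-width numeric keys as in the example):

```python
def reversed_key(row_key):
    # Reverse the key so the least significant (fastest-changing) digit
    # leads; sequential keys then differ in their first byte and can
    # land in different regions.
    return row_key[::-1]

# Sequential keys 0001, 0002, 0003 flip to 1000, 2000, 3000.
flipped = [reversed_key(k) for k in ["0001", "0002", "0003"]]
```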

      I'm going to fix both and commit.

      Jonathan Hsieh added a comment -

Thanks for the patch Misty, and thanks for the reviews Nick!

      Committed to branch-1 and master.

      Hudson added a comment -

      FAILURE: Integrated in HBase-1.0 #113 (See https://builds.apache.org/job/HBase-1.0/113/)
      HBASE-11682 Explain Hotspotting (Misty Stanley-Jones) (jmhsieh: rev f4f77b5756464d3e48f686af18e92560e2a4f76b)

      • src/main/docbkx/schema_design.xml
      Hudson added a comment -

      FAILURE: Integrated in HBase-TRUNK #5413 (See https://builds.apache.org/job/HBase-TRUNK/5413/)
      HBASE-11682 Explain Hotspotting (Misty Stanley-Jones) (jmhsieh: rev ac2e1c33fd32a6b473ebbfdc32f5e631a69f2a6d)

      • src/main/docbkx/schema_design.xml
      Enis Soztutar added a comment -

      Closing this issue after 0.99.0 release.


        People

        • Assignee:
          Misty Stanley-Jones
          Reporter:
          Misty Stanley-Jones