diff --git a/src/main/asciidoc/_chapters/schema_design.adoc b/src/main/asciidoc/_chapters/schema_design.adoc
index 9319c65..55ad672 100644
--- a/src/main/asciidoc/_chapters/schema_design.adoc
+++ b/src/main/asciidoc/_chapters/schema_design.adoc
@@ -27,7 +27,19 @@
 :icons: font
 :experimental:
 
-A good general introduction on the strength and weaknesses modelling on the various non-rdbms datastores is Ian Varley's Master thesis, link:http://ianvarley.com/UT/MR/Varley_MastersReport_Full_2009-08-07.pdf[No Relation: The Mixed Blessings of Non-Relational Databases]. Also, read <> for how HBase stores data internally, and the section on <>.
+A good introduction on the strength and weaknesses modelling on the various non-rdbms datastores is
+to be found in Ian Varley's Master thesis,
+link:http://ianvarley.com/UT/MR/Varley_MastersReport_Full_2009-08-07.pdf[No Relation: The Mixed Blessings of Non-Relational Databases].
+It is a little dated now but a good background read if you have a moment on how HBase schema modeling
+differs from how it is done in an RDBMS. Also,
+read <> for how HBase stores data internally, and the section on <>.
+
+The documentation on the Cloud Bigtable website, link:https://cloud.google.com/bigtable/docs/schema-design[Designing Your Schema],
+is pertinent and nicely done and lessons learned there equally apply here in HBase land; just divide
+any quoted values by ~10 to get what works for HBase: e.g. where it says individual values can be ~10MBs in size, HBase can do similar -- perhaps best
+to go smaller if you can -- and where it says a maximum of 100 column families in Cloud Bigtable, think ~10 when
+modeling on HBase.
+
 
 [[schema.creation]]
 == Schema Creation