From f7a98bb8f98fef1569d7df153fb1906f4dc68e9c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E6=9D=8E=E5=B0=8F=E4=BF=9D?=
Date: Tue, 8 Jan 2019 13:52:53 +0800
Subject: [PATCH] Explain why too many column families should be avoided and add a version note

---
 src/main/asciidoc/_chapters/schema_design.adoc | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/main/asciidoc/_chapters/schema_design.adoc b/src/main/asciidoc/_chapters/schema_design.adoc
index fdbd18468c..b43d0db559 100644
--- a/src/main/asciidoc/_chapters/schema_design.adoc
+++ b/src/main/asciidoc/_chapters/schema_design.adoc
@@ -127,8 +127,12 @@ ____
 == On the number of column families
 
 HBase currently does not do well with anything above two or three column families so keep the number of column families in your schema low.
-Currently, flushing and compactions are done on a per Region basis so if one column family is carrying the bulk of the data bringing on flushes, the adjacent families will also be flushed even though the amount of data they carry is small.
-When many column families exist the flushing and compaction interaction can make for a bunch of needless i/o (To be addressed by changing flushing and compaction to work on a per column family basis). For more information on compactions, see <>.
+Before HBase version 1.1, flushing and compactions were done on a per Region basis, so if one column family carried the bulk of the data and triggered flushes, the adjacent families were also flushed even though the amount of data they carried was small.
+When many column families exist, the flushing and compaction interaction can cause a lot of needless I/O. This was addressed by changing flushing and compaction to work on a per column family basis: from HBase version 1.2, flushing and compactions are done per column family. For more information on compactions, see <>.
+
+HDFS limits the number of items in a directory (dfs.namenode.fs-limits.max-directory-items). If your table has N regions and M column families, it needs N*M column family directories. Each region/column family pair, in turn, can contain up to K store files (K depends on write load and many other configuration options), so the number of files under the table directory can reach N*M*K, which may affect the operation of HDFS.
+
+Each column family in a region has its own MemStore in the RegionServer. HBase introduced MSLAB (MemStore-Local Allocation Buffers, see HBASE-3455) in version 0.90.1. This feature is enabled by default (via hbase.hregion.memstore.mslab.enabled), so each MemStore takes up a 2MB buffer in memory (configured by hbase.hregion.memstore.mslab.chunksize). With many column families, the MemStores take up a lot of memory.
 
 Try to make do with one column family if you can in your schemas.
 Only introduce a second and third column family in the case where data access is usually column scoped; i.e.
-- 
2.14.1
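
The N*M*K file count and the per-MemStore buffer cost described in the added paragraphs can be checked with a quick back-of-envelope calculation. The sketch below is not part of the patch: the function names and the example table sizes are hypothetical, the 2MB chunk size is the value quoted in the text for hbase.hregion.memstore.mslab.chunksize, and the memory figure assumes every region/column family pair has a MemStore holding exactly one MSLAB chunk.

```python
# Back-of-envelope estimates for a table with many column families.
# Function names and example numbers are hypothetical illustrations.

# Chunk size quoted in the patch text for
# hbase.hregion.memstore.mslab.chunksize (2MB).
MSLAB_CHUNK_BYTES = 2 * 1024 * 1024


def store_directories(regions: int, families: int) -> int:
    """Directories needed: one per region/column family pair (N*M)."""
    return regions * families


def max_store_files(regions: int, families: int, files_per_store: int) -> int:
    """Upper bound on store files under the table directory (N*M*K)."""
    return store_directories(regions, families) * files_per_store


def min_mslab_bytes(regions: int, families: int,
                    chunk_bytes: int = MSLAB_CHUNK_BYTES) -> int:
    """MSLAB buffer held across all RegionServers.

    Assumption: every region/column family pair has a MemStore that
    holds exactly one chunk.
    """
    return regions * families * chunk_bytes


def main() -> None:
    regions, files_per_store = 1000, 10  # hypothetical N and K
    for families in (1, 3, 20):          # hypothetical values of M
        dirs = store_directories(regions, families)
        files = max_store_files(regions, families, files_per_store)
        mslab_mb = min_mslab_bytes(regions, families) // (1024 * 1024)
        print(f"{families:>2} families: {dirs} directories, "
              f"up to {files} store files, {mslab_mb} MB of MSLAB buffer")


if __name__ == "__main__":
    main()
```

Under these assumptions both quantities grow linearly with the number of column families, so going from one family to twenty multiplies the directory count, the store file bound, and the MSLAB footprint by twenty for the same number of regions.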