From 048206b139f7a2640f3621c814182fc880557e7f Mon Sep 17 00:00:00 2001
From: Yiming Liu
Date: Thu, 8 Sep 2016 10:47:18 +0800
Subject: KYLIN-2000: Doc for compression settings

---
 website/_docs15/install/advance_settings.md | 41 ++++++++++++++++++++++++-----
 1 file changed, 34 insertions(+), 7 deletions(-)

diff --git a/website/_docs15/install/advance_settings.md b/website/_docs15/install/advance_settings.md
index b2f4f04..284d4fb 100644
--- a/website/_docs15/install/advance_settings.md
+++ b/website/_docs15/install/advance_settings.md
@@ -7,19 +7,46 @@ permalink: /docs15/install/advance_settings.html
 
 ## Settings for compression
 
-### Disable compression (Easiest way to address compression related issues)
+By default, Kylin does not enable compression. This is not the recommended setting for production environments, but a tradeoff for new Kylin users. A suitable compression algorithm reduces the storage overhead, but an unsupported algorithm will break the Kylin build job. Kylin uses three kinds of compression: HBase table compression, Hive output compression and MR jobs output compression. Compression settings only take effect after restarting the Kylin server instance (by `./kylin.sh stop` and `./kylin.sh start`).
 
-By default Kylin leverages snappy compression to compress the output of MR jobs, as well as hbase table storage, to reduce the storage overhead. We do not choose LZO compression in Kylin because hadoop venders tend to not include LZO in their distributions due to license(GPL) issues. If you compression related issues happened in your cubing job, you have two options: 1. Disable compression 2. Choose other compression algorithms like LZO.
+### HBase table compression
+The compression setting is defined in `kylin.properties` by `kylin.hbase.default.compression.codec`; the default value is *none*. Valid values are *none*, *snappy*, *lzo*, *gzip* and *lz4*. Before changing the compression algorithm, make sure the selected algorithm is supported on your HBase cluster. In particular, not all Hadoop distributions include snappy, lzo and lz4.
 
-#### Disable HBase compression
+### Hive output compression
+The compression settings are defined in `kylin_hive_conf.xml`. The default setting is empty, which leverages the Hive default configuration. To override the settings, add (or replace) the following properties in `kylin_hive_conf.xml`. Taking snappy compression as an example:
+{% highlight Groff markup %}
+    <property>
+        <name>mapreduce.map.output.compress.codec</name>
+        <value>org.apache.hadoop.io.compress.SnappyCodec</value>
+        <description></description>
+    </property>
+    <property>
+        <name>mapreduce.output.fileoutputformat.compress.codec</name>
+        <value>org.apache.hadoop.io.compress.SnappyCodec</value>
+        <description></description>
+    </property>
+{% endhighlight %}
 
-Compression settings only take effect after restarting Kylin server instance (by `./kylin.sh start` and `./kylin.sh stop`). To disable compressing MR jobs you need to modify $KYLIN_HOME/conf/kylin_job_conf.xml by removing all configuration entries related to compression(Just grep the keyword "compress"). To disable compressing hbase tables you need to open $KYLIN_HOME/conf/kylin.properties and remove the line starting with kylin.hbase.default.compression.codec.
+### MR jobs output compression
+The compression settings are defined in `kylin_job_conf.xml`. The default setting is empty, which leverages the MR default configuration. To override the settings, add (or replace) the following properties in `kylin_job_conf.xml`. Taking snappy compression as an example:
+{% highlight Groff markup %}
+    <property>
+        <name>mapreduce.map.output.compress.codec</name>
+        <value>org.apache.hadoop.io.compress.SnappyCodec</value>
+        <description></description>
+    </property>
+    <property>
+        <name>mapreduce.output.fileoutputformat.compress.codec</name>
+        <value>org.apache.hadoop.io.compress.SnappyCodec</value>
+        <description></description>
+    </property>
+{% endhighlight %}
 
 ### LZO compression
 
 #### Make sure LZO is working in your environment
 
-We have a simple tool to test whether LZO is well installed on EVERY SERVER in hbase cluster ( http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.2.4/bk_installing_manually_book/content/ch_install_hdfs_yarn_chapter.html#install-snappy-man-install ), and restart the cluster.
+There is a simple tool to test whether LZO is well installed on EVERY SERVER in HBase cluster ( http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.2.4/bk_installing_manually_book/content/ch_install_hdfs_yarn_chapter.html#install-snappy-man-install ), and restart the cluster.
 
 To test it on the hadoop CLI that you deployed Kylin, Just run
 {% highlight Groff markup %}
@@ -27,7 +54,7 @@ hbase org.apache.hadoop.hbase.util.CompressionTest file:///PATH-TO-A-LOCAL-TMP-F
 {% endhighlight %}
 
 If no exception is printed, you're good to go. Otherwise you'll need to first install LZO properly on this server.
-To test if the hbase cluster is ready to create LZO compressed tables, test following hbase command:
+To test if the HBase cluster is ready to create LZO compressed tables, run the following HBase command:
 
 {% highlight Groff markup %}
 create 'lzoTable', {NAME => 'colFam',COMPRESSION => 'LZO'}
@@ -35,7 +62,7 @@ create 'lzoTable', {NAME => 'colFam',COMPRESSION => 'LZO'}
 
 #### Use LZO compression
 
-Compression settings only take effect after restarting Kylin server instance (by `./kylin.sh start` and `./kylin.sh stop`). To use LZO for compressing MR jobs you need to modify $KYLIN_HOME/conf/kylin_job_conf.xml by replacing configuration entries related to compression from org.apache.hadoop.io.compress.SnappyCodec to com.hadoop.compression.lzo. You can refer to other documents for details: http://xiaming.me/posts/2014/05/03/enable-lzo-compression-on-hadoop-pig-and-spark. To use LZO for compressing hbase tables you need to open $KYLIN_HOME/conf/kylin.properties, change kylin.hbase.default.compression.codec=snappy to kylin.hbase.default.compression.codec=lzo.
+Compression settings only take effect after restarting the Kylin server instance (by `./kylin.sh stop` and `./kylin.sh start`). To use LZO for compressing MR jobs, modify $KYLIN_HOME/conf/kylin_job_conf.xml by changing the compression configuration entries to com.hadoop.compression.lzo. You can refer to other documents for details: http://xiaming.me/posts/2014/05/03/enable-lzo-compression-on-hadoop-pig-and-spark. To use LZO for compressing HBase tables, open $KYLIN_HOME/conf/kylin.properties and set kylin.hbase.default.compression.codec=lzo.
 
 ## Enable LDAP or SSO authentication
 
-- 
2.7.4 (Apple Git-66)