From 1bf3e72b69197749723ea2241899d2b09ff88e3b Mon Sep 17 00:00:00 2001
From: Sean Busbey
Date: Thu, 5 Apr 2018 11:52:00 -0500
Subject: [PATCH] HBASE-20354 better docs for impact of proactively checking
 hsync support.

* Add to the quickstart guide disabling the hsync check, with a big
  warning about how we'll lose data if the check is disabled.
* Add to troubleshooting section so folks searching can get a pointer.
---
 src/main/asciidoc/_chapters/getting_started.adoc | 42 ++++++++++++++++++++--
 src/main/asciidoc/_chapters/shell.adoc           |  6 +--
 src/main/asciidoc/_chapters/troubleshooting.adoc | 62 ++++++++++++++++++++++++
 3 files changed, 104 insertions(+), 6 deletions(-)

diff --git a/src/main/asciidoc/_chapters/getting_started.adoc b/src/main/asciidoc/_chapters/getting_started.adoc
index 9e0bbdfce0..d8de4208cd 100644
--- a/src/main/asciidoc/_chapters/getting_started.adoc
+++ b/src/main/asciidoc/_chapters/getting_started.adoc
@@ -82,7 +82,7 @@ JAVA_HOME=/usr
 +
 . Edit _conf/hbase-site.xml_, which is the main HBase configuration file.
-  At this time, you only need to specify the directory on the local filesystem where HBase and ZooKeeper write data.
+  At this time, you need to specify the directory on the local filesystem where HBase and ZooKeeper write data and acknowledge some risks.
   By default, a new directory is created under /tmp.
   Many servers are configured to delete the contents of _/tmp_ upon reboot, so you should store the data elsewhere.
   The following configuration will store HBase's data in the _hbase_ directory, in the home directory of the user called `testuser`.
   Paste the <property> tags beneath the <configuration> tags, which should be empty in a new HBase install.
@@ -102,6 +102,19 @@ JAVA_HOME=/usr
     <name>hbase.zookeeper.property.dataDir</name>
     <value>/home/testuser/zookeeper</value>
   </property>
+  <property>
+    <name>hbase.unsafe.stream.capability.enforce</name>
+    <value>false</value>
+    <description>
+      Controls whether HBase will check for stream capabilities (hflush/hsync).
+
+      Disable this if you intend to run on LocalFileSystem, denoted by a rootdir
+      with the 'file://' scheme.
+
+      WARNING: Setting this to false exposes you to data loss and inconsistent
+      system state in the event of process and/or node failures.
+    </description>
+  </property>
 </configuration>
 ----
 ====
@@ -111,7 +124,14 @@ HBase will do this for you.
 If you create the directory, HBase will attempt to do a migration, which is not what you want.
 +
 NOTE: The _hbase.rootdir_ in the above example points to a directory
-in the _local filesystem_. The 'file:/' prefix is how we denote local filesystem.
+in the _local filesystem_. The 'file://' prefix is how we denote local
+filesystem. You should take the WARNING present in the configuration example
+to heart. In standalone mode HBase makes use of the local filesystem abstraction
+from the Apache Hadoop project. That abstraction doesn't provide the durability
+promises that HBase needs to operate safely. This is fine for local development
+and testing use cases where the cost of cluster failure is well contained. It is
+not appropriate for production deployments; eventually you will lose data.
+
 To home HBase on an existing instance of HDFS, set the _hbase.rootdir_ to point at a directory up on your instance:
 e.g. _hdfs://namenode.example.org:8020/hbase_.
 For more on this variant, see the section below on Standalone HBase over HDFS.
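As background for the WARNING and NOTE above: the check that `hbase.unsafe.stream.capability.enforce` disables amounts to asking an output stream on the configured filesystem whether it can honor hflush/hsync. Below is a minimal sketch of that kind of probe, assuming Hadoop 2.9+/3.0+ (where the `StreamCapabilities` interface exists) on the classpath; the class name and probe file are invented for illustration, and this is not HBase's actual enforcement code.

----
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.StreamCapabilities;

public class HsyncProbe {
  public static void main(String[] args) throws Exception {
    // e.g. "file:///home/testuser/hbase" or "hdfs://namenode.example.org:8020/hbase"
    Path root = new Path(args[0]);
    FileSystem fs = root.getFileSystem(new Configuration());
    // The probe file name is arbitrary; we only care about the stream's capabilities.
    try (FSDataOutputStream out = fs.create(new Path(root, "capability-probe"))) {
      System.out.println("hflush supported: " + out.hasCapability(StreamCapabilities.HFLUSH));
      System.out.println("hsync supported:  " + out.hasCapability(StreamCapabilities.HSYNC));
    }
  }
}
----

Run against an HDFS rootdir, both capabilities report true; against a 'file://' rootdir they report false, which is exactly the condition the enforcement check refuses to start on unless you disable it as shown in the quickstart configuration.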
@@ -163,7 +183,7 @@ hbase(main):001:0> create 'test', 'cf'
 
 . List Information About your Table
 +
-Use the `list` command to
+Use the `list` command to confirm your table exists
 +
 ----
 hbase(main):002:0> list 'test'
@@ -174,6 +194,22 @@
 test
 => ["test"]
 ----
++
+Now use the `describe` command to see details, including configuration defaults
++
+----
+hbase(main):003:0> describe 'test'
+Table test is ENABLED
+test
+COLUMN FAMILIES DESCRIPTION
+{NAME => 'cf', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE',
+CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0',
+BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false',
+PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
+1 row(s)
+Took 0.9998 seconds
+----
+
 . Put data into your table.
 +
 To put data into your table, use the `put` command.
diff --git a/src/main/asciidoc/_chapters/shell.adoc b/src/main/asciidoc/_chapters/shell.adoc
index 522f482fc0..1392e92f65 100644
--- a/src/main/asciidoc/_chapters/shell.adoc
+++ b/src/main/asciidoc/_chapters/shell.adoc
@@ -227,7 +227,7 @@ The table reference can be used to perform data read write operations such as pu
 For example, previously you would always specify a table name:
 
 ----
-hbase(main):000:0> create ‘t’, ‘f’
+hbase(main):000:0> create 't', 'f'
 0 row(s) in 1.0970 seconds
 hbase(main):001:0> put 't', 'rold', 'f', 'v'
 0 row(s) in 0.0080 seconds
@@ -291,7 +291,7 @@ hbase(main):012:0> tab = get_table 't'
 0 row(s) in 0.0010 seconds
 
 => Hbase::Table - t
-hbase(main):013:0> tab.put ‘r1’ ,’f’, ‘v’
+hbase(main):013:0> tab.put 'r1', 'f', 'v'
 0 row(s) in 0.0100 seconds
 hbase(main):014:0> tab.scan
 ROW COLUMN+CELL
@@ -305,7 +305,7 @@ You can then use jruby to script table operations based on these names.
 The list_snapshots command also acts similarly.
 
 ----
-hbase(main):016 > tables = list(‘t.*’)
+hbase(main):016 > tables = list('t.*')
 TABLE
 t
 1 row(s) in 0.1040 seconds
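For readers scripting against HBase from Java rather than the JRuby shell, the table operations above map directly onto the standard HBase client API. The following is a hedged sketch of the same put/scan flow against the example table `t`; it assumes a reachable cluster via the default client configuration, the HBase client library on the classpath, and the class name is invented for the example:

----
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class TableOps {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("t"))) {
      // Equivalent of: tab.put 'r1', 'f', 'v'
      // (the shell's bare 'f' means column family 'f' with an empty qualifier)
      Put put = new Put(Bytes.toBytes("r1"));
      put.addColumn(Bytes.toBytes("f"), Bytes.toBytes(""), Bytes.toBytes("v"));
      table.put(put);
      // Equivalent of: tab.scan
      try (ResultScanner scanner = table.getScanner(new Scan())) {
        for (Result row : scanner) {
          System.out.println(row);
        }
      }
    }
  }
}
----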
diff --git a/src/main/asciidoc/_chapters/troubleshooting.adoc b/src/main/asciidoc/_chapters/troubleshooting.adoc
index eb62b338c1..8eff108441 100644
--- a/src/main/asciidoc/_chapters/troubleshooting.adoc
+++ b/src/main/asciidoc/_chapters/troubleshooting.adoc
@@ -941,6 +941,45 @@ java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
 \...
 then there is a path issue with the compression libraries.
 See the Configuration section on link:[LZO compression configuration].
+
+[[trouble.rs.startup.hsync]]
+==== RegionServer aborts due to lack of hsync for filesystem
+
+In order to provide data durability for writes to the cluster, HBase relies on the ability to durably save state in a write ahead log. When using a version of Apache Hadoop Common's filesystem API that supports checking on the availability of needed calls, HBase will proactively abort the cluster if it finds it can't operate safely.
+
+For RegionServer roles, the failure will show up in logs like this:
+
+----
+2018-04-05 11:36:22,785 ERROR [regionserver/192.168.1.123:16020] wal.AsyncFSWALProvider: The RegionServer async write ahead log provider relies on the ability to call hflush and hsync for proper operation during component failures, but the current FileSystem does not support doing so. Please check the config value of 'hbase.wal.dir' and ensure it points to a FileSystem mount that has suitable capabilities for output streams.
+2018-04-05 11:36:22,799 ERROR [regionserver/192.168.1.123:16020] regionserver.HRegionServer: ***** ABORTING region server 192.168.1.123,16020,1522946074234: Unhandled: cannot get log writer *****
+java.io.IOException: cannot get log writer
+  at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createAsyncWriter(AsyncFSWALProvider.java:112)
+  at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:612)
+  at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.createWriterInstance(AsyncFSWAL.java:124)
+  at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:759)
+  at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(AbstractFSWAL.java:489)
+  at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.<init>(AsyncFSWAL.java:251)
+  at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createWAL(AsyncFSWALProvider.java:69)
+  at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createWAL(AsyncFSWALProvider.java:44)
+  at org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:138)
+  at org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:57)
+  at org.apache.hadoop.hbase.wal.WALFactory.getWAL(WALFactory.java:252)
+  at org.apache.hadoop.hbase.regionserver.HRegionServer.getWAL(HRegionServer.java:2105)
+  at org.apache.hadoop.hbase.regionserver.HRegionServer.buildServerLoad(HRegionServer.java:1326)
+  at org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:1191)
+  at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1007)
+  at java.lang.Thread.run(Thread.java:745)
+Caused by: org.apache.hadoop.hbase.util.CommonFSUtils$StreamLacksCapabilityException: hflush and hsync
+  at org.apache.hadoop.hbase.io.asyncfs.AsyncFSOutputHelper.createOutput(AsyncFSOutputHelper.java:69)
+  at org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.initOutput(AsyncProtobufLogWriter.java:168)
+  at org.apache.hadoop.hbase.regionserver.wal.AbstractProtobufLogWriter.init(AbstractProtobufLogWriter.java:167)
+  at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createAsyncWriter(AsyncFSWALProvider.java:99)
+  ... 15 more
+----
+
+If you are attempting to run in standalone mode and see this error, please walk back through the section <<quickstart>> and ensure you have included *all* the given configuration settings.
+
 [[trouble.rs.runtime]]
 === Runtime Errors
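For readers unfamiliar with the two calls named in the error above: `hflush()` and `hsync()` are methods on Hadoop's `FSDataOutputStream`, and the WAL depends on them for visibility and durability respectively. A short illustrative sketch follows; the path, payload, and class name are invented for the example, and this is not HBase's WAL code:

----
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WalAppendSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Invented path for illustration; a real WAL lives under hbase.wal.dir.
    try (FSDataOutputStream out = fs.create(new Path("/tmp/wal-sketch"))) {
      out.write("edit-1".getBytes(StandardCharsets.UTF_8));
      out.hflush(); // visibility: readers of the still-open file can now see the edit
      out.hsync();  // durability: ask the filesystem to persist to stable storage
      // A filesystem that cannot honor these calls may acknowledge writes and
      // then lose them on process or node failure; hence the proactive abort.
    }
  }
}
----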
@@ -1127,6 +1166,29 @@ Sure fire solution is to just use Hadoop dfs to delete the HBase root and let HB
 If you have many regions on your cluster and you see an error like that reported above in this sections title in your logs, see link:https://issues.apache.org/jira/browse/HBASE-4246[HBASE-4246 Cluster with too many regions cannot withstand some master failover scenarios].
 
+[[trouble.master.startup.hsync]]
+==== Master fails to become active due to lack of hsync for filesystem
+
+HBase's internal framework for cluster operations requires the ability to durably save state in a write ahead log. When using a version of Apache Hadoop Common's filesystem API that supports checking on the availability of needed calls, HBase will proactively abort the cluster if it finds it can't operate safely.
+
+For Master roles, the failure will show up in logs like this:
+
+----
+2018-04-05 11:18:44,653 ERROR [Thread-21] master.HMaster: Failed to become active master
+java.lang.IllegalStateException: The procedure WAL relies on the ability to hsync for proper operation during component failures, but the underlying filesystem does not support doing so. Please check the config value of 'hbase.procedure.store.wal.use.hsync' to set the desired level of robustness and ensure the config value of 'hbase.wal.dir' points to a FileSystem mount that can provide it.
+  at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.rollWriter(WALProcedureStore.java:1034)
+  at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.recoverLease(WALProcedureStore.java:374)
+  at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.start(ProcedureExecutor.java:530)
+  at org.apache.hadoop.hbase.master.HMaster.startProcedureExecutor(HMaster.java:1267)
+  at org.apache.hadoop.hbase.master.HMaster.startServiceThreads(HMaster.java:1173)
+  at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:881)
+  at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2048)
+  at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:568)
+  at java.lang.Thread.run(Thread.java:745)
+----
+
+If you are attempting to run in standalone mode and see this error, please walk back through the section <<quickstart>> and ensure you have included *all* the given configuration settings.
+
 [[trouble.master.shutdown]]
 === Shutdown Errors
-- 
2.16.1
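A closing note on the Master-side exception above: it is driven by `hbase.procedure.store.wal.use.hsync`, which selects the durability level the procedure WAL demands from its stream. The sketch below mirrors the documented behavior only; the class and the helper `assertDurability` are hypothetical names, this is not HBase's actual implementation, and the default of requiring hsync is assumed from the exception text rather than confirmed from source:

----
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.StreamCapabilities;

public final class ProcedureWalDurability {
  // Hypothetical helper illustrating the documented check, not HBase internals.
  static void assertDurability(Configuration conf, FSDataOutputStream walStream) {
    // Assumption: the default demands hsync, matching the exception shown above;
    // operators can lower the requested robustness via this key.
    boolean useHsync = conf.getBoolean("hbase.procedure.store.wal.use.hsync", true);
    String needed = useHsync ? StreamCapabilities.HSYNC : StreamCapabilities.HFLUSH;
    if (!walStream.hasCapability(needed)) {
      throw new IllegalStateException("The procedure WAL filesystem lacks '" + needed
          + "'; check hbase.procedure.store.wal.use.hsync and hbase.wal.dir");
    }
  }

  private ProcedureWalDurability() {
  }
}
----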