From e5f95d0257da6f77dee747361aa8f1bf71787916 Mon Sep 17 00:00:00 2001 From: Guanghao Zhang Date: Mon, 4 Mar 2019 19:55:09 +0800 Subject: [PATCH] HBASE-21972 Copy master doc into branch-2.2 and edit to make it suit 2.2.0 --- .../asciidoc/_chapters/appendix_hfile_format.adoc | 32 +- src/main/asciidoc/_chapters/architecture.adoc | 270 ++++---- src/main/asciidoc/_chapters/configuration.adoc | 89 +-- src/main/asciidoc/_chapters/cp.adoc | 9 +- src/main/asciidoc/_chapters/developer.adoc | 162 +++-- src/main/asciidoc/_chapters/mapreduce.adoc | 2 +- src/main/asciidoc/_chapters/ops_mgt.adoc | 330 ++++++++-- src/main/asciidoc/_chapters/preface.adoc | 2 +- src/main/asciidoc/_chapters/schema_design.adoc | 2 +- src/main/asciidoc/_chapters/security.adoc | 7 +- src/main/asciidoc/_chapters/spark.adoc | 699 +++++++++++++++++++++ src/main/asciidoc/_chapters/sync_replication.adoc | 125 ++++ src/main/asciidoc/_chapters/troubleshooting.adoc | 4 +- src/main/asciidoc/_chapters/upgrading.adoc | 37 +- src/main/asciidoc/book.adoc | 3 +- 15 files changed, 1479 insertions(+), 294 deletions(-) create mode 100644 src/main/asciidoc/_chapters/spark.adoc create mode 100644 src/main/asciidoc/_chapters/sync_replication.adoc diff --git a/src/main/asciidoc/_chapters/appendix_hfile_format.adoc b/src/main/asciidoc/_chapters/appendix_hfile_format.adoc index 0f37beb..98659c2 100644 --- a/src/main/asciidoc/_chapters/appendix_hfile_format.adoc +++ b/src/main/asciidoc/_chapters/appendix_hfile_format.adoc @@ -106,11 +106,11 @@ In the version 2 every block in the data section contains the following fields: .. BLOOM_CHUNK – Bloom filter chunks .. META – meta blocks (not used for Bloom filters in version 2 anymore) .. INTERMEDIATE_INDEX – intermediate-level index blocks in a multi-level blockindex -.. ROOT_INDEX – root>level index blocks in a multi>level block index -.. FILE_INFO – the ``file info'' block, a small key>value map of metadata -.. BLOOM_META – a Bloom filter metadata block in the load>on>open section -.. TRAILER – a fixed>size file trailer. - As opposed to the above, this is not an HFile v2 block but a fixed>size (for each HFile version) data structure +.. ROOT_INDEX – root-level index blocks in a multi-level block index +.. FILE_INFO – the ''file info'' block, a small key-value map of metadata +.. BLOOM_META – a Bloom filter metadata block in the load-on-open section +.. TRAILER – a fixed-size file trailer. + As opposed to the above, this is not an HFile v2 block but a fixed-size (for each HFile version) data structure .. INDEX_V1 – this block type is only used for legacy HFile v1 block . Compressed size of the block's data, not including the header (int). + @@ -127,7 +127,7 @@ The above format of blocks is used in the following HFile sections: Scanned block section:: The section is named so because it contains all data blocks that need to be read when an HFile is scanned sequentially. - Also contains leaf block index and Bloom chunk blocks. + Also contains Leaf index blocks and Bloom chunk blocks. Non-scanned block section:: This section still contains unified-format v2 blocks but it does not have to be read when doing a sequential scan. This section contains "meta" blocks and intermediate-level index blocks. @@ -140,10 +140,10 @@ There are three types of block indexes in HFile version 2, stored in two differe . Data index -- version 2 multi-level block index, consisting of: .. Version 2 root index, stored in the data block index section of the file -.. 
Optionally, version 2 intermediate levels, stored in the non%root format in the data index section of the file. Intermediate levels can only be present if leaf level blocks are present -.. Optionally, version 2 leaf levels, stored in the non%root format inline with data blocks +.. Optionally, version 2 intermediate levels, stored in the non-root format in the data index section of the file. Intermediate levels can only be present if leaf level blocks are present +.. Optionally, version 2 leaf levels, stored in the non-root format inline with data blocks . Meta index -- version 2 root index format only, stored in the meta index section of the file -. Bloom index -- version 2 root index format only, stored in the ``load-on-open'' section as part of Bloom filter metadata. +. Bloom index -- version 2 root index format only, stored in the ''load-on-open'' section as part of Bloom filter metadata. ==== Root block index format in version 2 @@ -156,7 +156,7 @@ A version 2 root index block is a sequence of entries of the following format, s . Offset (long) + -This offset may point to a data block or to a deeper>level index block. +This offset may point to a data block or to a deeper-level index block. . On-disk size (int) . Key (a serialized byte array stored using Bytes.writeByteArray) @@ -172,7 +172,7 @@ For the data index and the meta index the number of entries is stored in the tra For a multi-level block index we also store the following fields in the root index block in the load-on-open section of the HFile, in addition to the data structure described above: . Middle leaf index block offset -. Middle leaf block on-disk size (meaning the leaf index block containing the reference to the ``middle'' data block of the file) +. Middle leaf block on-disk size (meaning the leaf index block containing the reference to the ''middle'' data block of the file) . The index of the mid-key (defined below) in the middle leaf-level block. @@ -200,9 +200,9 @@ Every non-root index block is structured as follows. . Entries. Each entry contains: + -. Offset of the block referenced by this entry in the file (long) -. On>disk size of the referenced block (int) -. Key. +.. Offset of the block referenced by this entry in the file (long) +.. On-disk size of the referenced block (int) +.. Key. The length can be calculated from entryOffsets. @@ -214,7 +214,7 @@ In contrast with version 1, in a version 2 HFile Bloom filter metadata is stored + . Bloom filter version = 3 (int). There used to be a DynamicByteBloomFilter class that had the Bloom filter version number 2 . The total byte size of all compound Bloom filter chunks (long) -. Number of hash functions (int +. Number of hash functions (int) . Type of hash functions (int) . The total key count inserted into the Bloom filter (long) . The maximum total number of keys in the Bloom filter (long) @@ -246,7 +246,7 @@ This is because we need to know the comparator at the time of parsing the load-o ==== Fixed file trailer format differences between versions 1 and 2 The following table shows common and different fields between fixed file trailers in versions 1 and 2. -Note that the size of the trailer is different depending on the version, so it is ``fixed'' only within one version. +Note that the size of the trailer is different depending on the version, so it is ''fixed'' only within one version. However, the version is always stored as the last four-byte integer in the file. 
.Differences between HFile Versions 1 and 2
diff --git a/src/main/asciidoc/_chapters/architecture.adoc b/src/main/asciidoc/_chapters/architecture.adoc
index 26ad0f9..c250aa8 100644
--- a/src/main/asciidoc/_chapters/architecture.adoc
+++ b/src/main/asciidoc/_chapters/architecture.adoc
@@ -594,6 +594,80 @@ See <> for more information on region assignment.
Periodically checks and cleans up the `hbase:meta` table.
See <> for more information on the meta table.

+[[master.wal]]
+=== MasterProcWAL
+
+HMaster records administrative operations and their running states, such as the handling of a crashed server,
+table creation, and other DDLs, into its own WAL file. The WALs are stored under the MasterProcWALs
+directory. The Master WALs are not like RegionServer WALs. Keeping up the Master WAL allows
+us to run a state machine that is resilient across Master failures. For example, if an HMaster in the
+middle of creating a table encounters an issue and fails, the next active HMaster can pick up where
+the previous one left off and carry the operation to completion. Since hbase-2.0.0, a
+new AssignmentManager (a.k.a. AMv2) was introduced, and the HMaster handles region assignment
+operations, server crash processing, balancing, etc., all via AMv2, persisting all state and
+transitions into MasterProcWALs rather than up into ZooKeeper, as we do in hbase-1.x.
+
+See <> (and <> for its basis) if you would like to learn more about the new
+AssignmentManager.
+
+[[master.wal.conf]]
+==== Configurations for MasterProcWAL
+Here is the list of configurations that affect MasterProcWAL operation.
+You should not have to change the defaults.
+
+[[hbase.procedure.store.wal.periodic.roll.msec]]
+*`hbase.procedure.store.wal.periodic.roll.msec`*::
++
+.Description
+Frequency of generating a new WAL
++
+.Default
+`1h (3600000 in msec)`
+
+[[hbase.procedure.store.wal.roll.threshold]]
+*`hbase.procedure.store.wal.roll.threshold`*::
++
+.Description
+Size threshold before the WAL rolls. Every time the WAL reaches this size, or the above period (1 hour) passes since the last log roll, the HMaster will generate a new WAL.
++
+.Default
+`32MB (33554432 in bytes)`
+
+[[hbase.procedure.store.wal.warn.threshold]]
+*`hbase.procedure.store.wal.warn.threshold`*::
++
+.Description
+If the number of WALs goes beyond this threshold, the following message appears in the HMaster log at WARN level when rolling.
+
+ procedure WALs count=xx above the warning threshold 64. check running procedures to see if something is stuck.
+
++
+.Default
+`64`
+
+[[hbase.procedure.store.wal.max.retries.before.roll]]
+*`hbase.procedure.store.wal.max.retries.before.roll`*::
++
+.Description
+Max number of retries when syncing slots (records) to the underlying storage, such as HDFS. On every attempt, the following message appears in the HMaster log.
+
+ unable to sync slots, retry=xx
+
++
+.Default
+`3`
+
+[[hbase.procedure.store.wal.sync.failure.roll.max]]
+*`hbase.procedure.store.wal.sync.failure.roll.max`*::
++
+.Description
+After the above 3 retries, the log is rolled, the retry count is reset to 0, and a new set of retries starts. This configuration controls the max number of log-roll attempts upon sync failure. That is, the HMaster is allowed to fail to sync 9 times in total. Once that limit is exceeded, the following message appears in the HMaster log.
+
+ Sync slots after log roll failed, abort.
++
+.Default
+`3`
+
[[regionserver.arch]]
== RegionServer

@@ -876,14 +950,14 @@ In the above, we set the BucketCache to be 4G.
We configured the on-heap LruBlockCache have 20% (0.2) of the RegionServer's heap size (0.2 * 5G = 1G). In other words, you configure the L1 LruBlockCache as you would normally (as if there were no L2 cache present). link:https://issues.apache.org/jira/browse/HBASE-10641[HBASE-10641] introduced the ability to configure multiple sizes for the buckets of the BucketCache, in HBase 0.98 and newer. -To configurable multiple bucket sizes, configure the new property `hfile.block.cache.sizes` (instead of `hfile.block.cache.size`) to a comma-separated list of block sizes, ordered from smallest to largest, with no spaces. +To configurable multiple bucket sizes, configure the new property `hbase.bucketcache.bucket.sizes` to a comma-separated list of block sizes, ordered from smallest to largest, with no spaces. The goal is to optimize the bucket sizes based on your data access patterns. The following example configures buckets of size 4096 and 8192. [source,xml] ---- - hfile.block.cache.sizes + hbase.bucketcache.bucket.sizes 4096,8192 ---- @@ -947,20 +1021,20 @@ For an end-to-end off-heaped read-path, first of all there should be an off-heap _hbase-site.xml_. Also specify the total capacity of the BC using `hbase.bucketcache.size` config. Please remember to adjust value of 'HBASE_OFFHEAPSIZE' in _hbase-env.sh_. This is how we specify the max possible off-heap memory allocation for the RegionServer java process. This should be bigger than the off-heap BC size. Please keep in mind that there is no default for `hbase.bucketcache.ioengine` -which means the BC is turned OFF by default (See <>). +which means the BC is turned OFF by default (See <>). Next thing to tune is the ByteBuffer pool on the RPC server side. The buffers from this pool will be used to accumulate the cell bytes and create a result cell block to send back to the client side. `hbase.ipc.server.reservoir.enabled` can be used to turn this pool ON or OFF. By default this pool is ON and available. HBase will create off heap ByteBuffers and pool them. Please make sure not to turn this OFF if you want end-to-end off-heaping in read path. If this pool is turned off, the server will create temp buffers on heap to accumulate the cell bytes and make a result cell block. This can impact the GC on a highly read loaded server. -The user can tune this pool with respect to how many buffers are in the pool and what should be the size of each ByteBuffer. -Use the config `hbase.ipc.server.reservoir.initial.buffer.size` to tune each of the buffer sizes. Default is 64 KB. +The user can tune this pool with respect to how many buffers are in the pool and what should be the size of each ByteBuffer. +Use the config `hbase.ipc.server.reservoir.initial.buffer.size` to tune each of the buffer sizes. Default is 64 KB. When the read pattern is a random row read load and each of the rows are smaller in size compared to this 64 KB, try reducing this. -When the result size is larger than one ByteBuffer size, the server will try to grab more than one buffer and make a result cell block out of these. When the pool is running out of buffers, the server will end up creating temporary on-heap buffers. +When the result size is larger than one ByteBuffer size, the server will try to grab more than one buffer and make a result cell block out of these. When the pool is running out of buffers, the server will end up creating temporary on-heap buffers. -The maximum number of ByteBuffers in the pool can be tuned using the config 'hbase.ipc.server.reservoir.initial.max'. 
Its value defaults to 64 * region server handlers configured (See the config 'hbase.regionserver.handler.count'). The math is such that by default we consider 2 MB as the result cell block size per read result and each handler will be handling a read. For 2 MB size, we need 32 buffers each of size 64 KB (See default buffer size in pool). So per handler 32 ByteBuffers(BB). We allocate twice this size as the max BBs count such that one handler can be creating the response and handing it to the RPC Responder thread and then handling a new request creating a new response cell block (using pooled buffers). Even if the responder could not send back the first TCP reply immediately, our count should allow that we should still have enough buffers in our pool without having to make temporary buffers on the heap. Again for smaller sized random row reads, tune this max count. There are lazily created buffers and the count is the max count to be pooled. +The maximum number of ByteBuffers in the pool can be tuned using the config 'hbase.ipc.server.reservoir.initial.max'. Its value defaults to 64 * region server handlers configured (See the config 'hbase.regionserver.handler.count'). The math is such that by default we consider 2 MB as the result cell block size per read result and each handler will be handling a read. For 2 MB size, we need 32 buffers each of size 64 KB (See default buffer size in pool). So per handler 32 ByteBuffers(BB). We allocate twice this size as the max BBs count such that one handler can be creating the response and handing it to the RPC Responder thread and then handling a new request creating a new response cell block (using pooled buffers). Even if the responder could not send back the first TCP reply immediately, our count should allow that we should still have enough buffers in our pool without having to make temporary buffers on the heap. Again for smaller sized random row reads, tune this max count. There are lazily created buffers and the count is the max count to be pooled. If you still see GC issues even after making end-to-end read path off-heap, look for issues in the appropriate buffer pool. Check the below RegionServer log with INFO level: [source] @@ -968,7 +1042,7 @@ If you still see GC issues even after making end-to-end read path off-heap, look Pool already reached its max capacity : XXX and no free buffers now. Consider increasing the value for 'hbase.ipc.server.reservoir.initial.max' ? ---- -The setting for _HBASE_OFFHEAPSIZE_ in _hbase-env.sh_ should consider this off heap buffer pool at the RPC side also. We need to config this max off heap size for the RegionServer as a bit higher than the sum of this max pool size and the off heap cache size. The TCP layer will also need to create direct bytebuffers for TCP communication. Also the DFS client will need some off-heap to do its workings especially if short-circuit reads are configured. Allocating an extra of 1 - 2 GB for the max direct memory size has worked in tests. +The setting for _HBASE_OFFHEAPSIZE_ in _hbase-env.sh_ should consider this off heap buffer pool at the RPC side also. We need to config this max off heap size for the RegionServer as a bit higher than the sum of this max pool size and the off heap cache size. The TCP layer will also need to create direct bytebuffers for TCP communication. Also the DFS client will need some off-heap to do its workings especially if short-circuit reads are configured. Allocating an extra of 1 - 2 GB for the max direct memory size has worked in tests. 
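+For illustration only, here is a minimal _hbase-site.xml_ sketch that pulls the settings discussed above together. The property names are the ones referenced in this section; the values shown are placeholders to be sized for your own heap and workload, not recommendations.
+
+[source,xml]
+----
+<!-- Off-heap BucketCache backing the L2 cache; values below are examples only -->
+<property>
+  <name>hbase.bucketcache.ioengine</name>
+  <value>offheap</value>
+</property>
+<property>
+  <name>hbase.bucketcache.size</name>
+  <!-- total BucketCache capacity, expressed in MB -->
+  <value>4096</value>
+</property>
+<!-- RPC-side ByteBuffer pool; ON by default, shown here for completeness -->
+<property>
+  <name>hbase.ipc.server.reservoir.enabled</name>
+  <value>true</value>
+</property>
+<property>
+  <!-- per-buffer size; 64 KB is the default, lower it for small random row reads -->
+  <name>hbase.ipc.server.reservoir.initial.buffer.size</name>
+  <value>65536</value>
+</property>
+----
+
+If you start from a sketch like this, remember to also raise _HBASE_OFFHEAPSIZE_ in _hbase-env.sh_ so that it covers the BucketCache, this buffer pool, and the extra 1 - 2 GB of headroom mentioned above.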
If you are using co processors and refer the Cells in the read results, DO NOT store reference to these Cells out of the scope of the CP hook methods. Some times the CPs need store info about the cell (Like its row key) for considering in the next CP hook call etc. For such cases, pls clone the required fields of the entire Cell as per the use cases. [ See CellUtil#cloneXXX(Cell) APIs ] @@ -1175,127 +1249,40 @@ WAL log splitting and recovery can be resource intensive and take a long time, d Distributed log processing is enabled by default since HBase 0.92. The setting is controlled by the `hbase.master.distributed.log.splitting` property, which can be set to `true` or `false`, but defaults to `true`. -[[log.splitting.step.by.step]] -.Distributed Log Splitting, Step by Step +==== WAL splitting based on procedureV2 +After HBASE-20610, we introduce a new way to do WAL splitting coordination by procedureV2 framework. This can simplify the process of WAL splitting and no need to connect zookeeper any more. -After configuring distributed log splitting, the HMaster controls the process. -The HMaster enrolls each RegionServer in the log splitting process, and the actual work of splitting the logs is done by the RegionServers. -The general process for log splitting, as described in <> still applies here. +[[background]] +.Background +Currently, splitting WAL processes are coordinated by zookeeper. Each region server are trying to grab tasks from zookeeper. And the burden becomes heavier when the number of region server increase. -. If distributed log processing is enabled, the HMaster creates a _split log manager_ instance when the cluster is started. - .. The split log manager manages all log files which need to be scanned and split. - .. The split log manager places all the logs into the ZooKeeper splitWAL node (_/hbase/splitWAL_) as tasks. - .. You can view the contents of the splitWAL by issuing the following `zkCli` command. Example output is shown. -+ -[source,bash] ----- -ls /hbase/splitWAL -[hdfs%3A%2F%2Fhost2.sample.com%3A56020%2Fhbase%2FWALs%2Fhost8.sample.com%2C57020%2C1340474893275-splitting%2Fhost8.sample.com%253A57020.1340474893900, -hdfs%3A%2F%2Fhost2.sample.com%3A56020%2Fhbase%2FWALs%2Fhost3.sample.com%2C57020%2C1340474893299-splitting%2Fhost3.sample.com%253A57020.1340474893931, -hdfs%3A%2F%2Fhost2.sample.com%3A56020%2Fhbase%2FWALs%2Fhost4.sample.com%2C57020%2C1340474893287-splitting%2Fhost4.sample.com%253A57020.1340474893946] ----- -+ -The output contains some non-ASCII characters. -When decoded, it looks much more simple: -+ ----- -[hdfs://host2.sample.com:56020/hbase/WALs -/host8.sample.com,57020,1340474893275-splitting -/host8.sample.com%3A57020.1340474893900, -hdfs://host2.sample.com:56020/hbase/WALs -/host3.sample.com,57020,1340474893299-splitting -/host3.sample.com%3A57020.1340474893931, -hdfs://host2.sample.com:56020/hbase/WALs -/host4.sample.com,57020,1340474893287-splitting -/host4.sample.com%3A57020.1340474893946] ----- -+ -The listing represents WAL file names to be scanned and split, which is a list of log splitting tasks. +[[implementation.on.master.side]] +.Implementation on Master side +During ServerCrashProcedure, SplitWALManager will create one SplitWALProcedure for each WAL file which should be split. Then each SplitWALProcedure will spawn a SplitWalRemoteProcedure to send the request to region server. +SplitWALProcedure is a StateMachineProcedure and here is the state transfer diagram. -. The split log manager monitors the log-splitting tasks and workers. 
-+ -The split log manager is responsible for the following ongoing tasks: -+ -* Once the split log manager publishes all the tasks to the splitWAL znode, it monitors these task nodes and waits for them to be processed. -* Checks to see if there are any dead split log workers queued up. - If it finds tasks claimed by unresponsive workers, it will resubmit those tasks. - If the resubmit fails due to some ZooKeeper exception, the dead worker is queued up again for retry. -* Checks to see if there are any unassigned tasks. - If it finds any, it create an ephemeral rescan node so that each split log worker is notified to re-scan unassigned tasks via the `nodeChildrenChanged` ZooKeeper event. -* Checks for tasks which are assigned but expired. - If any are found, they are moved back to `TASK_UNASSIGNED` state again so that they can be retried. - It is possible that these tasks are assigned to slow workers, or they may already be finished. - This is not a problem, because log splitting tasks have the property of idempotence. - In other words, the same log splitting task can be processed many times without causing any problem. -* The split log manager watches the HBase split log znodes constantly. - If any split log task node data is changed, the split log manager retrieves the node data. - The node data contains the current state of the task. - You can use the `zkCli` `get` command to retrieve the current state of a task. - In the example output below, the first line of the output shows that the task is currently unassigned. -+ ----- -get /hbase/splitWAL/hdfs%3A%2F%2Fhost2.sample.com%3A56020%2Fhbase%2FWALs%2Fhost6.sample.com%2C57020%2C1340474893287-splitting%2Fhost6.sample.com%253A57020.1340474893945 +.WAL_splitting_coordination +image::WAL_splitting.png[] -unassigned host2.sample.com:57000 -cZxid = 0×7115 -ctime = Sat Jun 23 11:13:40 PDT 2012 -... ----- -+ -Based on the state of the task whose data is changed, the split log manager does one of the following: -+ -* Resubmit the task if it is unassigned -* Heartbeat the task if it is assigned -* Resubmit or fail the task if it is resigned (see <>) -* Resubmit or fail the task if it is completed with errors (see <>) -* Resubmit or fail the task if it could not complete due to errors (see <>) -* Delete the task if it is successfully completed or failed -+ -[[distributed.log.replay.failure.reasons]] -[NOTE] -.Reasons a Task Will Fail -==== -* The task has been deleted. -* The node no longer exists. -* The log status manager failed to move the state of the task to `TASK_UNASSIGNED`. -* The number of resubmits is over the resubmit threshold. -==== +[[implementation.on.region.server.side]] +.Implementation on Region Server side +Region Server will receive a SplitWALCallable and execute it, which is much more straightforward than before. It will return null if success and return exception if there is any error. -. Each RegionServer's split log worker performs the log-splitting tasks. -+ -Each RegionServer runs a daemon thread called the _split log worker_, which does the work to split the logs. -The daemon thread starts when the RegionServer starts, and registers itself to watch HBase znodes. -If any splitWAL znode children change, it notifies a sleeping worker thread to wake up and grab more tasks. -If a worker's current task's node data is changed, -the worker checks to see if the task has been taken by another worker. -If so, the worker thread stops work on the current task. -+ -The worker monitors the splitWAL znode constantly. 
-When a new task appears, the split log worker retrieves the task paths and checks each one until it finds an unclaimed task, which it attempts to claim. -If the claim was successful, it attempts to perform the task and updates the task's `state` property based on the splitting outcome. -At this point, the split log worker scans for another unclaimed task. -+ -.How the Split Log Worker Approaches a Task -* It queries the task state and only takes action if the task is in `TASK_UNASSIGNED `state. -* If the task is in `TASK_UNASSIGNED` state, the worker attempts to set the state to `TASK_OWNED` by itself. - If it fails to set the state, another worker will try to grab it. - The split log manager will also ask all workers to rescan later if the task remains unassigned. -* If the worker succeeds in taking ownership of the task, it tries to get the task state again to make sure it really gets it asynchronously. - In the meantime, it starts a split task executor to do the actual work: -** Get the HBase root folder, create a temp folder under the root, and split the log file to the temp folder. -** If the split was successful, the task executor sets the task to state `TASK_DONE`. -** If the worker catches an unexpected IOException, the task is set to state `TASK_ERR`. -** If the worker is shutting down, set the task to state `TASK_RESIGNED`. -** If the task is taken by another worker, just log it. - - -. The split log manager monitors for uncompleted tasks. -+ -The split log manager returns when all tasks are completed successfully. -If all tasks are completed with some failures, the split log manager throws an exception so that the log splitting can be retried. -Due to an asynchronous implementation, in very rare cases, the split log manager loses track of some completed tasks. -For that reason, it periodically checks for remaining uncompleted task in its task map or ZooKeeper. -If none are found, it throws an exception so that the log splitting can be retried right away instead of hanging there waiting for something that won't happen. +[[preformance]] +.Performance +According to tests on a cluster which has 5 regionserver and 1 master. +procedureV2 coordinated WAL splitting has a better performance than ZK coordinated WAL splitting no master when restarting the whole cluster or one region server crashing. + +[[enable.this.feature]] +.Enable this feature +To enable this feature, first we should ensure our package of HBase already contains these code. If not, please upgrade the package of HBase cluster without any configuration change first. +Then change configuration 'hbase.split.wal.zk.coordinated' to false. Rolling upgrade the master with new configuration. Now WAL splitting are handled by our new implementation. +But region server are still trying to grab tasks from zookeeper, we can rolling upgrade the region servers with the new configuration to stop that. + +* steps as follows: +** Upgrade whole cluster to get the new Implementation. +** Upgrade Master with new configuration 'hbase.split.wal.zk.coordinated'=false. +** Upgrade region server to stop grab tasks from zookeeper. [[wal.compression]] ==== WAL Compression ==== @@ -1711,6 +1698,9 @@ For example, to view the content of the file _hdfs://10.81.47.41:8020/hbase/defa If you leave off the option -v to see just a summary on the HFile. See usage for other things to do with the `hfile` tool. +NOTE: In the output of this tool, you might see 'seqid=0' for certain keys in places such as 'Mid-key'/'firstKey'/'lastKey'. 
These are + 'KeyOnlyKeyValue' type instances - meaning their seqid is irrelevant & we just need the keys of these Key-Value instances. + [[store.file.dir]] ===== StoreFile Directory Structure on HDFS @@ -2469,12 +2459,6 @@ The most straightforward method is to either use the `TableOutputFormat` class f The bulk load feature uses a MapReduce job to output table data in HBase's internal data format, and then directly loads the generated StoreFiles into a running cluster. Using bulk load will use less CPU and network resources than simply using the HBase API. -[[arch.bulk.load.limitations]] -=== Bulk Load Limitations - -As bulk loading bypasses the write path, the WAL doesn't get written to as part of the process. -Replication works by reading the WAL files so it won't see the bulk loaded data – and the same goes for the edits that use `Put.setDurability(SKIP_WAL)`. One way to handle that is to ship the raw files or the HFiles to the other cluster and do the other processing there. - [[arch.bulk.load.arch]] === Bulk Load Architecture @@ -2527,6 +2511,32 @@ To get started doing so, dig into `ImportTsv.java` and check the JavaDoc for HFi The import step of the bulk load can also be done programmatically. See the `LoadIncrementalHFiles` class for more information. +[[arch.bulk.load.replication]] +=== Bulk Loading Replication +HBASE-13153 adds replication support for bulk loaded HFiles, available since HBase 1.3/2.0. This feature is enabled by setting `hbase.replication.bulkload.enabled` to `true` (default is `false`). +You also need to copy the source cluster configuration files to the destination cluster. + +Additional configurations are required too: + +. `hbase.replication.source.fs.conf.provider` ++ +This defines the class which loads the source cluster file system client configuration in the destination cluster. This should be configured for all the RS in the destination cluster. Default is `org.apache.hadoop.hbase.replication.regionserver.DefaultSourceFSConfigurationProvider`. ++ +. `hbase.replication.conf.dir` ++ +This represents the base directory where the file system client configurations of the source cluster are copied to the destination cluster. This should be configured for all the RS in the destination cluster. Default is `$HBASE_CONF_DIR`. ++ +. `hbase.replication.cluster.id` ++ +This configuration is required in the cluster where replication for bulk loaded data is enabled. A source cluster is uniquely identified by the destination cluster using this id. This should be configured for all the RS in the source cluster configuration file for all the RS. ++ + + + +For example: If source cluster FS client configurations are copied to the destination cluster under directory `/home/user/dc1/`, then `hbase.replication.cluster.id` should be configured as `dc1` and `hbase.replication.conf.dir` as `/home/user`. + +NOTE: `DefaultSourceFSConfigurationProvider` supports only `xml` type files. It loads source cluster FS client configuration only once, so if source cluster FS client configuration files are updated, every peer(s) cluster RS must be restarted to reload the configuration. + [[arch.hdfs]] == HDFS diff --git a/src/main/asciidoc/_chapters/configuration.adoc b/src/main/asciidoc/_chapters/configuration.adoc index 6980a26..fd9d299 100644 --- a/src/main/asciidoc/_chapters/configuration.adoc +++ b/src/main/asciidoc/_chapters/configuration.adoc @@ -93,7 +93,10 @@ This section lists required services and some required system configuration. 
[[java]] .Java -The following table summarizes the recommendation of the HBase community wrt deploying on various Java versions. An entry of "yes" is meant to indicate a base level of testing and willingness to help diagnose and address issues you might run into. Similarly, an entry of "no" or "Not Supported" generally means that should you run into an issue the community is likely to ask you to change the Java environment before proceeding to help. In some cases, specific guidance on limitations (e.g. wether compiling / unit tests work, specific operational issues, etc) will also be noted. +The following table summarizes the recommendation of the HBase community wrt deploying on various Java versions. +A icon:check-circle[role="green"] symbol is meant to indicate a base level of testing and willingness to help diagnose and address issues you might run into. +Similarly, an entry of icon:exclamation-circle[role="yellow"] or icon:times-circle[role="red"] generally means that should you run into an issue the community is likely to ask you to change the Java environment before proceeding to help. +In some cases, specific guidance on limitations (e.g. whether compiling / unit tests work, specific operational issues, etc) will also be noted. .Long Term Support JDKs are recommended [TIP] @@ -102,32 +105,34 @@ HBase recommends downstream users rely on JDK releases that are marked as Long T ==== .Java support by release line -[cols="1,1,1,1,1", options="header"] +[cols="6*^.^", options="header"] |=== |HBase Version |JDK 7 |JDK 8 -|JDK 9 -|JDK 10 - -|2.0 -|link:http://search-hadoop.com/m/YGbbsPxZ723m3as[Not Supported] -|yes -|link:https://issues.apache.org/jira/browse/HBASE-20264[Not Supported] -|link:https://issues.apache.org/jira/browse/HBASE-20264[Not Supported] - -|1.3 -|yes -|yes -|link:https://issues.apache.org/jira/browse/HBASE-20264[Not Supported] -|link:https://issues.apache.org/jira/browse/HBASE-20264[Not Supported] - - -|1.2 -|yes -|yes -|link:https://issues.apache.org/jira/browse/HBASE-20264[Not Supported] -|link:https://issues.apache.org/jira/browse/HBASE-20264[Not Supported] +|JDK 9 (Non-LTS) +|JDK 10 (Non-LTS) +|JDK 11 + +|2.0+ +|icon:times-circle[role="red"] +|icon:check-circle[role="green"] +v|icon:exclamation-circle[role="yellow"] +link:https://issues.apache.org/jira/browse/HBASE-20264[HBASE-20264] +v|icon:exclamation-circle[role="yellow"] +link:https://issues.apache.org/jira/browse/HBASE-20264[HBASE-20264] +v|icon:exclamation-circle[role="yellow"] +link:https://issues.apache.org/jira/browse/HBASE-21110[HBASE-21110] + +|1.2+ +|icon:check-circle[role="green"] +|icon:check-circle[role="green"] +v|icon:exclamation-circle[role="yellow"] +link:https://issues.apache.org/jira/browse/HBASE-20264[HBASE-20264] +v|icon:exclamation-circle[role="yellow"] +link:https://issues.apache.org/jira/browse/HBASE-20264[HBASE-20264] +v|icon:exclamation-circle[role="yellow"] +link:https://issues.apache.org/jira/browse/HBASE-21110[HBASE-21110] |=== @@ -213,26 +218,28 @@ Use the following legend to interpret this table: .Hadoop version support matrix -* "S" = supported -* "X" = not supported -* "NT" = Not tested +* icon:check-circle[role="green"] = Tested to be fully-functional +* icon:times-circle[role="red"] = Known to not be fully-functional +* icon:exclamation-circle[role="yellow"] = Not tested, may/may-not function -[cols="1,1,1,1,1,1", options="header"] +[cols="1,4*^.^", options="header"] |=== -| | HBase-1.2.x | HBase-1.3.x | HBase-1.5.x | HBase-2.0.x | HBase-2.1.x -|Hadoop-2.4.x | S | S | X | X | X 
-|Hadoop-2.5.x | S | S | X | X | X -|Hadoop-2.6.0 | X | X | X | X | X -|Hadoop-2.6.1+ | S | S | X | S | X -|Hadoop-2.7.0 | X | X | X | X | X -|Hadoop-2.7.1+ | S | S | S | S | S -|Hadoop-2.8.[0-1] | X | X | X | X | X -|Hadoop-2.8.2 | NT | NT | NT | NT | NT -|Hadoop-2.8.3+ | NT | NT | NT | S | S -|Hadoop-2.9.0 | X | X | X | X | X -|Hadoop-2.9.1+ | NT | NT | NT | NT | NT -|Hadoop-3.0.x | X | X | X | X | X -|Hadoop-3.1.0 | X | X | X | X | X +| | HBase-1.2.x, HBase-1.3.x | HBase-1.4.x | HBase-2.0.x | HBase-2.1.x +|Hadoop-2.4.x | icon:check-circle[role="green"] | icon:times-circle[role="red"] | icon:times-circle[role="red"] | icon:times-circle[role="red"] +|Hadoop-2.5.x | icon:check-circle[role="green"] | icon:times-circle[role="red"] | icon:times-circle[role="red"] | icon:times-circle[role="red"] +|Hadoop-2.6.0 | icon:times-circle[role="red"] | icon:times-circle[role="red"] | icon:times-circle[role="red"] | icon:times-circle[role="red"] +|Hadoop-2.6.1+ | icon:check-circle[role="green"] | icon:times-circle[role="red"] | icon:check-circle[role="green"] | icon:times-circle[role="red"] +|Hadoop-2.7.0 | icon:times-circle[role="red"] | icon:times-circle[role="red"] | icon:times-circle[role="red"] | icon:times-circle[role="red"] +|Hadoop-2.7.1+ | icon:check-circle[role="green"] | icon:check-circle[role="green"] | icon:check-circle[role="green"] | icon:check-circle[role="green"] +|Hadoop-2.8.[0-1] | icon:times-circle[role="red"] | icon:times-circle[role="red"] | icon:times-circle[role="red"] | icon:times-circle[role="red"] +|Hadoop-2.8.2 | icon:exclamation-circle[role="yellow"] | icon:exclamation-circle[role="yellow"] | icon:exclamation-circle[role="yellow"] | icon:exclamation-circle[role="yellow"] +|Hadoop-2.8.3+ | icon:exclamation-circle[role="yellow"] | icon:exclamation-circle[role="yellow"] | icon:check-circle[role="green"] | icon:check-circle[role="green"] +|Hadoop-2.9.0 | icon:times-circle[role="red"] | icon:times-circle[role="red"] | icon:times-circle[role="red"] | icon:times-circle[role="red"] +|Hadoop-2.9.1+ | icon:exclamation-circle[role="yellow"] | icon:exclamation-circle[role="yellow"] | icon:exclamation-circle[role="yellow"] | icon:exclamation-circle[role="yellow"] +|Hadoop-3.0.[0-2] | icon:times-circle[role="red"] | icon:times-circle[role="red"] | icon:times-circle[role="red"] | icon:times-circle[role="red"] +|Hadoop-3.0.3+ | icon:times-circle[role="red"] | icon:times-circle[role="red"] | icon:check-circle[role="green"] | icon:check-circle[role="green"] +|Hadoop-3.1.0 | icon:times-circle[role="red"] | icon:times-circle[role="red"] | icon:times-circle[role="red"] | icon:times-circle[role="red"] +|Hadoop-3.1.1+ | icon:times-circle[role="red"] | icon:times-circle[role="red"] | icon:check-circle[role="green"] | icon:check-circle[role="green"] |=== .Hadoop Pre-2.6.1 and JDK 1.8 Kerberos diff --git a/src/main/asciidoc/_chapters/cp.adoc b/src/main/asciidoc/_chapters/cp.adoc index abe334c..5fd80b4 100644 --- a/src/main/asciidoc/_chapters/cp.adoc +++ b/src/main/asciidoc/_chapters/cp.adoc @@ -483,6 +483,7 @@ The following Observer coprocessor prevents the details of the user `admin` from returned in a `Get` or `Scan` of the `users` table. . Write a class that implements the +link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionCoprocessor.html[RegionCoprocessor], link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html[RegionObserver] class. @@ -500,10 +501,9 @@ empty result. Otherwise, process the request as normal. 
Following are the implementation of the above steps: - [source,java] ---- -public class RegionObserverExample implements RegionObserver { +public class RegionObserverExample implements RegionCoprocessor, RegionObserver { private static final byte[] ADMIN = Bytes.toBytes("admin"); private static final byte[] COLUMN_FAMILY = Bytes.toBytes("details"); @@ -511,6 +511,11 @@ public class RegionObserverExample implements RegionObserver { private static final byte[] VALUE = Bytes.toBytes("You can't see Admin details"); @Override + public Optional getRegionObserver() { + return Optional.of(this); + } + + @Override public void preGetOp(final ObserverContext e, final Get get, final List results) throws IOException { diff --git a/src/main/asciidoc/_chapters/developer.adoc b/src/main/asciidoc/_chapters/developer.adoc index 32432ff..f728236 100644 --- a/src/main/asciidoc/_chapters/developer.adoc +++ b/src/main/asciidoc/_chapters/developer.adoc @@ -2039,30 +2039,97 @@ For more information on how to use ReviewBoard, see link:http://www.reviewboard. ==== Guide for HBase Committers +===== Becoming a committer + +Committers are responsible for reviewing and integrating code changes, testing +and voting on release candidates, weighing in on design discussions, as well as +other types of project contributions. The PMC votes to make a contributor a +committer based on an assessment of their contributions to the project. It is +expected that committers demonstrate a sustained history of high-quality +contributions to the project and community involvement. + +Contributions can be made in many ways. There is no single path to becoming a +committer, nor any expected timeline. Submitting features, improvements, and bug +fixes is the most common avenue, but other methods are both recognized and +encouraged (and may be even more important to the health of HBase as a project and a +community). A non-exhaustive list of potential contributions (in no particular +order): + +* <> for new + changes, best practices, recipes, and other improvements. +* Keep the website up to date. +* Perform testing and report the results. For instance, scale testing and + testing non-standard configurations is always appreciated. +* Maintain the shared Jenkins testing environment and other testing + infrastructure. +* <> after performing validation, even if non-binding. + A non-binding vote is a vote by a non-committer. +* Provide input for discussion threads on the link:/mail-lists.html[mailing lists] (which usually have + `[DISCUSS]` in the subject line). +* Answer questions questions on the user or developer mailing lists and on + Slack. +* Make sure the HBase community is a welcoming one and that we adhere to our + link:/coc.html[Code of conduct]. Alert the PMC if you + have concerns. +* Review other people's work (both code and non-code) and provide public + feedback. +* Report bugs that are found, or file new feature requests. +* Triage issues and keep JIRA organized. This includes closing stale issues, + labeling new issues, updating metadata, and other tasks as needed. +* Mentor new contributors of all sorts. +* Give talks and write blogs about HBase. Add these to the link:/[News] section + of the website. +* Provide UX feedback about HBase, the web UI, the CLI, APIs, and the website. +* Write demo applications and scripts. +* Help attract and retain a diverse community. +* Interact with other projects in ways that benefit HBase and those other + projects. 
+ +Not every individual is able to do all (or even any) of the items on this list. +If you think of other ways to contribute, go for it (and add them to the list). +A pleasant demeanor and willingness to contribute are all you need to make a +positive impact on the HBase project. Invitations to become a committer are the +result of steady interaction with the community over the long term, which builds +trust and recognition. + ===== New committers -New committers are encouraged to first read Apache's generic committer documentation: +New committers are encouraged to first read Apache's generic committer +documentation: * link:https://www.apache.org/dev/new-committers-guide.html[Apache New Committer Guide] * link:https://www.apache.org/dev/committers.html[Apache Committer FAQ] ===== Review -HBase committers should, as often as possible, attempt to review patches submitted by others. -Ideally every submitted patch will get reviewed by a committer _within a few days_. -If a committer reviews a patch they have not authored, and believe it to be of sufficient quality, then they can commit the patch, otherwise the patch should be cancelled with a clear explanation for why it was rejected. - -The list of submitted patches is in the link:https://issues.apache.org/jira/secure/IssueNavigator.jspa?mode=hide&requestId=12312392[HBase Review Queue], which is ordered by time of last modification. -Committers should scan the list from top to bottom, looking for patches that they feel qualified to review and possibly commit. - -For non-trivial changes, it is required to get another committer to review your own patches before commit. -Use the btn:[Submit Patch] button in JIRA, just like other contributors, and then wait for a `+1` response from another committer before committing. +HBase committers should, as often as possible, attempt to review patches +submitted by others. Ideally every submitted patch will get reviewed by a +committer _within a few days_. If a committer reviews a patch they have not +authored, and believe it to be of sufficient quality, then they can commit the +patch. Otherwise the patch should be cancelled with a clear explanation for why +it was rejected. + +The list of submitted patches is in the +link:https://issues.apache.org/jira/secure/IssueNavigator.jspa?mode=hide&requestId=12312392[HBase Review Queue], +which is ordered by time of last modification. Committers should scan the list +from top to bottom, looking for patches that they feel qualified to review and +possibly commit. If you see a patch you think someone else is better qualified +to review, you can mention them by username in the JIRA. + +For non-trivial changes, it is required that another committer review your +patches before commit. **Self-commits of non-trivial patches are not allowed.** +Use the btn:[Submit Patch] button in JIRA, just like other contributors, and +then wait for a `+1` response from another committer before committing. ===== Reject -Patches which do not adhere to the guidelines in link:https://hbase.apache.org/book.html#developer[HowToContribute] and to the link:https://wiki.apache.org/hadoop/CodeReviewChecklist[code review checklist] should be rejected. -Committers should always be polite to contributors and try to instruct and encourage them to contribute better patches. -If a committer wishes to improve an unacceptable patch, then it should first be rejected, and a new patch should be attached by the committer for review. 
+Patches which do not adhere to the guidelines in +link:https://hbase.apache.org/book.html#developer[HowToContribute] and to the +link:https://wiki.apache.org/hadoop/CodeReviewChecklist[code review checklist] +should be rejected. Committers should always be polite to contributors and try +to instruct and encourage them to contribute better patches. If a committer +wishes to improve an unacceptable patch, then it should first be rejected, and a +new patch should be attached by the committer for further review. [[committing.patches]] ===== Commit @@ -2073,29 +2140,34 @@ Committers commit patches to the Apache HBase GIT repository. [NOTE] ==== Make sure your local configuration is correct, especially your identity and email. -Examine the output of the +$ git config - --list+ command and be sure it is correct. -See this GitHub article, link:https://help.github.com/articles/set-up-git[Set Up Git] if you need pointers. +Examine the output of the +$ git config --list+ command and be sure it is correct. +See link:https://help.github.com/articles/set-up-git[Set Up Git] if you need +pointers. ==== -When you commit a patch, please: - -. Include the Jira issue id in the commit message along with a short description of the change. Try - to add something more than just the Jira title so that someone looking at git log doesn't - have to go to Jira to discern what the change is about. - Be sure to get the issue ID right, as this causes Jira to link to the change in Git (use the - issue's "All" tab to see these). -. Commit the patch to a new branch based off master or other intended branch. - It's a good idea to call this branch by the JIRA ID. - Then check out the relevant target branch where you want to commit, make sure your local branch has all remote changes, by doing a +git pull --rebase+ or another similar command, cherry-pick the change into each relevant branch (such as master), and do +git push - +. +When you commit a patch: + +. Include the Jira issue ID in the commit message along with a short description + of the change. Try to add something more than just the Jira title so that + someone looking at `git log` output doesn't have to go to Jira to discern what + the change is about. Be sure to get the issue ID right, because this causes + Jira to link to the change in Git (use the issue's "All" tab to see these + automatic links). +. Commit the patch to a new branch based off `master` or the other intended + branch. It's a good idea to include the JIRA ID in the name of this branch. + Check out the relevant target branch where you want to commit, and make sure + your local branch has all remote changes, by doing a +git pull --rebase+ or + another similar command. Next, cherry-pick the change into each relevant + branch (such as master), and push the changes to the remote branch using + a command such as +git push +. + WARNING: If you do not have all remote changes, the push will fail. If the push fails for any reason, fix the problem or ask for help. Do not do a +git push --force+. + Before you can commit a patch, you need to determine how the patch was created. -The instructions and preferences around the way to create patches have changed, and there will be a transition period. +The instructions and preferences around the way to create patches have changed, +and there will be a transition period. 
+ .Determine How a Patch Was Created * If the first few lines of the patch look like the headers of an email, with a From, Date, and @@ -2122,16 +2194,18 @@ diff --git src/main/asciidoc/_chapters/developer.adoc src/main/asciidoc/_chapter + .Example of committing a Patch ==== -One thing you will notice with these examples is that there are a lot of +git pull+ commands. -The only command that actually writes anything to the remote repository is +git push+, and you need to make absolutely sure you have the correct versions of everything and don't have any conflicts before pushing. -The extra +git - pull+ commands are usually redundant, but better safe than sorry. +One thing you will notice with these examples is that there are a lot of ++git pull+ commands. The only command that actually writes anything to the +remote repository is +git push+, and you need to make absolutely sure you have +the correct versions of everything and don't have any conflicts before pushing. +The extra +git pull+ commands are usually redundant, but better safe than sorry. -The first example shows how to apply a patch that was generated with +git format-patch+ and apply it to the `master` and `branch-1` branches. +The first example shows how to apply a patch that was generated with +git +format-patch+ and apply it to the `master` and `branch-1` branches. -The directive to use +git format-patch+ rather than +git diff+, and not to use `--no-prefix`, is a new one. -See the second example for how to apply a patch created with +git - diff+, and educate the person who created the patch. +The directive to use +git format-patch+ rather than +git diff+, and not to use +`--no-prefix`, is a new one. See the second example for how to apply a patch +created with +git diff+, and educate the person who created the patch. ---- $ git checkout -b HBASE-XXXX @@ -2153,13 +2227,13 @@ $ git push origin branch-1 $ git branch -D HBASE-XXXX ---- -This example shows how to commit a patch that was created using +git diff+ without `--no-prefix`. -If the patch was created with `--no-prefix`, add `-p0` to the +git apply+ command. +This example shows how to commit a patch that was created using +git diff+ +without `--no-prefix`. If the patch was created with `--no-prefix`, add `-p0` to +the +git apply+ command. ---- $ git apply ~/Downloads/HBASE-XXXX-v2.patch -$ git commit -m "HBASE-XXXX Really Good Code Fix (Joe Schmo)" --author= -a # This -and next command is needed for patches created with 'git diff' +$ git commit -m "HBASE-XXXX Really Good Code Fix (Joe Schmo)" --author= -a # This and next command is needed for patches created with 'git diff' $ git commit --amend --signoff $ git checkout master $ git pull --rebase @@ -2180,7 +2254,9 @@ $ git branch -D HBASE-XXXX ==== . Resolve the issue as fixed, thanking the contributor. - Always set the "Fix Version" at this point, but please only set a single fix version for each branch where the change was committed, the earliest release in that branch in which the change will appear. + Always set the "Fix Version" at this point, but only set a single fix version + for each branch where the change was committed, the earliest release in that + branch in which the change will appear. 
====== Commit Message Format @@ -2195,7 +2271,9 @@ The preferred commit message format is: HBASE-12345 Fix All The Things (jane@example.com) ---- -If the contributor used +git format-patch+ to generate the patch, their commit message is in their patch and you can use that, but be sure the JIRA ID is at the front of the commit message, even if the contributor left it out. +If the contributor used +git format-patch+ to generate the patch, their commit +message is in their patch and you can use that, but be sure the JIRA ID is at +the front of the commit message, even if the contributor left it out. [[committer.amending.author]] ====== Add Amending-Author when a conflict cherrypick backporting diff --git a/src/main/asciidoc/_chapters/mapreduce.adoc b/src/main/asciidoc/_chapters/mapreduce.adoc index 2f72a2d..61cff86 100644 --- a/src/main/asciidoc/_chapters/mapreduce.adoc +++ b/src/main/asciidoc/_chapters/mapreduce.adoc @@ -120,7 +120,7 @@ You might find the more selective `hbase mapredcp` tool output of interest; it l to run a basic mapreduce job against an hbase install. It does not include configuration. You'll probably need to add these if you want your MapReduce job to find the target cluster. You'll probably have to also add pointers to extra jars once you start to do anything of substance. Just specify the extras by passing the system propery `-Dtmpjars` when -you run `hbase mapredcp`. +you run `hbase mapredcp`. For jobs that do not package their dependencies or call `TableMapReduceUtil#addDependencyJars`, the following command structure is necessary: diff --git a/src/main/asciidoc/_chapters/ops_mgt.adoc b/src/main/asciidoc/_chapters/ops_mgt.adoc index 4da4a67..ee7bd97 100644 --- a/src/main/asciidoc/_chapters/ops_mgt.adoc +++ b/src/main/asciidoc/_chapters/ops_mgt.adoc @@ -51,7 +51,8 @@ Options: Commands: Some commands take arguments. Pass no args or -h for usage. shell Run the HBase shell - hbck Run the hbase 'fsck' tool + hbck Run the HBase 'fsck' tool. Defaults read-only hbck1. + Pass '-j /path/to/HBCK2.jar' to run hbase-2.x HBCK2. snapshot Tool for managing snapshots wal Write-ahead-log analyzer hfile Store file analyzer @@ -386,12 +387,33 @@ Each command except `RowCounter` and `CellCounter` accept a single `--help` argu [[hbck]] === HBase `hbck` -To run `hbck` against your HBase cluster run `$./bin/hbase hbck`. At the end of the command's output it prints `OK` or `INCONSISTENCY`. -If your cluster reports inconsistencies, pass `-details` to see more detail emitted. -If inconsistencies, run `hbck` a few times because the inconsistency may be transient (e.g. cluster is starting up or a region is splitting). - Passing `-fix` may correct the inconsistency (This is an experimental feature). +The `hbck` tool that shipped with hbase-1.x has been made read-only in hbase-2.x. It is not able to repair +hbase-2.x clusters as hbase internals have changed. Nor should its assessments in read-only mode be +trusted as it does not understand hbase-2.x operation. -For more information, see <>. +A new tool, <>, described in the next section, replaces `hbck`. + +[[HBCK2]] +=== HBase `HBCK2` + +`HBCK2` is the successor to <>, the hbase-1.x fix tool (A.K.A `hbck1`). Use it in place of `hbck1` +making repairs against hbase-2.x installs. + +`HBCK2` does not ship as part of hbase. 
It can be found as a subproject of the companion
+link:https://github.com/apache/hbase-operator-tools[hbase-operator-tools] repository at
+link:https://github.com/apache/hbase-operator-tools/tree/master/hbase-hbck2[Apache HBase HBCK2 Tool].
+`HBCK2` was moved out of hbase so it could evolve at a cadence apart from that of hbase core.
+
+See the link:https://github.com/apache/hbase-operator-tools/tree/master/hbase-hbck2[HBCK2] home page
+for how `HBCK2` differs from `hbck1`, and for how to build and use it.
+
+Once built, you can run `HBCK2` as follows:
+
+[source,bash]
+----
+$ hbase hbck -j /path/to/HBCK2.jar
+----
+
+This will generate `HBCK2` usage describing commands and options.

[[hfile_tool2]]
=== HFile Tool
@@ -509,6 +531,124 @@ By default, CopyTable utility only copies the latest version of row cells unless
See Jonathan Hsieh's link:https://blog.cloudera.com/blog/2012/06/online-hbase-backups-with-copytable-2/[Online HBase Backups with CopyTable] blog post for more on `CopyTable`.

+[[hashtable.synctable]]
+=== HashTable/SyncTable
+
+HashTable/SyncTable is a two-step tool for synchronizing table data; each step is implemented as a MapReduce job.
+Like CopyTable, it can be used to sync partial or entire table data, within the same cluster or to a remote one.
+However, it performs the sync more efficiently than CopyTable. Instead of copying all cells
+in the specified row key/time period range, HashTable (the first step) creates hashed indexes for batches of cells on the source table and outputs those as results.
+In the next stage, SyncTable scans the source table and calculates hash indexes for table cells,
+compares these hashes with the outputs of HashTable, and then scans (and compares) only the cells in ranges whose hashes diverge, updating just the
+mismatching cells. This results in less network traffic/data transfer, which can matter when syncing large tables on remote clusters.
+
+==== Step 1, HashTable
+
+First, run HashTable on the source table cluster (this is the table whose state will be copied to its counterpart).
+
+Usage:
+
+----
+$ ./bin/hbase org.apache.hadoop.hbase.mapreduce.HashTable --help
+Usage: HashTable [options] <tablename> <outputpath>
+
+Options:
+ batchsize     the target amount of bytes to hash in each batch
+               rows are added to the batch until this size is reached
+               (defaults to 8000 bytes)
+ numhashfiles  the number of hash files to create
+               if set to fewer than number of regions then
+               the job will create this number of reducers
+               (defaults to 1/100 of regions -- at least 1)
+ startrow      the start row
+ stoprow       the stop row
+ starttime     beginning of the time range (unixtime in millis)
+               without endtime means from starttime to forever
+ endtime       end of the time range.  Ignored if no starttime specified.
+ scanbatch     scanner batch size to support intra row scans
+ versions      number of cell versions to include
+ families      comma-separated list of families to include
+
+Args:
+ tablename     Name of the table to hash
+ outputpath    Filesystem path to put the output data
+
+Examples:
+ To hash 'TestTable' in 32kB batches for a 1 hour window into 50 files:
+ $ bin/hbase org.apache.hadoop.hbase.mapreduce.HashTable --batchsize=32000 --numhashfiles=50 --starttime=1265875194289 --endtime=1265878794289 --families=cf2,cf3 TestTable /hashes/testTable
+----
+
+The *batchsize* property defines how much cell data for a given region will be hashed together in a single hash value.
+Sizing this properly has a direct impact on the sync efficiency, as it may lead to less scans executed by mapper tasks +of SyncTable (the next step in the process). The rule of thumb is that, the smaller the number of cells out of sync +(lower probability of finding a diff), larger batch size values can be determined. + +==== Step 2, SyncTable + +Once HashTable has completed on source cluster, SyncTable can be ran on target cluster. +Just like replication and other synchronization jobs, it requires that all RegionServers/DataNodes +on source cluster be accessible by NodeManagers on the target cluster (where SyncTable job tasks will be running). + +Usage: + +---- +$ ./bin/hbase org.apache.hadoop.hbase.mapreduce.SyncTable --help +Usage: SyncTable [options] + +Options: + sourcezkcluster ZK cluster key of the source table + (defaults to cluster in classpath's config) + targetzkcluster ZK cluster key of the target table + (defaults to cluster in classpath's config) + dryrun if true, output counters but no writes + (defaults to false) + doDeletes if false, does not perform deletes + (defaults to true) + doPuts if false, does not perform puts + (defaults to true) + +Args: + sourcehashdir path to HashTable output dir for source table + (see org.apache.hadoop.hbase.mapreduce.HashTable) + sourcetable Name of the source table to sync from + targettable Name of the target table to sync to + +Examples: + For a dry run SyncTable of tableA from a remote source cluster + to a local target cluster: + $ bin/hbase org.apache.hadoop.hbase.mapreduce.SyncTable --dryrun=true --sourcezkcluster=zk1.example.com,zk2.example.com,zk3.example.com:2181:/hbase hdfs://nn:9000/hashes/tableA tableA tableA +---- + +The *dryrun* option is useful when a read only, diff report is wanted, as it will produce only COUNTERS indicating the differences, but will not perform +any actual changes. It can be used as an alternative to VerifyReplication tool. + +By default, SyncTable will cause target table to become an exact copy of source table (at least, for the specified startrow/stoprow or/and starttime/endtime). + +Setting doDeletes to false modifies default behaviour to not delete target cells that are missing on source. +Similarly, setting doPuts to false modifies default behaviour to not add missing cells on target. Setting both doDeletes +and doPuts to false would give same effect as setting dryrun to true. + +.Set doDeletes to false on Two-Way Replication scenarios +[NOTE] +==== +On Two-Way Replication or other scenarios where both source and target clusters can have data ingested, it's advisable to always set doDeletes option to false, +as any additional cell inserted on SyncTable target cluster and not yet replicated to source would be deleted, and potentially lost permanently. +==== + +.Set sourcezkcluster to the actual source cluster ZK quorum +[NOTE] +==== +Although not required, if sourcezkcluster is not set, SyncTable will connect to local HBase cluster for both source and target, +which does not give any meaningful result. +==== + +.Remote Clusters on different Kerberos Realms +[NOTE] +==== +Currently, SyncTable can't be ran for remote clusters on different Kerberos realms. 
+There's some work in progress to resolve this on link:https://jira.apache.org/jira/browse/HBASE-20586[HBASE-20586] +==== + [[export]] === Export @@ -819,15 +959,85 @@ See link:https://issues.apache.org/jira/browse/HBASE-4391[HBASE-4391 Add ability [[compaction.tool]] === Offline Compaction Tool -See the usage for the -link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/regionserver/CompactionTool.html[CompactionTool]. -Run it like: +*CompactionTool* provides a way of running compactions (either minor or major) as an independent +process from the RegionServer. It reuses same internal implementation classes executed by RegionServer +compaction feature. However, since this runs on a complete separate independent java process, it +releases RegionServers from the overhead involved in rewrite a set of hfiles, which can be critical +for latency sensitive use cases. -[source, bash] +Usage: ---- $ ./bin/hbase org.apache.hadoop.hbase.regionserver.CompactionTool + +Usage: java org.apache.hadoop.hbase.regionserver.CompactionTool \ + [-compactOnce] [-major] [-mapred] [-D]* files... + +Options: + mapred Use MapReduce to run compaction. + compactOnce Execute just one compaction step. (default: while needed) + major Trigger major compaction. + +Note: -D properties will be applied to the conf used. +For example: + To stop delete of compacted file, pass -Dhbase.compactiontool.delete=false + To set tmp dir, pass -Dhbase.tmp.dir=ALTERNATE_DIR + +Examples: + To compact the full 'TestTable' using MapReduce: + $ hbase org.apache.hadoop.hbase.regionserver.CompactionTool -mapred hdfs://hbase/data/default/TestTable + + To compact column family 'x' of the table 'TestTable' region 'abc': + $ hbase org.apache.hadoop.hbase.regionserver.CompactionTool hdfs://hbase/data/default/TestTable/abc/x ---- +As shown by usage options above, *CompactionTool* can run as a standalone client or a mapreduce job. +When running as mapreduce job, each family dir is handled as an input split, and is processed +by a separate map task. + +The *compactionOnce* parameter controls how many compaction cycles will be performed until +*CompactionTool* program decides to finish its work. If omitted, it will assume it should keep +running compactions on each specified family as determined by the given compaction policy +configured. For more info on compaction policy, see <>. + +If a major compaction is desired, *major* flag can be specified. If omitted, *CompactionTool* will +assume minor compaction is wanted by default. + +It also allows for configuration overrides with `-D` flag. In the usage section above, for example, +`-Dhbase.compactiontool.delete=false` option will instruct compaction engine to not delete original +files from temp folder. + +Files targeted for compaction must be specified as parent hdfs dirs. It allows for multiple dirs +definition, as long as each for these dirs are either a *family*, a *region*, or a *table* dir. If a +table or region dir is passed, the program will recursively iterate through related sub-folders, +effectively running compaction for each family found below the table/region level. + +Since these dirs are nested under *hbase* hdfs directory tree, *CompactionTool* requires hbase super +user permissions in order to have access to required hfiles. + +.Running in MapReduce mode +[NOTE] +==== +MapReduce mode offers the ability to process each family dir in parallel, as a separate map task. 
+Generally, it would make sense to run in this mode when specifying one or more table dirs as targets +for compactions. The caveat, though, is that if number of families to be compacted become too large, +the related mapreduce job may have indirect impacts on *RegionServers* performance . +Since *NodeManagers* are normally co-located with RegionServers, such large jobs could +compete for IO/Bandwidth resources with the *RegionServers*. +==== + +.MajorCompaction completely disabled on RegionServers due performance impacts +[NOTE] +==== +*Major compactions* can be a costly operation (see <>), and can indeed +impact performance on RegionServers, leading operators to completely disable it for critical +low latency application. *CompactionTool* could be used as an alternative in such scenarios, +although, additional custom application logic would need to be implemented, such as deciding +scheduling and selection of tables/regions/families target for a given compaction run. +==== + +For additional details about CompactionTool, see also +link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/regionserver/CompactionTool.html[CompactionTool]. + === `hbase clean` The `hbase clean` command cleans HBase data from ZooKeeper, HDFS, or both. @@ -1272,15 +1482,6 @@ Monitor the output of the _/tmp/log.txt_ file to follow the progress of the scri Use the following guidelines if you want to create your own rolling restart script. . Extract the new release, verify its configuration, and synchronize it to all nodes of your cluster using `rsync`, `scp`, or another secure synchronization mechanism. -. Use the hbck utility to ensure that the cluster is consistent. -+ ----- - -$ ./bin/hbck ----- -+ -Perform repairs if required. -See <> for details. . Restart the master first. You may need to modify these commands if your new HBase directory is different from the old one, such as for an upgrade. @@ -1312,7 +1513,6 @@ $ for i in `cat conf/regionservers|sort`; do ./bin/graceful_stop.sh --restart -- ---- . Restart the Master again, to clear out the dead servers list and re-enable the load balancer. -. Run the `hbck` utility again, to be sure the cluster is consistent. [[adding.new.node]] === Adding a New Node @@ -1612,6 +1812,28 @@ image::bc_l1.png[] This is not an exhaustive list of all the screens and reports available. Have a look in the Web UI. +=== Snapshot Space Usage Monitoring + +Starting with HBase 0.95, Snapshot usage information on individual snapshots was shown in the HBase Master Web UI. This was further enhanced starting with HBase 1.3 to show the total Storefile size of the Snapshot Set. The following metrics are shown in the Master Web UI with HBase 1.3 and later. + +* Shared Storefile Size is the Storefile size shared between snapshots and active tables. +* Mob Storefile Size is the Mob Storefile size shared between snapshots and active tables. +* Archived Storefile Size is the Storefile size in Archive. + +The format of Archived Storefile Size is NNN(MMM). NNN is the total Storefile size in Archive, MMM is the total Storefile size in Archive that is specific to the snapshot (not shared with other snapshots and tables). 
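+
+For example (the figures below are hypothetical, shown only to illustrate the format), an entry such as
+
+----
+Archived Storefile Size: 1.6 G (0.3 G)
+----
+
+would mean that 1.6 G of this snapshot's store files sit in the archive, of which 0.3 G are referenced
+only by this snapshot and so represent space that could eventually be reclaimed once the snapshot is deleted.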
+ +.Master Snapshot Overview +image::master-snapshot.png[] + +.Snapshot Storefile Stats Example 1 +image::1-snapshot.png[] + +.Snapshot Storefile Stats Example 2 +image::2-snapshots.png[] + +.Empty Snapshot Storfile Stats Example +image::empty-snapshots.png[] + == Cluster Replication NOTE: This information was previously available at @@ -1628,6 +1850,9 @@ Some use cases for cluster replication include: NOTE: Replication is enabled at the granularity of the column family. Before enabling replication for a column family, create the table and all column families to be replicated, on the destination cluster. +NOTE: Replication is asynchronous as we send WAL to another cluster in background, which means that when you want to do recovery through replication, you could loss some data. To address this problem, we have introduced a new feature called synchronous replication. As the mechanism is a bit different so we use a separated section to describe it. Please see +<>. + === Replication Overview Cluster replication uses a source-push methodology. @@ -2454,9 +2679,12 @@ Since the cluster is up, there is a risk that edits could be missed in the expor [[ops.snapshots]] == HBase Snapshots -HBase Snapshots allow you to take a snapshot of a table without too much impact on Region Servers. -Snapshot, Clone and restore operations don't involve data copying. -Also, Exporting the snapshot to another cluster doesn't have impact on the Region Servers. +HBase Snapshots allow you to take a copy of a table (both contents and metadata)with a very small performance impact. A Snapshot is an immutable +collection of table metadata and a list of HFiles that comprised the table at the time the Snapshot was taken. A "clone" +of a snapshot creates a new table from that snapshot, and a "restore" of a snapshot returns the contents of a table to +what it was when the snapshot was created. The "clone" and "restore" operations do not require any data to be copied, +as the underlying HFiles (the files which contain the data for an HBase table) are not modified with either action. +Simiarly, exporting a snapshot to another cluster has little impact on RegionServers of the local cluster. Prior to version 0.94.6, the only way to backup or to clone a table is to use CopyTable/ExportTable, or to copy all the hfiles in HDFS after disabling the table. The disadvantages of these methods are that you can degrade region server performance (Copy/Export Table) or you need to disable the table, that means no reads or writes; and this is usually unacceptable. @@ -2719,8 +2947,6 @@ HDFS replication factor only affects your disk usage and is invisible to most HB You can view the current number of regions for a given table using the HMaster UI. In the [label]#Tables# section, the number of online regions for each table is listed in the [label]#Online Regions# column. This total only includes the in-memory state and does not include disabled or offline regions. -If you do not want to use the HMaster UI, you can determine the number of regions by counting the number of subdirectories of the /hbase// subdirectories in HDFS, or by running the `bin/hbase hbck` command. -Each of these methods may return a slightly different number, depending on the status of each region. [[ops.capacity.regions.count]] ==== Number of regions per RS - upper bound @@ -3007,8 +3233,8 @@ If it appears stuck, restart the Master process. 
=== Remove RegionServer Grouping Removing RegionServer Grouping feature from a cluster on which it was enabled involves -more steps in addition to removing the relevant properties from `hbase-site.xml`. This is -to clean the RegionServer grouping related meta data so that if the feature is re-enabled +more steps in addition to removing the relevant properties from `hbase-site.xml`. This is +to clean the RegionServer grouping related meta data so that if the feature is re-enabled in the future, the old meta data will not affect the functioning of the cluster. - Move all tables in non-default rsgroups to `default` regionserver group @@ -3017,7 +3243,7 @@ in the future, the old meta data will not affect the functioning of the cluster. #Reassigning table t1 from non default group - hbase shell hbase(main):005:0> move_tables_rsgroup 'default',['t1'] ---- -- Move all regionservers in non-default rsgroups to `default` regionserver group +- Move all regionservers in non-default rsgroups to `default` regionserver group [source, bash] ---- #Reassigning all the servers in the non-default rsgroup to default - hbase shell @@ -3092,21 +3318,21 @@ To check normalizer status and enable/disable normalizer [source,bash] ---- hbase(main):001:0> normalizer_enabled -true +true 0 row(s) in 0.4870 seconds - + hbase(main):002:0> normalizer_switch false -true +true 0 row(s) in 0.0640 seconds - + hbase(main):003:0> normalizer_enabled -false +false 0 row(s) in 0.0120 seconds - + hbase(main):004:0> normalizer_switch true false 0 row(s) in 0.0200 seconds - + hbase(main):005:0> normalizer_enabled true 0 row(s) in 0.0090 seconds @@ -3125,19 +3351,19 @@ merge action being taken as a result of the normalization plan computed by Simpl Consider an user table with some pre-split regions having 3 equally large regions (about 100K rows) and 1 relatively small region (about 25K rows). Following is the -snippet from an hbase meta table scan showing each of the pre-split regions for +snippet from an hbase meta table scan showing each of the pre-split regions for the user table. ---- -table_p8ddpd6q5z,,1469494305548.68b9892220865cb6048 column=info:regioninfo, timestamp=1469494306375, value={ENCODED => 68b9892220865cb604809c950d1adf48, NAME => 'table_p8ddpd6q5z,,1469494305548.68b989222 09c950d1adf48. 0865cb604809c950d1adf48.', STARTKEY => '', ENDKEY => '1'} -.... -table_p8ddpd6q5z,1,1469494317178.867b77333bdc75a028 column=info:regioninfo, timestamp=1469494317848, value={ENCODED => 867b77333bdc75a028bb4c5e4b235f48, NAME => 'table_p8ddpd6q5z,1,1469494317178.867b7733 bb4c5e4b235f48. 3bdc75a028bb4c5e4b235f48.', STARTKEY => '1', ENDKEY => '3'} -.... -table_p8ddpd6q5z,3,1469494328323.98f019a753425e7977 column=info:regioninfo, timestamp=1469494328486, value={ENCODED => 98f019a753425e7977ab8636e32deeeb, NAME => 'table_p8ddpd6q5z,3,1469494328323.98f019a7 ab8636e32deeeb. 53425e7977ab8636e32deeeb.', STARTKEY => '3', ENDKEY => '7'} -.... -table_p8ddpd6q5z,7,1469494339662.94c64e748979ecbb16 column=info:regioninfo, timestamp=1469494339859, value={ENCODED => 94c64e748979ecbb166f6cc6550e25c6, NAME => 'table_p8ddpd6q5z,7,1469494339662.94c64e74 6f6cc6550e25c6. 8979ecbb166f6cc6550e25c6.', STARTKEY => '7', ENDKEY => '8'} -.... -table_p8ddpd6q5z,8,1469494339662.6d2b3f5fd1595ab8e7 column=info:regioninfo, timestamp=1469494339859, value={ENCODED => 6d2b3f5fd1595ab8e7c031876057b1ee, NAME => 'table_p8ddpd6q5z,8,1469494339662.6d2b3f5f c031876057b1ee. 
d1595ab8e7c031876057b1ee.', STARTKEY => '8', ENDKEY => ''} +table_p8ddpd6q5z,,1469494305548.68b9892220865cb6048 column=info:regioninfo, timestamp=1469494306375, value={ENCODED => 68b9892220865cb604809c950d1adf48, NAME => 'table_p8ddpd6q5z,,1469494305548.68b989222 09c950d1adf48. 0865cb604809c950d1adf48.', STARTKEY => '', ENDKEY => '1'} +.... +table_p8ddpd6q5z,1,1469494317178.867b77333bdc75a028 column=info:regioninfo, timestamp=1469494317848, value={ENCODED => 867b77333bdc75a028bb4c5e4b235f48, NAME => 'table_p8ddpd6q5z,1,1469494317178.867b7733 bb4c5e4b235f48. 3bdc75a028bb4c5e4b235f48.', STARTKEY => '1', ENDKEY => '3'} +.... +table_p8ddpd6q5z,3,1469494328323.98f019a753425e7977 column=info:regioninfo, timestamp=1469494328486, value={ENCODED => 98f019a753425e7977ab8636e32deeeb, NAME => 'table_p8ddpd6q5z,3,1469494328323.98f019a7 ab8636e32deeeb. 53425e7977ab8636e32deeeb.', STARTKEY => '3', ENDKEY => '7'} +.... +table_p8ddpd6q5z,7,1469494339662.94c64e748979ecbb16 column=info:regioninfo, timestamp=1469494339859, value={ENCODED => 94c64e748979ecbb166f6cc6550e25c6, NAME => 'table_p8ddpd6q5z,7,1469494339662.94c64e74 6f6cc6550e25c6. 8979ecbb166f6cc6550e25c6.', STARTKEY => '7', ENDKEY => '8'} +.... +table_p8ddpd6q5z,8,1469494339662.6d2b3f5fd1595ab8e7 column=info:regioninfo, timestamp=1469494339859, value={ENCODED => 6d2b3f5fd1595ab8e7c031876057b1ee, NAME => 'table_p8ddpd6q5z,8,1469494339662.6d2b3f5f c031876057b1ee. d1595ab8e7c031876057b1ee.', STARTKEY => '8', ENDKEY => ''} ---- Invoking the normalizer using ‘normalize’ int the HBase shell, the below log snippet from HMaster log shows the normalization plan computed as per the logic defined for @@ -3163,15 +3389,15 @@ and end key as ‘1’, with another region having start key as ‘1’ and end Now, that these regions have been merged we see a single new region with start key as ‘’ and end key as ‘3’ ---- -table_p8ddpd6q5z,,1469516907210.e06c9b83c4a252b130e column=info:mergeA, timestamp=1469516907431, -value=PBUF\x08\xA5\xD9\x9E\xAF\xE2*\x12\x1B\x0A\x07default\x12\x10table_p8ddpd6q5z\x1A\x00"\x011(\x000\x00 ea74d246741ba. 8\x00 +table_p8ddpd6q5z,,1469516907210.e06c9b83c4a252b130e column=info:mergeA, timestamp=1469516907431, +value=PBUF\x08\xA5\xD9\x9E\xAF\xE2*\x12\x1B\x0A\x07default\x12\x10table_p8ddpd6q5z\x1A\x00"\x011(\x000\x00 ea74d246741ba. 8\x00 table_p8ddpd6q5z,,1469516907210.e06c9b83c4a252b130e column=info:mergeB, timestamp=1469516907431, -value=PBUF\x08\xB5\xBA\x9F\xAF\xE2*\x12\x1B\x0A\x07default\x12\x10table_p8ddpd6q5z\x1A\x011"\x013(\x000\x0 ea74d246741ba. 08\x00 +value=PBUF\x08\xB5\xBA\x9F\xAF\xE2*\x12\x1B\x0A\x07default\x12\x10table_p8ddpd6q5z\x1A\x011"\x013(\x000\x0 ea74d246741ba. 08\x00 table_p8ddpd6q5z,,1469516907210.e06c9b83c4a252b130e column=info:regioninfo, timestamp=1469516907431, value={ENCODED => e06c9b83c4a252b130eea74d246741ba, NAME => 'table_p8ddpd6q5z,,1469516907210.e06c9b83c ea74d246741ba. 4a252b130eea74d246741ba.', STARTKEY => '', ENDKEY => '3'} -.... -table_p8ddpd6q5z,3,1469514778736.bf024670a847c0adff column=info:regioninfo, timestamp=1469514779417, value={ENCODED => bf024670a847c0adffb74b2e13408b32, NAME => 'table_p8ddpd6q5z,3,1469514778736.bf024670 b74b2e13408b32. a847c0adffb74b2e13408b32.' STARTKEY => '3', ENDKEY => '7'} -.... -table_p8ddpd6q5z,7,1469514790152.7c5a67bc755e649db2 column=info:regioninfo, timestamp=1469514790312, value={ENCODED => 7c5a67bc755e649db22f49af6270f1e1, NAME => 'table_p8ddpd6q5z,7,1469514790152.7c5a67bc 2f49af6270f1e1. 755e649db22f49af6270f1e1.', STARTKEY => '7', ENDKEY => '8'} +.... 
+table_p8ddpd6q5z,3,1469514778736.bf024670a847c0adff column=info:regioninfo, timestamp=1469514779417, value={ENCODED => bf024670a847c0adffb74b2e13408b32, NAME => 'table_p8ddpd6q5z,3,1469514778736.bf024670 b74b2e13408b32. a847c0adffb74b2e13408b32.' STARTKEY => '3', ENDKEY => '7'} +.... +table_p8ddpd6q5z,7,1469514790152.7c5a67bc755e649db2 column=info:regioninfo, timestamp=1469514790312, value={ENCODED => 7c5a67bc755e649db22f49af6270f1e1, NAME => 'table_p8ddpd6q5z,7,1469514790152.7c5a67bc 2f49af6270f1e1. 755e649db22f49af6270f1e1.', STARTKEY => '7', ENDKEY => '8'} .... table_p8ddpd6q5z,8,1469514790152.58e7503cda69f98f47 column=info:regioninfo, timestamp=1469514790312, value={ENCODED => 58e7503cda69f98f4755178e74288c3a, NAME => 'table_p8ddpd6q5z,8,1469514790152.58e7503c 55178e74288c3a. da69f98f4755178e74288c3a.', STARTKEY => '8', ENDKEY => ''} ---- diff --git a/src/main/asciidoc/_chapters/preface.adoc b/src/main/asciidoc/_chapters/preface.adoc index 280f2d8..deebdd3 100644 --- a/src/main/asciidoc/_chapters/preface.adoc +++ b/src/main/asciidoc/_chapters/preface.adoc @@ -68,7 +68,7 @@ Yours, the HBase Community. Please use link:https://issues.apache.org/jira/browse/hbase[JIRA] to report non-security-related bugs. -To protect existing HBase installations from new vulnerabilities, please *do not* use JIRA to report security-related bugs. Instead, send your report to the mailing list private@apache.org, which allows anyone to send messages, but restricts who can read them. Someone on that list will contact you to follow up on your report. +To protect existing HBase installations from new vulnerabilities, please *do not* use JIRA to report security-related bugs. Instead, send your report to the mailing list private@hbase.apache.org, which allows anyone to send messages, but restricts who can read them. Someone on that list will contact you to follow up on your report. [[hbase_supported_tested_definitions]] .Support and Testing Expectations diff --git a/src/main/asciidoc/_chapters/schema_design.adoc b/src/main/asciidoc/_chapters/schema_design.adoc index b7a6936..fdbd184 100644 --- a/src/main/asciidoc/_chapters/schema_design.adoc +++ b/src/main/asciidoc/_chapters/schema_design.adoc @@ -1158,7 +1158,7 @@ the regionserver/dfsclient side. * In `hbase-site.xml`, set the following parameters: - `dfs.client.read.shortcircuit = true` -- `dfs.client.read.shortcircuit.skip.checksum = true` so we don't double checksum (HBase does its own checksumming to save on i/os. See <> for more on this. +- `dfs.client.read.shortcircuit.skip.checksum = true` so we don't double checksum (HBase does its own checksumming to save on i/os. See <> for more on this. - `dfs.domain.socket.path` to match what was set for the datanodes. - `dfs.client.read.shortcircuit.buffer.size = 131072` Important to avoid OOME -- hbase has a default it uses if unset, see `hbase.dfs.client.read.shortcircuit.buffer.size`; its default is 131072. * Ensure data locality. In `hbase-site.xml`, set `hbase.hstore.min.locality.to.skip.major.compact = 0.7` (Meaning that 0.7 \<= n \<= 1) diff --git a/src/main/asciidoc/_chapters/security.adoc b/src/main/asciidoc/_chapters/security.adoc index 1afc131..56f6566 100644 --- a/src/main/asciidoc/_chapters/security.adoc +++ b/src/main/asciidoc/_chapters/security.adoc @@ -30,7 +30,7 @@ [IMPORTANT] .Reporting Security Bugs ==== -NOTE: To protect existing HBase installations from exploitation, please *do not* use JIRA to report security-related bugs. 
Instead, send your report to the mailing list private@apache.org, which allows anyone to send messages, but restricts who can read them. Someone on that list will contact you to follow up on your report. +NOTE: To protect existing HBase installations from exploitation, please *do not* use JIRA to report security-related bugs. Instead, send your report to the mailing list private@hbase.apache.org, which allows anyone to send messages, but restricts who can read them. Someone on that list will contact you to follow up on your report. HBase adheres to the Apache Software Foundation's policy on reported vulnerabilities, available at http://apache.org/security/. @@ -1739,7 +1739,7 @@ All options have been discussed separately in the sections above. hbase.superuser - hbase, admin + hbase,admin @@ -1759,8 +1759,7 @@ All options have been discussed separately in the sections above. hbase.coprocessor.regionserver.classes - org.apache.hadoop/hbase.security.access.AccessController, - org.apache.hadoop.hbase.security.access.VisibilityController + org.apache.hadoop.hbase.security.access.AccessController diff --git a/src/main/asciidoc/_chapters/spark.adoc b/src/main/asciidoc/_chapters/spark.adoc new file mode 100644 index 0000000..d5089f2 --- /dev/null +++ b/src/main/asciidoc/_chapters/spark.adoc @@ -0,0 +1,699 @@ +//// +/** + * + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + . . http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +//// + +[[spark]] += HBase and Spark +:doctype: book +:numbered: +:toc: left +:icons: font +:experimental: + +link:https://spark.apache.org/[Apache Spark] is a software framework that is used +to process data in memory in a distributed manner, and is replacing MapReduce in +many use cases. + +Spark itself is out of scope of this document, please refer to the Spark site for +more information on the Spark project and subprojects. This document will focus +on 4 main interaction points between Spark and HBase. Those interaction points are: + +Basic Spark:: + The ability to have an HBase Connection at any point in your Spark DAG. +Spark Streaming:: + The ability to have an HBase Connection at any point in your Spark Streaming + application. +Spark Bulk Load:: + The ability to write directly to HBase HFiles for bulk insertion into HBase +SparkSQL/DataFrames:: + The ability to write SparkSQL that draws on tables that are represented in HBase. + +The following sections will walk through examples of all these interaction points. + +== Basic Spark + +This section discusses Spark HBase integration at the lowest and simplest levels. +All the other interaction points are built upon the concepts that will be described +here. + +At the root of all Spark and HBase integration is the HBaseContext. The HBaseContext +takes in HBase configurations and pushes them to the Spark executors. 
This allows +us to have an HBase Connection per Spark Executor in a static location. + +For reference, Spark Executors can be on the same nodes as the Region Servers or +on different nodes, there is no dependence on co-location. Think of every Spark +Executor as a multi-threaded client application. This allows any Spark Tasks +running on the executors to access the shared Connection object. + +.HBaseContext Usage Example +==== + +This example shows how HBaseContext can be used to do a `foreachPartition` on a RDD +in Scala: + +[source, scala] +---- +val sc = new SparkContext("local", "test") +val config = new HBaseConfiguration() + +... + +val hbaseContext = new HBaseContext(sc, config) + +rdd.hbaseForeachPartition(hbaseContext, (it, conn) => { + val bufferedMutator = conn.getBufferedMutator(TableName.valueOf("t1")) + it.foreach((putRecord) => { +. val put = new Put(putRecord._1) +. putRecord._2.foreach((putValue) => put.addColumn(putValue._1, putValue._2, putValue._3)) +. bufferedMutator.mutate(put) + }) + bufferedMutator.flush() + bufferedMutator.close() +}) +---- + +Here is the same example implemented in Java: + +[source, java] +---- +JavaSparkContext jsc = new JavaSparkContext(sparkConf); + +try { + List list = new ArrayList<>(); + list.add(Bytes.toBytes("1")); + ... + list.add(Bytes.toBytes("5")); + + JavaRDD rdd = jsc.parallelize(list); + Configuration conf = HBaseConfiguration.create(); + + JavaHBaseContext hbaseContext = new JavaHBaseContext(jsc, conf); + + hbaseContext.foreachPartition(rdd, + new VoidFunction, Connection>>() { + public void call(Tuple2, Connection> t) + throws Exception { + Table table = t._2().getTable(TableName.valueOf(tableName)); + BufferedMutator mutator = t._2().getBufferedMutator(TableName.valueOf(tableName)); + while (t._1().hasNext()) { + byte[] b = t._1().next(); + Result r = table.get(new Get(b)); + if (r.getExists()) { + mutator.mutate(new Put(b)); + } + } + + mutator.flush(); + mutator.close(); + table.close(); + } + }); +} finally { + jsc.stop(); +} +---- +==== + +All functionality between Spark and HBase will be supported both in Scala and in +Java, with the exception of SparkSQL which will support any language that is +supported by Spark. For the remaining of this documentation we will focus on +Scala examples. + +The examples above illustrate how to do a foreachPartition with a connection. A +number of other Spark base functions are supported out of the box: + +// tag::spark_base_functions[] +`bulkPut`:: For massively parallel sending of puts to HBase +`bulkDelete`:: For massively parallel sending of deletes to HBase +`bulkGet`:: For massively parallel sending of gets to HBase to create a new RDD +`mapPartition`:: To do a Spark Map function with a Connection object to allow full +access to HBase +`hBaseRDD`:: To simplify a distributed scan to create a RDD +// end::spark_base_functions[] + +For examples of all these functionalities, see the +link:https://github.com/apache/hbase-connectors/tree/master/spark[hbase-spark integration] +in the link:https://github.com/apache/hbase-connectors[hbase-connectors] repository +(the hbase-spark connectors live outside hbase core in a related, +Apache HBase project maintained, associated repo). + +== Spark Streaming +https://spark.apache.org/streaming/[Spark Streaming] is a micro batching stream +processing framework built on top of Spark. HBase and Spark Streaming make great +companions in that HBase can help serve the following benefits alongside Spark +Streaming. 
+ +* A place to grab reference data or profile data on the fly +* A place to store counts or aggregates in a way that supports Spark Streaming's +promise of _only once processing_. + +The link:https://github.com/apache/hbase-connectors/tree/master/spark[hbase-spark integration] +with Spark Streaming is similar to its normal Spark integration points, in that the following +commands are possible straight off a Spark Streaming DStream. + +include::spark.adoc[tags=spark_base_functions] + +.`bulkPut` Example with DStreams +==== + +Below is an example of bulkPut with DStreams. It is very close in feel to the RDD +bulk put. + +[source, scala] +---- +val sc = new SparkContext("local", "test") +val config = new HBaseConfiguration() + +val hbaseContext = new HBaseContext(sc, config) +val ssc = new StreamingContext(sc, Milliseconds(200)) + +val rdd1 = ... +val rdd2 = ... + +val queue = mutable.Queue[RDD[(Array[Byte], Array[(Array[Byte], + Array[Byte], Array[Byte])])]]() + +queue += rdd1 +queue += rdd2 + +val dStream = ssc.queueStream(queue) + +dStream.hbaseBulkPut( + hbaseContext, + TableName.valueOf(tableName), + (putRecord) => { + val put = new Put(putRecord._1) + putRecord._2.foreach((putValue) => put.addColumn(putValue._1, putValue._2, putValue._3)) + put + }) +---- + +There are three inputs to the `hbaseBulkPut` function. +The hbaseContext that carries the configuration broadcast information link +to the HBase Connections in the executor, the table name of the table we are +putting data into, and a function that will convert a record in the DStream +into an HBase Put object. +==== + +== Bulk Load + +There are two options for bulk loading data into HBase with Spark. There is the +basic bulk load functionality that will work for cases where your rows have +millions of columns and cases where your columns are not consolidated and +partitioned before the map side of the Spark bulk load process. + +There is also a thin record bulk load option with Spark. This second option is +designed for tables that have less then 10k columns per row. The advantage +of this second option is higher throughput and less over-all load on the Spark +shuffle operation. + +Both implementations work more or less like the MapReduce bulk load process in +that a partitioner partitions the rowkeys based on region splits and +the row keys are sent to the reducers in order, so that HFiles can be written +out directly from the reduce phase. + +In Spark terms, the bulk load will be implemented around a Spark +`repartitionAndSortWithinPartitions` followed by a Spark `foreachPartition`. + +First lets look at an example of using the basic bulk load functionality + +.Bulk Loading Example +==== + +The following example shows bulk loading with Spark. + +[source, scala] +---- +val sc = new SparkContext("local", "test") +val config = new HBaseConfiguration() + +val hbaseContext = new HBaseContext(sc, config) + +val stagingFolder = ... +val rdd = sc.parallelize(Array( + (Bytes.toBytes("1"), + (Bytes.toBytes(columnFamily1), Bytes.toBytes("a"), Bytes.toBytes("foo1"))), + (Bytes.toBytes("3"), + (Bytes.toBytes(columnFamily1), Bytes.toBytes("b"), Bytes.toBytes("foo2.b"))), ... 
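+
+// Comment added for orientation (not part of the original example): each element of `rdd` above is a
+// (rowKey, (family, qualifier, value)) tuple. The function passed to hbaseBulkLoad below turns each
+// element into (KeyFamilyQualifier, value) pairs, which are partitioned by row key and sorted by
+// row key, family and qualifier before being written out as HFiles under stagingFolder.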
+ +rdd.hbaseBulkLoad(TableName.valueOf(tableName), + t => { + val rowKey = t._1 + val family:Array[Byte] = t._2(0)._1 + val qualifier = t._2(0)._2 + val value = t._2(0)._3 + + val keyFamilyQualifier= new KeyFamilyQualifier(rowKey, family, qualifier) + + Seq((keyFamilyQualifier, value)).iterator + }, + stagingFolder.getPath) + +val load = new LoadIncrementalHFiles(config) +load.doBulkLoad(new Path(stagingFolder.getPath), + conn.getAdmin, table, conn.getRegionLocator(TableName.valueOf(tableName))) +---- +==== + +The `hbaseBulkLoad` function takes three required parameters: + +. The table name of the table we intend to bulk load too + +. A function that will convert a record in the RDD to a tuple key value par. With +the tuple key being a KeyFamilyQualifer object and the value being the cell value. +The KeyFamilyQualifer object will hold the RowKey, Column Family, and Column Qualifier. +The shuffle will partition on the RowKey but will sort by all three values. + +. The temporary path for the HFile to be written out too + +Following the Spark bulk load command, use the HBase's LoadIncrementalHFiles object +to load the newly created HFiles into HBase. + +.Additional Parameters for Bulk Loading with Spark + +You can set the following attributes with additional parameter options on hbaseBulkLoad. + +* Max file size of the HFiles +* A flag to exclude HFiles from compactions +* Column Family settings for compression, bloomType, blockSize, and dataBlockEncoding + +.Using Additional Parameters +==== + +[source, scala] +---- +val sc = new SparkContext("local", "test") +val config = new HBaseConfiguration() + +val hbaseContext = new HBaseContext(sc, config) + +val stagingFolder = ... +val rdd = sc.parallelize(Array( + (Bytes.toBytes("1"), + (Bytes.toBytes(columnFamily1), Bytes.toBytes("a"), Bytes.toBytes("foo1"))), + (Bytes.toBytes("3"), + (Bytes.toBytes(columnFamily1), Bytes.toBytes("b"), Bytes.toBytes("foo2.b"))), ... + +val familyHBaseWriterOptions = new java.util.HashMap[Array[Byte], FamilyHFileWriteOptions] +val f1Options = new FamilyHFileWriteOptions("GZ", "ROW", 128, "PREFIX") + +familyHBaseWriterOptions.put(Bytes.toBytes("columnFamily1"), f1Options) + +rdd.hbaseBulkLoad(TableName.valueOf(tableName), + t => { + val rowKey = t._1 + val family:Array[Byte] = t._2(0)._1 + val qualifier = t._2(0)._2 + val value = t._2(0)._3 + + val keyFamilyQualifier= new KeyFamilyQualifier(rowKey, family, qualifier) + + Seq((keyFamilyQualifier, value)).iterator + }, + stagingFolder.getPath, + familyHBaseWriterOptions, + compactionExclude = false, + HConstants.DEFAULT_MAX_FILE_SIZE) + +val load = new LoadIncrementalHFiles(config) +load.doBulkLoad(new Path(stagingFolder.getPath), + conn.getAdmin, table, conn.getRegionLocator(TableName.valueOf(tableName))) +---- +==== + +Now lets look at how you would call the thin record bulk load implementation + +.Using thin record bulk load +==== + +[source, scala] +---- +val sc = new SparkContext("local", "test") +val config = new HBaseConfiguration() + +val hbaseContext = new HBaseContext(sc, config) + +val stagingFolder = ... +val rdd = sc.parallelize(Array( + ("1", + (Bytes.toBytes(columnFamily1), Bytes.toBytes("a"), Bytes.toBytes("foo1"))), + ("3", + (Bytes.toBytes(columnFamily1), Bytes.toBytes("b"), Bytes.toBytes("foo2.b"))), ... 
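+
+// Comment added for orientation (not part of the original example): unlike the basic variant, the
+// function below gathers every (family, qualifier, value) of a row into a single
+// FamiliesQualifiersValues object keyed by the row, so each shuffled record carries a whole (thin)
+// row rather than one cell.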
+ +rdd.hbaseBulkLoadThinRows(hbaseContext, + TableName.valueOf(tableName), + t => { + val rowKey = t._1 + + val familyQualifiersValues = new FamiliesQualifiersValues + t._2.foreach(f => { + val family:Array[Byte] = f._1 + val qualifier = f._2 + val value:Array[Byte] = f._3 + + familyQualifiersValues +=(family, qualifier, value) + }) + (new ByteArrayWrapper(Bytes.toBytes(rowKey)), familyQualifiersValues) + }, + stagingFolder.getPath, + new java.util.HashMap[Array[Byte], FamilyHFileWriteOptions], + compactionExclude = false, + 20) + +val load = new LoadIncrementalHFiles(config) +load.doBulkLoad(new Path(stagingFolder.getPath), + conn.getAdmin, table, conn.getRegionLocator(TableName.valueOf(tableName))) +---- +==== + +Note that the big difference in using bulk load for thin rows is the function +returns a tuple with the first value being the row key and the second value +being an object of FamiliesQualifiersValues, which will contain all the +values for this row for all column families. + +== SparkSQL/DataFrames + +The link:https://github.com/apache/hbase-connectors/tree/master/spark[hbase-spark integration] +leverages +link:https://databricks.com/blog/2015/01/09/spark-sql-data-sources-api-unified-data-access-for-the-spark-platform.html[DataSource API] +(link:https://issues.apache.org/jira/browse/SPARK-3247[SPARK-3247]) +introduced in Spark-1.2.0, which bridges the gap between simple HBase KV store and complex +relational SQL queries and enables users to perform complex data analytical work +on top of HBase using Spark. HBase Dataframe is a standard Spark Dataframe, and is able to +interact with any other data sources such as Hive, Orc, Parquet, JSON, etc. +The link:https://github.com/apache/hbase-connectors/tree/master/spark[hbase-spark integration] +applies critical techniques such as partition pruning, column pruning, +predicate pushdown and data locality. + +To use the +link:https://github.com/apache/hbase-connectors/tree/master/spark[hbase-spark integration] +connector, users need to define the Catalog for the schema mapping +between HBase and Spark tables, prepare the data and populate the HBase table, +then load the HBase DataFrame. After that, users can do integrated query and access records +in HBase tables with SQL query. The following illustrates the basic procedure. + +=== Define catalog + +[source, scala] +---- +def catalog = s"""{ +       |"table":{"namespace":"default", "name":"table1"}, +       |"rowkey":"key", +       |"columns":{ +         |"col0":{"cf":"rowkey", "col":"key", "type":"string"}, +         |"col1":{"cf":"cf1", "col":"col1", "type":"boolean"}, +         |"col2":{"cf":"cf2", "col":"col2", "type":"double"}, +         |"col3":{"cf":"cf3", "col":"col3", "type":"float"}, +         |"col4":{"cf":"cf4", "col":"col4", "type":"int"}, +         |"col5":{"cf":"cf5", "col":"col5", "type":"bigint"}, +         |"col6":{"cf":"cf6", "col":"col6", "type":"smallint"}, +         |"col7":{"cf":"cf7", "col":"col7", "type":"string"}, +         |"col8":{"cf":"cf8", "col":"col8", "type":"tinyint"} +       |} +     |}""".stripMargin +---- + +Catalog defines a mapping between HBase and Spark tables. There are two critical parts of this catalog. +One is the rowkey definition and the other is the mapping between table column in Spark and +the column family and column qualifier in HBase. The above defines a schema for a HBase table +with name as table1, row key as key and a number of columns (col1 `-` col8). 
Note that the rowkey +also has to be defined in details as a column (col0), which has a specific cf (rowkey). + +=== Save the DataFrame + +[source, scala] +---- +case class HBaseRecord( + col0: String, + col1: Boolean, + col2: Double, + col3: Float, + col4: Int,        + col5: Long, + col6: Short, + col7: String, + col8: Byte) + +object HBaseRecord +{                                                                                                              + def apply(i: Int, t: String): HBaseRecord = { + val s = s"""row${"%03d".format(i)}"""        + HBaseRecord(s, + i % 2 == 0, + i.toDouble, + i.toFloat,   + i, + i.toLong, + i.toShort,   + s"String$i: $t",       + i.toByte) + } +} + +val data = (0 to 255).map { i =>  HBaseRecord(i, "extra")} + +sc.parallelize(data).toDF.write.options( + Map(HBaseTableCatalog.tableCatalog -> catalog, HBaseTableCatalog.newTable -> "5")) + .format("org.apache.hadoop.hbase.spark ") + .save() + +---- +`data` prepared by the user is a local Scala collection which has 256 HBaseRecord objects. +`sc.parallelize(data)` function distributes `data` to form an RDD. `toDF` returns a DataFrame. +`write` function returns a DataFrameWriter used to write the DataFrame to external storage +systems (e.g. HBase here). Given a DataFrame with specified schema `catalog`, `save` function +will create an HBase table with 5 regions and save the DataFrame inside. + +=== Load the DataFrame + +[source, scala] +---- +def withCatalog(cat: String): DataFrame = { + sqlContext + .read + .options(Map(HBaseTableCatalog.tableCatalog->cat)) + .format("org.apache.hadoop.hbase.spark") + .load() +} +val df = withCatalog(catalog) +---- +In ‘withCatalog’ function, sqlContext is a variable of SQLContext, which is the entry point +for working with structured data (rows and columns) in Spark. +`read` returns a DataFrameReader that can be used to read data in as a DataFrame. +`option` function adds input options for the underlying data source to the DataFrameReader, +and `format` function specifies the input data source format for the DataFrameReader. +The `load()` function loads input in as a DataFrame. The date frame `df` returned +by `withCatalog` function could be used to access HBase table, such as 4.4 and 4.5. + +=== Language Integrated Query + +[source, scala] +---- +val s = df.filter(($"col0" <= "row050" && $"col0" > "row040") || + $"col0" === "row005" || + $"col0" <= "row005") + .select("col0", "col1", "col4") +s.show +---- +DataFrame can do various operations, such as join, sort, select, filter, orderBy and so on. +`df.filter` above filters rows using the given SQL expression. `select` selects a set of columns: +`col0`, `col1` and `col4`. + +=== SQL Query + +[source, scala] +---- +df.registerTempTable("table1") +sqlContext.sql("select count(col1) from table1").show +---- + +`registerTempTable` registers `df` DataFrame as a temporary table using the table name `table1`. +The lifetime of this temporary table is tied to the SQLContext that was used to create `df`. +`sqlContext.sql` function allows the user to execute SQL queries. + +=== Others + +.Query with different timestamps +==== +In HBaseSparkConf, four parameters related to timestamp can be set. They are TIMESTAMP, +MIN_TIMESTAMP, MAX_TIMESTAMP and MAX_VERSIONS respectively. Users can query records with +different timestamps or time ranges with MIN_TIMESTAMP and MAX_TIMESTAMP. In the meantime, +use concrete value instead of tsSpecified and oldMs in the examples below. 
+ +The example below shows how to load df DataFrame with different timestamps. +tsSpecified is specified by the user. +HBaseTableCatalog defines the HBase and Relation relation schema. +writeCatalog defines catalog for the schema mapping. + +[source, scala] +---- +val df = sqlContext.read + .options(Map(HBaseTableCatalog.tableCatalog -> writeCatalog, HBaseSparkConf.TIMESTAMP -> tsSpecified.toString)) + .format("org.apache.hadoop.hbase.spark") + .load() +---- + +The example below shows how to load df DataFrame with different time ranges. +oldMs is specified by the user. + +[source, scala] +---- +val df = sqlContext.read + .options(Map(HBaseTableCatalog.tableCatalog -> writeCatalog, HBaseSparkConf.MIN_TIMESTAMP -> "0", + HBaseSparkConf.MAX_TIMESTAMP -> oldMs.toString)) + .format("org.apache.hadoop.hbase.spark") + .load() +---- +After loading df DataFrame, users can query data. + +[source, scala] +---- +df.registerTempTable("table") +sqlContext.sql("select count(col1) from table").show +---- +==== + +.Native Avro support +==== +The link:https://github.com/apache/hbase-connectors/tree/master/spark[hbase-spark integration] +connector supports different data formats like Avro, JSON, etc. The use case below +shows how spark supports Avro. Users can persist the Avro record into HBase directly. Internally, +the Avro schema is converted to a native Spark Catalyst data type automatically. +Note that both key-value parts in an HBase table can be defined in Avro format. + +1) Define catalog for the schema mapping: + +[source, scala] +---- +def catalog = s"""{ + |"table":{"namespace":"default", "name":"Avrotable"}, + |"rowkey":"key", + |"columns":{ + |"col0":{"cf":"rowkey", "col":"key", "type":"string"}, + |"col1":{"cf":"cf1", "col":"col1", "type":"binary"} + |} + |}""".stripMargin +---- + +`catalog` is a schema for a HBase table named `Avrotable`. row key as key and +one column col1. The rowkey also has to be defined in details as a column (col0), +which has a specific cf (rowkey). + +2) Prepare the Data: + +[source, scala] +---- + object AvroHBaseRecord { + val schemaString = + s"""{"namespace": "example.avro", + | "type": "record", "name": "User", + | "fields": [ + | {"name": "name", "type": "string"}, + | {"name": "favorite_number", "type": ["int", "null"]}, + | {"name": "favorite_color", "type": ["string", "null"]}, + | {"name": "favorite_array", "type": {"type": "array", "items": "string"}}, + | {"name": "favorite_map", "type": {"type": "map", "values": "int"}} + | ] }""".stripMargin + + val avroSchema: Schema = { + val p = new Schema.Parser + p.parse(schemaString) + } + + def apply(i: Int): AvroHBaseRecord = { + val user = new GenericData.Record(avroSchema); + user.put("name", s"name${"%03d".format(i)}") + user.put("favorite_number", i) + user.put("favorite_color", s"color${"%03d".format(i)}") + val favoriteArray = new GenericData.Array[String](2, avroSchema.getField("favorite_array").schema()) + favoriteArray.add(s"number${i}") + favoriteArray.add(s"number${i+1}") + user.put("favorite_array", favoriteArray) + import collection.JavaConverters._ + val favoriteMap = Map[String, Int](("key1" -> i), ("key2" -> (i+1))).asJava + user.put("favorite_map", favoriteMap) + val avroByte = AvroSedes.serialize(user, avroSchema) + AvroHBaseRecord(s"name${"%03d".format(i)}", avroByte) + } + } + + val data = (0 to 255).map { i => + AvroHBaseRecord(i) + } +---- + +`schemaString` is defined first, then it is parsed to get `avroSchema`. `avroSchema` is used to +generate `AvroHBaseRecord`. 
`data` prepared by users is a local Scala collection +which has 256 `AvroHBaseRecord` objects. + +3) Save DataFrame: + +[source, scala] +---- + sc.parallelize(data).toDF.write.options( + Map(HBaseTableCatalog.tableCatalog -> catalog, HBaseTableCatalog.newTable -> "5")) + .format("org.apache.spark.sql.execution.datasources.hbase") + .save() +---- + +Given a data frame with specified schema `catalog`, above will create an HBase table with 5 +regions and save the data frame inside. + +4) Load the DataFrame + +[source, scala] +---- +def avroCatalog = s"""{ + |"table":{"namespace":"default", "name":"avrotable"}, + |"rowkey":"key", + |"columns":{ + |"col0":{"cf":"rowkey", "col":"key", "type":"string"}, + |"col1":{"cf":"cf1", "col":"col1", "avro":"avroSchema"} + |} + |}""".stripMargin + + def withCatalog(cat: String): DataFrame = { + sqlContext + .read + .options(Map("avroSchema" -> AvroHBaseRecord.schemaString, HBaseTableCatalog.tableCatalog -> avroCatalog)) + .format("org.apache.spark.sql.execution.datasources.hbase") + .load() + } + val df = withCatalog(catalog) +---- + +In `withCatalog` function, `read` returns a DataFrameReader that can be used to read data in as a DataFrame. +The `option` function adds input options for the underlying data source to the DataFrameReader. +There are two options: one is to set `avroSchema` as `AvroHBaseRecord.schemaString`, and one is to +set `HBaseTableCatalog.tableCatalog` as `avroCatalog`. The `load()` function loads input in as a DataFrame. +The date frame `df` returned by `withCatalog` function could be used to access the HBase table. + +5) SQL Query + +[source, scala] +---- + df.registerTempTable("avrotable") + val c = sqlContext.sql("select count(1) from avrotable"). +---- + +After loading df DataFrame, users can query data. registerTempTable registers df DataFrame +as a temporary table using the table name avrotable. `sqlContext.sql` function allows the +user to execute SQL queries. +==== diff --git a/src/main/asciidoc/_chapters/sync_replication.adoc b/src/main/asciidoc/_chapters/sync_replication.adoc new file mode 100644 index 0000000..d28b9a9 --- /dev/null +++ b/src/main/asciidoc/_chapters/sync_replication.adoc @@ -0,0 +1,125 @@ +//// +/** + * + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +//// + +[[syncreplication]] += Synchronous Replication +:doctype: book +:numbered: +:toc: left +:icons: font +:experimental: +:source-language: java + +== Background + +The current <> in HBase in asynchronous. So if the master cluster crashes, the slave cluster may not have the +newest data. If users want strong consistency then they can not switch to the slave cluster. 
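+
+As an illustration of that lag, the state of asynchronous replication can be checked from the HBase
+shell before deciding whether the slave is safe to use (the exact metrics printed vary by version;
+`AgeOfLastShippedOp` in the source section gives a rough idea of how far the peer is behind):
+
+[source,ruby]
+----
+hbase> status 'replication'
+----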
+
+== Design
+
+Please see the design doc on link:https://issues.apache.org/jira/browse/HBASE-19064[HBASE-19064]
+
+== Operation and maintenance
+
+Case.1 Set up two synchronous replication clusters::
+
+* Add a synchronous peer in both the source cluster and the peer cluster.
+
+For the source cluster:
+[source,ruby]
+----
+hbase> add_peer '1', CLUSTER_KEY => 'lg-hadoop-tst-st01.bj:10010,lg-hadoop-tst-st02.bj:10010,lg-hadoop-tst-st03.bj:10010:/hbase/test-hbase-slave', REMOTE_WAL_DIR=>'hdfs://lg-hadoop-tst-st01.bj:20100/hbase/test-hbase-slave/remoteWALs', TABLE_CFS => {"ycsb-test"=>[]}
+----
+
+For the peer cluster:
+[source,ruby]
+----
+hbase> add_peer '1', CLUSTER_KEY => 'lg-hadoop-tst-st01.bj:10010,lg-hadoop-tst-st02.bj:10010,lg-hadoop-tst-st03.bj:10010:/hbase/test-hbase', REMOTE_WAL_DIR=>'hdfs://lg-hadoop-tst-st01.bj:20100/hbase/test-hbase/remoteWALs', TABLE_CFS => {"ycsb-test"=>[]}
+----
+
+NOTE: For synchronous replication, the current implementation requires that the source and peer cluster use the same
+peer id. Another point that needs attention: the peer does not support cluster-level, namespace-level, or cf-level
+replication; only table-level replication is supported at this time.
+
+* Transit the peer cluster to STANDBY state
+
+[source,ruby]
+----
+hbase> transit_peer_sync_replication_state '1', 'STANDBY'
+----
+
+* Transit the source cluster to ACTIVE state
+
+[source,ruby]
+----
+hbase> transit_peer_sync_replication_state '1', 'ACTIVE'
+----
+
+Now, synchronous replication has been set up successfully. HBase clients can only issue requests to the source
+cluster; if they request the peer cluster, which is now in STANDBY state, the read/write requests will be rejected.
+
+Case.2 How to operate when the standby cluster crashes::
+
+If the standby cluster has crashed, the active cluster will fail to write its remote WAL. So we need to transit
+the source cluster to DOWNGRADE_ACTIVE state, which means the source cluster won't write any remote WAL any more, but
+normal (asynchronous) replication still works: it queues the newly written WALs, and replication blocks until the peer
+cluster comes back.
+
+[source,ruby]
+----
+hbase> transit_peer_sync_replication_state '1', 'DOWNGRADE_ACTIVE'
+----
+
+Once the peer cluster comes back, we can transit the source cluster to ACTIVE again, to ensure that replication is
+synchronous.
+
+[source,ruby]
+----
+hbase> transit_peer_sync_replication_state '1', 'ACTIVE'
+----
+
+Case.3 How to operate when the active cluster crashes::
+
+If the active cluster has crashed (it may be unreachable now), transit the standby cluster to DOWNGRADE_ACTIVE state,
+and after that redirect all client requests to the DOWNGRADE_ACTIVE cluster.
+
+[source,ruby]
+----
+hbase> transit_peer_sync_replication_state '1', 'DOWNGRADE_ACTIVE'
+----
+
+If the crashed cluster comes back, just transit it to STANDBY directly. Otherwise, if you transit it to
+DOWNGRADE_ACTIVE, the original ACTIVE cluster may have redundant data compared to the current ACTIVE cluster: because
+source cluster WALs and remote cluster WALs are written concurrently, the source cluster WALs can contain more data
+than the remote cluster's, which results in data inconsistency. Transiting from ACTIVE to STANDBY does not have this
+problem, because the original WALs are skipped during replay.
+ +[source,ruby] +---- +hbase> transit_peer_sync_replication_state '1', 'STANDBY' +---- + +After that, we can promote the DOWNGRADE_ACTIVE cluster to ACTIVE now, to ensure that the replication will be synchronous. + +[source,ruby] +---- +hbase> transit_peer_sync_replication_state '1', 'ACTIVE' +---- diff --git a/src/main/asciidoc/_chapters/troubleshooting.adoc b/src/main/asciidoc/_chapters/troubleshooting.adoc index 6795b8a..9fc7c35 100644 --- a/src/main/asciidoc/_chapters/troubleshooting.adoc +++ b/src/main/asciidoc/_chapters/troubleshooting.adoc @@ -868,9 +868,9 @@ Snapshots:: When you create a snapshot, HBase retains everything it needs to recreate the table's state at that time of the snapshot. This includes deleted cells or expired versions. For this reason, your snapshot usage pattern should be well-planned, and you should - prune snapshots that you no longer need. Snapshots are stored in `/hbase/.snapshots`, + prune snapshots that you no longer need. Snapshots are stored in `/hbase/.hbase-snapshot`, and archives needed to restore snapshots are stored in - `/hbase/.archive/<_tablename_>/<_region_>/<_column_family_>/`. + `/hbase/archive/<_tablename_>/<_region_>/<_column_family_>/`. *Do not* manage snapshots or archives manually via HDFS. HBase provides APIs and HBase Shell commands for managing them. For more information, see <>. diff --git a/src/main/asciidoc/_chapters/upgrading.adoc b/src/main/asciidoc/_chapters/upgrading.adoc index 6dc788a..2a33e42 100644 --- a/src/main/asciidoc/_chapters/upgrading.adoc +++ b/src/main/asciidoc/_chapters/upgrading.adoc @@ -314,6 +314,20 @@ Quitting... == Upgrade Paths +[[upgrade 2.2]] +=== Upgrade from 2.0 or 2.1 to 2.2+ + +HBase 2.2+ uses a new Procedure form assiging/unassigning/moving Regions. It does not process HBase 2.1 and 2.0's Unassign/Assign Procedure types. Upgrade requires that we first drain the Master Procedure Store of old style Procedures before starting the new 2.2 Master. So you need to make sure that before you kill the old version (2.0 or 2.1) Master, there is no region in transition. And once the new version (2.2+) Master is up, you can rolling upgrade RegionServers one by one. + +And there is a more safer way if you are running 2.1.1+ or 2.0.3+ cluster. It need four steps to upgrade Master. + +. Shutdown both active and standby Masters (Your cluster will continue to server reads and writes without interruption). +. Set the property hbase.procedure.upgrade-to-2-2 to true in hbase-site.xml for the Master, and start only one Master, still using the 2.1.1+ (or 2.0.3+) version. +. Wait until the Master quits. Confirm that there is a 'READY TO ROLLING UPGRADE' message in the Master log as the cause of the shutdown. The Procedure Store is now empty. +. Start new Masters with the new 2.2+ version. + +Then you can rolling upgrade RegionServers one by one. See link:https://issues.apache.org/jira/browse/HBASE-21075[HBASE-21075] for more details. + [[upgrade2.0]] === Upgrading from 1.x to 2.x @@ -331,7 +345,10 @@ As noted in the section <>, HBase 2.0+ requires a minimum o .HBCK must match HBase server version You *must not* use an HBase 1.x version of HBCK against an HBase 2.0+ cluster. HBCK is strongly tied to the HBase server version. Using the HBCK tool from an earlier release against an HBase 2.0+ cluster will destructively alter said cluster in unrecoverable ways. -As of HBase 2.0, HBCK is a read-only tool that can report the status of some non-public system internals. 
You should not rely on the format nor content of these internals to remain consistent across HBase releases. +As of HBase 2.0, HBCK (A.K.A _HBCK1_ or _hbck1_) is a read-only tool that can report the status of some non-public system internals. You should not rely on the format nor content of these internals to remain consistent across HBase releases. + +To read about HBCK's replacement, see <> in <>. + //// Link to a ref guide section on HBCK in 2.0 that explains use and calls out the inability of clients and server sides to detect version of each other. @@ -611,6 +628,19 @@ Performance is also an area that is now under active review so look forward to improvement in coming releases (See link:https://issues.apache.org/jira/browse/HBASE-20188[HBASE-20188 TESTING Performance]). +[[upgrade2.0.it.kerberos]] +.Integration Tests and Kerberos +Integration Tests (`IntegrationTests*`) used to rely on the Kerberos credential cache +for authentication against secured clusters. This used to lead to tests failing due +to authentication failures when the tickets in the credential cache expired. +As of hbase-2.0.0 (and hbase-1.3.0+), the integration test clients will make use +of the configuration properties `hbase.client.keytab.file` and +`hbase.client.kerberos.principal`. They are required. The clients will perform a +login from the configured keytab file and automatically refresh the credentials +in the background for the process lifetime (See +link:https://issues.apache.org/jira/browse/HBASE-16231[HBASE-16231]). + + //// This would be a good place to link to an appendix on migrating applications //// @@ -731,6 +761,11 @@ Notes: Doing a raw scan will now return results that have expired according to TTL settings. +[[upgrade1.3]] +=== Upgrading from pre-1.3 to 1.3+ +If running Integration Tests under Kerberos, see <>. + + [[upgrade1.0]] === Upgrading to 1.x diff --git a/src/main/asciidoc/book.adoc b/src/main/asciidoc/book.adoc index 2b01749..6e9e19d 100644 --- a/src/main/asciidoc/book.adoc +++ b/src/main/asciidoc/book.adoc @@ -65,9 +65,11 @@ include::_chapters/security.adoc[] include::_chapters/architecture.adoc[] include::_chapters/hbase_mob.adoc[] include::_chapters/inmemory_compaction.adoc[] +include::_chapters/sync_replication.adoc[] include::_chapters/hbase_apis.adoc[] include::_chapters/external_apis.adoc[] include::_chapters/thrift_filter_language.adoc[] +include::_chapters/spark.adoc[] include::_chapters/cp.adoc[] include::_chapters/performance.adoc[] include::_chapters/troubleshooting.adoc[] @@ -85,7 +87,6 @@ include::_chapters/community.adoc[] include::_chapters/appendix_contributing_to_documentation.adoc[] include::_chapters/faq.adoc[] -include::_chapters/hbck_in_depth.adoc[] include::_chapters/appendix_acl_matrix.adoc[] include::_chapters/compression.adoc[] include::_chapters/sql.adoc[] -- 2.7.4