diff --git hbase-native-client/BUILDING.md hbase-native-client/BUILDING.md index 4c06776..ea5f7f8 100644 --- hbase-native-client/BUILDING.md +++ hbase-native-client/BUILDING.md @@ -19,11 +19,84 @@ under the License. # Building HBase native client -The HBase native client build using buck and produces a linux library. +The HBase native client is built using make and produces a Linux library named +`libHBaseClient`. For development, the build tool [buck](https://buckbuild.com/) can be used to build the project, optionally using the docker environment. +## Dependencies +The code uses C++14 features extensively, so it depends on a GCC version with -std=c++14 support. -# Dependencies +HBase Client depends on these third party libraries: +### Google Protobuf +Google Protobuf, version `2.7.0`. + +### Apache ZooKeeper +Apache ZooKeeper C client version `3.4.8` or higher. + +### Facebook Folly +Folly is a generic library from Facebook that is complementary to boost and +std. We use folly for async socket IO, threading and support for Futures. +HBase client depends on a recent version of folly, at least `v2017.09.04.00`. + +### Facebook Wangle +Wangle is a C++ implementation of the popular Netty framework in Java. HBase +client depends on wangle for its async RPC framework. Due to licensing issues +a version greater than `v2017.09.04.00` is required. + +### Boost +We depend on boost::predef, thus requiring at least version `1.56`. + +### Google glog, gtest, gflags +We use these popular libraries for logging, unit testing/mocking and parsing +command line flags respectively. 
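As a quick sanity check of the C++14 requirement noted under Dependencies, a small translation unit like the one below can be compiled with the intended toolchain (for example with `g++ -std=c++14 -c`). This is only a minimal sketch and is not part of the HBase build; the helper name `SumOfSquares` is hypothetical and exists purely to exercise two C++14 features.

```cpp
#include <memory>
#include <vector>

// Fails at compile time if the compiler is not in C++14 (or later) mode.
static_assert(__cplusplus >= 201402L, "C++14 or later is required");

// Hypothetical helper that exercises two C++14 features:
// std::make_unique and a generic (auto-parameter) lambda.
inline int SumOfSquares(const std::vector<int>& values) {
  auto total = std::make_unique<int>(0);
  auto add_square = [&total](auto v) { *total += v * v; };
  for (int v : values) add_square(v);
  return *total;
}
```

If this compiles cleanly, the compiler and standard library accept at least the C++14 features used here; it says nothing about the versions of the third party libraries listed above.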
+ + +## Installing base dependencies - CentOS 7 +```bash +sudo yum install -y centos-release-scl && sudo yum-config-manager --enable rhel-server-rhscl-7-rpms +sudo yum install -y vim maven python-pip doxygen graphviz clang-devel shtool cmake libtool \ + libevent-devel openssl-devel cyrus-sasl-devel krb5-devel \ + devtoolset-6 devtoolset-6-libatomic-devel devtoolset-6-valgrind \ + glog-devel gtest-devel gflags-devel double-conversion-devel gmock-devel +pip install yapf +scl enable devtoolset-6 bash +``` + +Native client does not depend on Java, however, unit tests do need to use JNI to launch a mini +hbase cluster in the same process. Install these to be able to run the unit tests: +```bash +sudo yum install -y java-1.8.0-openjdk-devel +``` + +## Installing base dependencies - Ubuntu 16 +```bash +sudo apt-get install -y \ + autoconf automake libtool curl make g++ unzip \ + vim maven inetutils-ping python-pip doxygen graphviz clang-format \ + valgrind dh-autoreconf pkg-config libboost-all-dev libevent1-dev cmake\ + openssl openssl-dev libssl-dev libkrb5-dev libsasl2-dev\ + libdouble-conversion-dev libatomic1-dbg +pip install yapf +``` + +Native client does not depend on Java, however, unit tests do need to use JNI to launch a mini +hbase cluster in the same process. Install these to be able to run the unit tests: +```bash +sudo apt-get install -y openjdk-8-dbg openjdk-8-jdk openjdk-8-source +``` + +## Installing dependencies - All Operating Systems +After installing the base dependencies from the OS, you may need to manually compile these +libraries (protobuf, wangle, folly, boost, etc) yourself. There is a helper script which you can run manually, or can copy-paste the contents. The script needs sudo permissions, and installs the libs under `/usr/local/lib/`. + + +```bash +cd hbase-native-client +./dev-support/install-hbase-deps-common.sh +``` + + +## Building using docker The easiest way to build hbase-native-client is to use [Docker](https://www.docker.com/). 
This will mean that any platform with docker can be used to build the hbase-native-client. It will also @@ -37,41 +110,40 @@ can help speed things up by using nfs. If possible pairing virtual box with dinghy will result in the fastest, most stable docker environment. However none of it is needed. -# Building using docker +Then go into the hbase-native-client directory and run `./bin/start-docker.sh` +that will build the docker development environment and when complete will +drop you into a shell on a linux vm with all the tools needed installed. To start with make sure that you have built the java project using `mvn package -DskipTests`. That will allow all tests to spin up a standalone hbase instance from the jar's created. -Then go into the hbase-native-client directory and run `./bin/start-docker.sh` -that will build the docker development environment and when complete will -drop you into a shell on a linux vm with all the tools needed installed. +### Buck -# Buck - -From then on we can use [buck](https://buckbuild.com/) to build everything. +Inside the docker environment we can use [buck](https://buckbuild.com/) to build everything. For example: ``` -buck build //core:core buck test --all -buck build //core:simple-client +buck build //src/hbase/client:simple-client +buck build //src/hbase/client:client ``` That will build the library, then build and test everything, then build the simple-client binary. Buck will find all modules used, and compile them in parallel, caching the results. Output from buck is in the buck-out -foulder. Generated binaries are in buck-out/gen logs are in buck-out/logs - +folder. Generated binaries are in `buck-out/gen`; logs are in `buck-out/logs`. -# Make -If learning buck isn't your thing there is a Makefile wrapper for your -convenience. +## Make +Make is the official build for the client. Standard make commands can be used +to build the project after all of the dependencies are installed. 
``` make help make check make clean make all -make build +make install ``` + +Make outputs the artifacts under the `build/` directory. Both debug and release builds will be available. The `make install` command installs the library under `/usr/local/lib` and header files under `/usr/local/include`. \ No newline at end of file diff --git hbase-native-client/README.md hbase-native-client/README.md index 76df596..e0bc2e4 100644 --- hbase-native-client/README.md +++ hbase-native-client/README.md @@ -19,42 +19,97 @@ under the License. # hbase-native-client -Native client for HBase +This is a C++ library that implements an HBase client. The main artifact for this code base from the build is `libHBaseClient.{so|a}` which can be linked against a client application to execute HBase cluster reads or writes. The library `libHBaseClient.{so|a}` does NOT depend on Java or JNI and does not need -ljvm. Only the unit tests, which are not part of the artifact, depend on JNI (see below). -This is a C/C++ library that implements a -HBase client. +## Background +HBase C++ Client has been developed over a long period with contributions from many people, in the issue [HBASE-14850](https://issues.apache.org/jira/browse/HBASE-14850). For more information about background, please refer to the jira issue. +## Architecture +The client implements the binary protocol for the RPC which uses Protobuf encoded messages as a serde mechanism as well as custom-encoded data for Cells using KeyValueCodec. -## Design Philosphy +The overall architecture for the client mimics the new async client in HBase-2.0 with the `AsyncConnection` / `AsyncTable` interface. The Java async HBase client implements an async RPC layer using Netty, and on top of that implements retryable RPCs and higher-level structures using `AsyncRpcRetryingCaller` and `RawAsyncTable` classes. The Java side uses Java-8 `CompletableFuture`s to chain business logic, exception handling and returning to the client application. 
The best approach to learn about the overall architecture for the C++ Client might be to read the code for the Java async client, if one is more familiar with the Java code base. -Synchronous and Async versions will both be built -on the same foundation. The core foundation will -be C++. External users wanting a C library will -have to choose either async or sync. These -libraries will be thin veneers ontop of the C++. -We should try and follow pthreads example as much -as possible: +The following table summarizes, at code-level detail, how the implementations for the Java async client and C++ Client compare. -* Consistent naming. -* Opaque pointers as types so that binary compat is easy. -* Simple setup when the defaults are good. -* Attr structs when lots of paramters could be needed. +| Layer |Java Async | C++ | +|------------- | --------- | ----| +| low level async socket | netty | wangle | +| thread pools, futures, buffers, etc | netty thread pools, futures and bufs and Java 8 futures | folly Futures, IOBuf, wangle thread pools | +| tcp connection management/pooling | AsyncRpcClient | connection-pool.cc, rpc-client.cc | +| Rpc request / response | (netty-based) AsyncRpcChannel, AsyncServerResponseHandler | (wangle-based) pipeline.cc, client-handler.cc | +| Rpc interface | PB-generated service stubs, HBaseRpcController | PB-generated stubs, rpc-controller.cc (and wangle-based request.cc, service.cc) | +| Request,response conversion (Get -> GetRequest) | RequestConverter | request-converter.cc, response-converter.cc | +| Rpc retry, timeout, exception handling | RawAsyncTableImpl, AsyncRpcRetryingCaller, XXRequestCaller | async-rpc-retrying-caller.cc, async-rpc-retrying-caller-factory | +| meta lookup | ZKAsyncRegistry, curator | location-cache.cc, zk C client| +| meta cache | MetaCache | location-cache.cc | +| Async Client interface (exposed) | AsyncConnection, AsyncTable | | +| Sync client implementation over async interfaces | | table.cc | +| Sync Client 
Interface (exposed) | ConnectionFactory, Connection, Table, Configuration, etc | client.h, table.h, configuration.h | +| Operations API | Get, Put, Scan, Result, Cell | Get, Put, Scan, Cell| -## Naming -All public C files will start with hbase_*.{h, cc}. This -is to keep naming conflicts to a minimum. Anything without -the hbase_ prefix is assumed to be implementation private. -All C apis and typedefs will be prefixed with hb_. -All typedefs end with _t. +In the C++ client, we use the [Facebook folly](https://github.com/facebook/folly/) library for representing `Promise`s and `Future`s, and for some other common tasks. We use [Facebook wangle](https://github.com/facebook/wangle/) as a Netty implementation in C++. The RPC framework is implemented in the `connection` module, uses async sockets, and is fully asynchronous, returning `Future`s for outgoing RPCs. There are three thread pools (IO, CPU and retry) that are used either for IO or general execution. By default the IO and CPU thread pools will have num_cpus and 2*num_cpus threads, respectively. Socket and connection management and pooling is implemented in the `connection-pool.cc` and `rpc-client.cc` files. There is a wangle pipeline of request handlers which tracks the outgoing RPCs and responses as well as serializes the RPCs to the wire. On top of the low level RPC mechanism, there is a retrying layer that can retry the RPCs based on the raised exception responses (for example, a region may have moved). This retrying layer is also fully async, and is implemented in the `async-rpc-retrying-caller*` files. `exceptions.h` recognizes Java-level exceptions and knows which exceptions we do not need to retry (the logic is fragile); all other exceptions are retried. On top of the retrying layer lies the `raw-async-table` interface, which knows about the higher-level request objects (Get, Put, etc) and does the conversion between them and the lower layer RPC (which does not know about Get/Put, only PB objects). 
The highest layer sitting on top of `raw-async-table` and `async-connection` is the client-API layer, which is the supported C++ Client API. This layer consists of `client.h`, `configuration.h`, `table.h`, `get.h`, `put.h`, `scan.h`, etc., which intentionally follows the Java synchronous API of `Connection/Table/Get/Put`, etc. Again, this layer is fully synchronous, but uses the async raw-async-table as a mechanism. +## Source code +The C++ client source code is in the `hbase-native-client/src` and `hbase-native-client/include` directories. The include directory contains the header files that need to be available to the client application at compile time. Only headers that need to be exposed should be kept in this directory. The src folder contains all of the source code and private header files. Src is further organized into different modules that are layered. The `connection` module contains the TCP connection management, connection pooling, and all RPC related functionality. The `client` module contains the main API for the applications, as well as the implementation of the higher level client application logic. The `if` module contains `.proto` files copied from the original location (hbase-protocol). The protobuf files in the `if` module get compiled to `.cc` and `.h` files by the `protoc` compiler. The `serde` module contains serialization/deserialization related logic, while `test-util` contains code only relevant for unit tests. `test-util` is not compiled and linked into the libHBaseClient artifacts, to isolate the JNI dependency. + + +## API +The C++ Client API is fully synchronous, although the implementation is based on an async framework. No async API is supported as of now (though one could be developed later easily). The API is very similar to the well-known Java API of `Connection`, `Table`, `Get`, `Put`, `Scan`, `Increment`, `Configuration`, etc. Instead of the `Connection` class in Java, C++ defines a `Client` class. 
From `Client`, one can obtain a table via the `Client::Table()` method, and from this `Table` object, one can do Gets, Puts and Scans via the `Table::Get()`, `Table::Put()` and `Table::Scan()` methods respectively. The `Client` object keeps the connection to the whole cluster, so it should not be destructed while requests are being issued. `Table` is a light-weight object which is not thread safe, so the recommended usage is to construct one per thread. + + +Only the client API corresponding to reading and writing data and scanning is supported at this point. Admin functions, DDL operations like CreateTable, DeleteTable, etc are NOT implemented. + +## Using +HBase C++ Client, when built with the makefile, generates libraries `libHBaseClient.{so|a}` which can be linked against an application together with the header files. The simple-client.cc and load-client.cc files in the source code provide the best example usage from an application perspective. Plus, you can consult an [example project](https://github.com/enis/hbase-native-client-example) to bootstrap your code. The example Makefile provides details for how to link against the shared libraries. + +## Configuration +The C++ client intentionally uses the same `hbase-site.xml` mechanism from the Java world as a way to control the client behavior. However, one difference from the Java side is that hbase-site.xml is not the only mechanism to configure the client. There are three different ways to obtain a `Configuration` object needed for the client. + + * First, `HBaseConfigurationLoader` can be used to create a configuration object from a given configuration path (containing hbase-site.xml) + * Second, an empty `Configuration` object can be constructed and manually configured via `::SetXXX()` methods + * Or third, a custom configuration loader class can be written to construct and populate the Configuration from desired file formats, for example from `.properties` files, etc. 
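The third option in the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the client's actual loader: `Trim` and `LoadProperties` are hypothetical names, the parser handles only simple `key=value` lines, and it fills a plain `std::map` rather than the client's `Configuration` class so that the example stays self-contained.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: strips leading and trailing whitespace.
inline std::string Trim(const std::string& s) {
  const char* ws = " \t\r\n";
  const auto begin = s.find_first_not_of(ws);
  if (begin == std::string::npos) return "";
  const auto end = s.find_last_not_of(ws);
  return s.substr(begin, end - begin + 1);
}

// Hypothetical loader: parses "key=value" lines, skipping blank lines and
// lines that start with '#' or '!'. Later duplicates overwrite earlier ones.
inline std::map<std::string, std::string> LoadProperties(std::istream& in) {
  std::map<std::string, std::string> props;
  std::string line;
  while (std::getline(in, line)) {
    const std::string trimmed = Trim(line);
    if (trimmed.empty() || trimmed[0] == '#' || trimmed[0] == '!') continue;
    const auto eq = trimmed.find('=');
    if (eq == std::string::npos) continue;  // not a key=value line
    const std::string key = Trim(trimmed.substr(0, eq));
    if (key.empty()) continue;
    props[key] = Trim(trimmed.substr(eq + 1));
  }
  return props;
}
```

Each resulting pair would then be copied into a `Configuration` object through its setter methods. That step is left out here because it depends on the exact `Configuration` API, which should be checked in the header files.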
+ + +C++ client configuration, by design, uses the same configuration properties as the regular Java client whenever it can. Some of the common properties include ZooKeeper-related configs +* `hbase.zookeeper.quorum` +* `zookeeper.znode.parent` +* `zookeeper.session.timeout` + +Rpc-related configs +* `hbase.client.retries.number` +* `hbase.rpc.timeout` +* `hbase.client.pause` +* `hbase.client.operation.timeout` + +and scanner-related configs +* `hbase.client.scanner.timeout.period` +* `hbase.client.scanner.caching`. + +Some other configs unique to the C++ client include +* `hbase.client.io.thread.pool.size` +* `hbase.client.cpu.thread.pool.size` + +which default to num_cpus and 2*num_cpus respectively. The full list of configuration options supported by the client can be found in the .h files. + + +## Wire protocol, compatibility +HBase C++ Client implements the binary RPC protocol, and understands the relevant ZooKeeper and `hbase:meta` table data structures needed to implement the full client. It is compatible with 2.0+ servers and is built with Protobuf version 2.7. The client operates similarly to the Java client in terms of request execution flows. The client has a ZK connection, and does a ZK request to learn about the hbase:meta location, then does a scan request to the hbase:meta server to learn about the actual region location. + +## Building +For information about how to build the client, please check BUILDING.md. + +## Testing +HBase C++ Client implements unit tests using the gtest framework. All test files have a name ending with `-test.cc`. Some of the unit tests are pure unit tests, while some others use the JNI layer to launch a mini HBase cluster. The mini HBase cluster is an in-process cluster where masters and regionservers are started in different threads, and is commonly used for unit tests in Java. The module `test-util` contains the code to start the cluster and do DDL operations using the Java Admin client over JNI. 
Then the C++ tests can execute reads and writes to test the pure-C++ client against the actual servers. + + +There are also a couple of integration test clients which can be used against a real cluster. simple-client is an executable, built from simple-client.cc. This is a single-threaded program to execute requests against the cluster. load-client, on the other hand, is a multi-threaded executable which does the same. You can invoke these with `-help` to learn about the command line parameters. Moreover, load-client can be used with `ChaosMonkey` to execute requests in a cluster while the chaos monkey kills servers and does other disruptive actions (like flush, compact, etc). Please refer to [chaos monkey documentation](https://hbase.apache.org/book.html#maven.build.commands.integration.tests.destructive) for more info. + ## Docker -The build environment is docker. This should keep a consistent -build environment for everyone. Buck the build system works -best with mmap'd files. On OSX this means that vmwarefusion -works the best. However it should work with just the defaults. +The build environment is docker. This should keep a consistent build environment for everyone. Buck, the build system, works best with mmap'd files. On OSX this means that vmwarefusion works the best. However it should work with just the defaults. + diff --git hbase-native-client/dev-support/install-hbase-deps-common.sh hbase-native-client/dev-support/install-hbase-deps-common.sh new file mode 100755 index 0000000..0a7428a --- /dev/null +++ hbase-native-client/dev-support/install-hbase-deps-common.sh @@ -0,0 +1,145 @@ +#! /usr/bin/env bash +# +#/** +# * Licensed to the Apache Software Foundation (ASF) under one +# * or more contributor license agreements. See the NOTICE file +# * distributed with this work for additional information +# * regarding copyright ownership. 
The ASF licenses this file +# * to you under the Apache License, Version 2.0 (the +# * "License"); you may not use this file except in compliance +# * with the License. You may obtain a copy of the License at +# * +# * http://www.apache.org/licenses/LICENSE-2.0 +# * +# * Unless required by applicable law or agreed to in writing, software +# * distributed under the License is distributed on an "AS IS" BASIS, +# * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# * See the License for the specific language governing permissions and +# * limitations under the License. +# */ + +# This script will build and install some common libraries for the HBase +# client. + +bin=`dirname "$0"` +bin=`cd "$bin">/dev/null; pwd` + +# Pick a location for these libs +thirdparty=$bin/.. + +LOCAL_LIB=/usr/local/lib + +# We depend on boost 1.55+ for boost/predef +install_boost() { + ver=1_64_0 && \ + v=1.64.0 && \ + cd $thirdparty && \ + wget https://dl.bintray.com/boostorg/release/$v/source/boost_$ver.tar.gz && \ + tar zxf boost_$ver.tar.gz && \ + rm -rf boost_$ver.tar.gz && \ + cd boost_$ver && \ + ./bootstrap.sh && \ + sudo ./b2 install +} + +install_protobuf() { + ver=2.7.0 && \ + cd $thirdparty && \ + git clone https://github.com/google/protobuf.git && \ + cd protobuf && \ + git checkout $ver && \ + mkdir gmock && \ + sudo ldconfig && \ + ./autogen.sh && \ + ./configure && \ + make && \ + sudo make install && \ + make clean && \ + rm -rf .git +} + +install_zookeeper() { + ver=3.4.8 && \ + cd $thirdparty && \ + wget http://www-us.apache.org/dist/zookeeper/zookeeper-$ver/zookeeper-$ver.tar.gz && \ + tar zxf zookeeper-$ver.tar.gz && \ + rm -rf zookeeper-$ver.tar.gz && \ + cd zookeeper-$ver && \ + cd src/c && \ + sudo ldconfig && \ + ./configure && \ + make && \ + sudo make install && \ + make clean && \ + sudo ldconfig +} + +install_folly() { + ver=2017.09.04.00 && \ + cd $thirdparty && \ + wget https://github.com/facebook/folly/archive/v$ver.tar.gz && \ + tar 
zxf v$ver.tar.gz && \ + rm -rf v$ver.tar.gz && \ + cd folly-$ver/folly && \ + sudo ldconfig && \ + autoreconf -ivf && \ + ./configure && \ + LD_LIBRARY_PATH=/usr/local/lib make && \ + sudo make install +} + +install_wangle() { + ver=2017.09.04.00 && \ + cd $thirdparty && \ + wget https://github.com/facebook/wangle/archive/v$ver.tar.gz && \ + tar zxf v$ver.tar.gz && \ + rm -rf v$ver.tar.gz && \ + cd wangle-$ver/wangle && \ + sudo ldconfig && \ + cmake . -DBUILD_TESTS=OFF && \ + make && \ + ctest && \ + sudo make install +} + +mkdir -p "$thirdparty" + +# Boost +if [ -e "${LOCAL_LIB}/libboost_system.a" ]; then + echo "Found libboost under ${LOCAL_LIB}, skipping building boost" +else + echo "Installing boost" + install_boost +fi + +# Protobuf +if [ -e "${LOCAL_LIB}/libprotobuf.a" ]; then + echo "Found libprotobuf under ${LOCAL_LIB}, skipping building protobuf" +else + echo "Installing protobuf" + install_protobuf +fi + +# Zookeeper +if [ -e "${LOCAL_LIB}/libzookeeper_mt.a" ]; then + echo "Found libzookeeper_mt under ${LOCAL_LIB}, skipping building zookeeper" +else + echo "Installing zookeeper" + install_zookeeper +fi + +# Folly +if [ -e "${LOCAL_LIB}/libfolly.a" ]; then + echo "Found libfolly under ${LOCAL_LIB}, skipping building folly" +else + echo "Installing folly" + install_folly +fi + +# Wangle +if [ -e "${LOCAL_LIB}/libwangle.a" ]; then + echo "Found libwangle under ${LOCAL_LIB}, skipping building wangle" +else + echo "Installing wangle" + install_wangle +fi