Description
The HBase RPC code (org.apache.hadoop.hbase.ipc.*) was originally forked from the Hadoop RPC classes, with some performance tweaks added. Those optimizations have made it costly to keep up with Hadoop RPC changes, however, both bug fixes and improvements/new features.
In particular, this impacts how we implement security features in HBase (see HBASE-1697 and HBASE-2016). The secure Hadoop implementation (HADOOP-4487) relies heavily on RPC changes to support client authentication via Kerberos, and to secure and mutually authenticate client/server connections via SASL. Making use of the built-in Hadoop RPC classes would give us these pieces for free in a secure HBase.
So, I'm proposing that we drop the HBase forked version of RPC and convert to direct use of Hadoop RPC, while working to contribute important fixes back upstream to Hadoop core. Based on a review of the HBase RPC changes, the key divergences seem to be:
HBaseClient:
- added use of TCP keepalive (HBASE-1754)
- made connection retries and sleep configurable (HBASE-1815)
- prevent NPE if socket == null due to creation failure (HBASE-2443)
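To make the three HBaseClient changes concrete, here is a minimal sketch of what they amount to at the socket level. It is not the HBaseClient code: the class name KeepAliveConnector and its parameters are hypothetical, and it uses only the JDK rather than any Hadoop or HBase classes.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import javax.net.SocketFactory;

// Hypothetical illustration only; not the actual HBaseClient connection code.
final class KeepAliveConnector {
    private final int maxRetries;          // extra attempts after the first (cf. HBASE-1815)
    private final long retrySleepMillis;   // pause between attempts (cf. HBASE-1815)
    private final boolean keepAlive;       // enable SO_KEEPALIVE (cf. HBASE-1754)

    KeepAliveConnector(int maxRetries, long retrySleepMillis, boolean keepAlive) {
        if (maxRetries < 0 || retrySleepMillis < 0) {
            throw new IllegalArgumentException("retries and sleep must be non-negative");
        }
        this.maxRetries = maxRetries;
        this.retrySleepMillis = retrySleepMillis;
        this.keepAlive = keepAlive;
    }

    Socket connect(InetSocketAddress address, int timeoutMillis) throws IOException {
        IOException lastFailure = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            Socket socket = null;
            try {
                // Creation itself can fail, which leaves socket == null.
                socket = SocketFactory.getDefault().createSocket();
                socket.setKeepAlive(keepAlive);
                socket.connect(address, timeoutMillis);
                return socket;
            } catch (IOException e) {
                lastFailure = e;
                // Guard the cleanup so a failed creation does not cause an NPE (cf. HBASE-2443).
                if (socket != null) {
                    try {
                        socket.close();
                    } catch (IOException ignored) {
                        // Nothing useful to do if close fails during cleanup.
                    }
                }
            }
            if (attempt < maxRetries) {
                try {
                    Thread.sleep(retrySleepMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IOException("Interrupted while waiting to retry", ie);
                }
            }
        }
        // The loop body runs at least once, so lastFailure is set here.
        throw lastFailure;
    }
}
```

In an upstream patch, the retry count and sleep interval would presumably be read from configuration rather than passed to a constructor.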
HBaseRPC:
- mapping of method names <-> codes (removed in HBASE-2219)
HBaseServer:
- use of TCP keep alives (HBASE-1754)
- OOME in server does not trigger abort (HBASE-1198)
HbaseObjectWritable:
- allows List<> serialization
- includes its own class <-> code mapping (HBASE-328)
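For readers unfamiliar with the class <-> code mapping, the idea is to write a short numeric code on the wire instead of a full class name. The following is a rough sketch of that idea only. ClassCodeRegistry and its methods are hypothetical names, and the codes are arbitrary; this does not reproduce HbaseObjectWritable's actual table or API.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a bidirectional class <-> code table.
final class ClassCodeRegistry {
    private final Map<Class<?>, Byte> classToCode = new HashMap<>();
    private final Map<Byte, Class<?>> codeToClass = new HashMap<>();

    // Registers a class under a one-byte code; both sides must be unused.
    void register(Class<?> type, byte code) {
        if (classToCode.containsKey(type) || codeToClass.containsKey(code)) {
            throw new IllegalArgumentException("Duplicate class or code: " + type.getName() + " / " + code);
        }
        classToCode.put(type, code);
        codeToClass.put(code, type);
    }

    // Writes the one-byte code in place of the class name.
    void writeClassCode(DataOutput out, Class<?> type) throws IOException {
        Byte code = classToCode.get(type);
        if (code == null) {
            throw new IOException("No code registered for " + type.getName());
        }
        out.writeByte(code);
    }

    // Reads a one-byte code and resolves it back to the registered class.
    Class<?> readClassCode(DataInput in) throws IOException {
        byte code = in.readByte();
        Class<?> type = codeToClass.get(code);
        if (type == null) {
            throw new IOException("Unknown class code " + code);
        }
        return type;
    }
}
```

Because both ends must agree on the table, a mapping like this effectively becomes part of the wire format, which is one reason a pluggable ObjectWritable implementation matters for this proposal.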
Proposed process is:
1. open issues with patches on Hadoop core for important fixes/adjustments from HBase RPC (HBASE-1198, HBASE-1815, HBASE-1754, HBASE-2443, plus a pluggable ObjectWritable implementation in RPC.Invocation to allow use of HbaseObjectWritable).
2. ship a Hadoop version with RPC patches applied – ideally we should avoid another copy-n-paste code fork, subject to our ability to keep the changes from affecting Hadoop's internal RPC wire formats
3. if all Hadoop core patches are applied we can drop back to a plain vanilla Hadoop version
I realize there are many different opinions on how to proceed with HBase RPC, so I'm hoping this issue will kick off a discussion on what the best approach might be. My own motivation is maximizing re-use of the authentication and connection security work that's already gone into Hadoop core. I'll put together a set of patches around #1 and #2, but obviously we need some consensus around this to move forward. If I'm missing other differences between HBase and Hadoop RPC, please list them as well. Discuss!
Attachments
Issue Links
- blocks
  - HBASE-2016 [DAC] Authentication (Closed)
- incorporates
  - HBASE-3615 Implement token based DIGEST-MD5 authentication for MapReduce tasks (Closed)
  - HBASE-2425 Crossport HADOOP-1849 rpc fix (Closed)
- is depended upon by
  - HBASE-3025 Coprocessor based simple access control (Closed)