The HBase RPC code (org.apache.hadoop.hbase.ipc.*) was originally forked from the Hadoop RPC classes, with some performance tweaks added. Those optimizations have made it harder to keep up with Hadoop RPC changes, however, including both bug fixes and improvements/new features.
In particular, this impacts how we implement security features in HBase (see HBASE-1697 and HBASE-2016). The secure Hadoop implementation (HADOOP-4487) relies heavily on RPC changes to support client authentication via Kerberos, and to secure and mutually authenticate client/server connections via SASL. Making use of the built-in Hadoop RPC classes will give us these pieces for free in a secure HBase.
So, I'm proposing that we drop the HBase forked version of RPC and convert to direct use of Hadoop RPC, while working to contribute important fixes back upstream to Hadoop core. Based on a review of the HBase RPC changes, the key divergences seem to be:
- added use of TCP keepalive (
- made connection retries and sleep configurable (
- prevent NPE if socket == null due to creation failure (
- mapping of method names <-> codes (removed in
- allows List<> serialization
- includes its own class <-> code mapping (
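To make the connection-level items in the list above concrete, here is a minimal sketch of what TCP keepalive, configurable retries/sleep, and the null-socket guard look like using only java.net. This is not the HBase or Hadoop client code: the class RpcConnectionSketch, the Settings holder, and its field names are all hypothetical, and a real client would read these values from configuration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class RpcConnectionSketch {

    /** Hypothetical settings holder; real code would read these from configuration. */
    static final class Settings {
        final int maxRetries;    // retries after the first attempt
        final long sleepMillis;  // pause between attempts
        final boolean keepAlive; // whether to enable TCP keepalive

        Settings(int maxRetries, long sleepMillis, boolean keepAlive) {
            this.maxRetries = maxRetries;
            this.sleepMillis = sleepMillis;
            this.keepAlive = keepAlive;
        }
    }

    /** Tries to connect, retrying up to maxRetries times with a fixed sleep in between. */
    static Socket connect(InetSocketAddress addr, Settings s, int timeoutMillis)
            throws IOException, InterruptedException {
        IOException last = new IOException("no connection attempt was made");
        for (int attempt = 0; attempt <= s.maxRetries; attempt++) {
            Socket socket = null;
            try {
                socket = new Socket();
                socket.setKeepAlive(s.keepAlive); // TCP keepalive
                socket.connect(addr, timeoutMillis);
                return socket;
            } catch (IOException e) {
                last = e;
                closeQuietly(socket); // socket may still be null here
                if (attempt < s.maxRetries) {
                    Thread.sleep(s.sleepMillis);
                }
            }
        }
        throw last;
    }

    /** Closes the socket if there is one; a null socket (creation failed) is ignored. */
    static void closeQuietly(Socket socket) {
        if (socket == null) {
            return; // avoids the NPE when socket creation failed
        }
        try {
            socket.close();
        } catch (IOException ignored) {
            // nothing useful to do while cleaning up
        }
    }
}
```

None of this is tied to the RPC wire format, which is why these items look like reasonable candidates for upstream patches.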
Proposed process is:
1. open issues with patches on Hadoop core for important fixes/adjustments from HBase RPC (HBASE-1198, HBASE-1815, HBASE-1754, HBASE-2443, plus a pluggable ObjectWritable implementation in RPC.Invocation to allow use of HbaseObjectWritable).
2. ship a Hadoop version with RPC patches applied – ideally we should avoid another copy-n-paste code fork, subject to ability to isolate changes from impacting Hadoop internal RPC wire formats
3. if all Hadoop core patches are applied we can drop back to a plain vanilla Hadoop version
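As a rough illustration of the "pluggable ObjectWritable" idea in step 1, the sketch below shows an RPC layer delegating parameter encoding to a swappable codec, with one codec that writes a compact class code instead of a class name. Everything here is hypothetical: ParamCodec, CodeMappedCodec, and the code values are made up for illustration and are not the Hadoop or HBase API, and the real patch would need to be shaped around RPC.Invocation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class PluggableCodecSketch {

    /** Hypothetical hook: the RPC layer calls this instead of a hard-wired serializer. */
    interface ParamCodec {
        void write(DataOutput out, Object value) throws IOException;
        Object read(DataInput in) throws IOException;
    }

    /** Writes a one-byte class code followed by the value (class <-> code mapping). */
    static final class CodeMappedCodec implements ParamCodec {
        private final Map<Class<?>, Byte> classToCode = new HashMap<>();
        private final Map<Byte, Class<?>> codeToClass = new HashMap<>();

        CodeMappedCodec() {
            // Illustrative codes only; a real mapping would cover many more types.
            register(String.class, (byte) 1);
            register(Integer.class, (byte) 2);
        }

        private void register(Class<?> type, byte code) {
            classToCode.put(type, code);
            codeToClass.put(code, type);
        }

        @Override
        public void write(DataOutput out, Object value) throws IOException {
            Byte code = classToCode.get(value.getClass());
            if (code == null) {
                throw new IOException("No code registered for " + value.getClass());
            }
            out.writeByte(code);
            if (value instanceof String) {
                out.writeUTF((String) value);
            } else {
                out.writeInt((Integer) value);
            }
        }

        @Override
        public Object read(DataInput in) throws IOException {
            byte code = in.readByte();
            Class<?> type = codeToClass.get(code);
            if (type == null) {
                throw new IOException("Unknown class code " + code);
            }
            if (type == String.class) {
                return in.readUTF();
            }
            return Integer.valueOf(in.readInt());
        }
    }

    /** Encodes one parameter with whichever codec the caller plugs in. */
    static byte[] encode(ParamCodec codec, Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        codec.write(new DataOutputStream(bytes), value);
        return bytes.toByteArray();
    }

    /** Decodes one parameter with whichever codec the caller plugs in. */
    static Object decode(ParamCodec codec, byte[] data) throws IOException {
        return codec.read(new DataInputStream(new ByteArrayInputStream(data)));
    }
}
```

The point of the hook is that both ends of a connection must agree on the codec, which is also what would let us keep HBase-specific encoding isolated from Hadoop's internal RPC wire formats.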
I realize there are many different opinions on how to proceed with HBase RPC, so I'm hoping this issue will kick off a discussion on what the best approach might be. My own motivation is maximizing re-use of the authentication and connection security work that's already gone into Hadoop core. I'll put together a set of patches around #1 and #2, but obviously we need some consensus around this to move forward. If I'm missing other differences between HBase and Hadoop RPC, please list as well. Discuss!