Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.20.6
    • Fix Version/s: None
    • Component/s: Client
    • Labels:
      None
    • Hadoop Flags:
      Incompatible change

      Description

      If HBASE-794 delivers first-class support for talking via Thrift directly to HMaster and HRS, then pure C and C++ client libraries become possible.

      The C client library would wrap a Thrift core.

      The C++ client library can provide a class hierarchy quite close to o.a.h.h.client and, ideally, identical semantics. It should be just a wrapper around the C API, for economy.
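To make the "thin C++ wrapper over a C API" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the `hb_*` functions are a hypothetical C API backed by an in-memory map rather than Thrift, and the `Table` class is only loosely modeled on the shape of o.a.h.h.client, not taken from any real library.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical C-level core. A real one would speak Thrift to the cluster;
// this stand-in keeps rows in memory so the sketch is self-contained.
struct hb_table {
    std::string name;
    std::map<std::string, std::string> rows;
};

extern "C" {
hb_table* hb_table_open(const char* name) { return new hb_table{name, {}}; }
void hb_table_close(hb_table* t) { delete t; }
int hb_put(hb_table* t, const char* row, const char* value) {
    t->rows[row] = value;
    return 0;  // 0 means success in this hypothetical API
}
// Returns a pointer owned by the table, or nullptr if the row is missing.
const char* hb_get(hb_table* t, const char* row) {
    auto it = t->rows.find(row);
    return it == t->rows.end() ? nullptr : it->second.c_str();
}
}

// Thin C++ wrapper: owns the C handle via RAII and exposes std::string.
class Table {
public:
    explicit Table(const std::string& name)
        : handle_(hb_table_open(name.c_str()), &hb_table_close) {}

    void put(const std::string& row, const std::string& value) {
        hb_put(handle_.get(), row.c_str(), value.c_str());
    }

    // Copies the value into *out and returns true if the row exists.
    bool get(const std::string& row, std::string* out) const {
        assert(out != nullptr);
        const char* v = hb_get(handle_.get(), row.c_str());
        if (v == nullptr) return false;
        *out = v;
        return true;
    }

private:
    std::unique_ptr<hb_table, decltype(&hb_table_close)> handle_;
};
```

The wrapper adds no logic of its own: it only manages the handle's lifetime and converts types, which is what keeps it cheap to maintain alongside the C API.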

      Within my employer there is a lot of resistance to HBase because many dev teams have a strong C/C++ bias. The real issue, however, is client-side integration, not a fundamental objection. (What runs server side and how it is managed is a secondary consideration.)

          Activity

          Andrew Purtell added a comment -

          Understood that there's no reason a C/C++ shop can't use Thrift directly, but rightly or wrongly the "extra work" is objectionable.

          Andrew Purtell added a comment - - edited

          Could also look at Google's "protocol buffers" for a native binary protocol: http://code.google.com/p/protobuf/
          http://code.google.com/p/protobuf-rpc/

          Jonathan Gray added a comment -

          Mentioned by Wes Chow on the list: http://svn.osafoundation.org/pylucene/trunk/jcc/jcc/README

          Bryan Duxbury added a comment -

          Thrift RPC has come a long way - there's a much better server available, and I'm working on a much more compact protocol (THRIFT-110) that would keep wire size down. It might be a mature enough project for you guys to take a look.

          Edward J. Yoon added a comment -

          +1

          Andrew Purtell added a comment - - edited

          First step: See if compiling o.a.h.h.client and all supporting classes with gcj into a shared object (.so) is viable.

          stack added a comment -

          Just by way of FYI, here is how the C++ interface to HDFS is done: "HDFS provides a C++ library called libhdfs that mirrors the Java interface. In fact, it works using the Java Native Interface (JNI) to call a Java HDFS client. Hadoop comes with pre-built libhdfs binaries for 32-bit Linux, but for other platforms you will need to build them yourself using the instructions at http://wiki.apache.org/hadoop/LibHDFS."

          Andrew Purtell added a comment -

          After discussion at the LA Hackathon, it is resolved that a fat C based client API and a fat Java based one would be co-supported. Initially the C client would be marked experimental, until the frequency of changes to the Java client API drops to a maintenance level only.

          stack added a comment -

          Andrew, here's another serialization package in case you hadn't seen the post from Doug:

          I propose we add a new Hadoop subproject for Avro, a serialization system.  My ambition is for Avro to replace both Hadoop's RPC and to be used for most Hadoop data files, e.g., by Pig, Hive, etc.
          
          Initial committers would be Sharad Agarwal and me, both existing Hadoop committers.  We are the sole authors of this software to date.
          
          The code is currently at:
          
          http://people.apache.org/~cutting/avro.git/
          
          To learn more:
          
          git clone http://people.apache.org/~cutting/avro.git/ avro
          cat avro/README.txt
          
          Andrew Purtell added a comment -

          Thanks Stack. I didn't see the post. Presume it came across core-dev@, which is too high volume for me to follow.

          stack added a comment -

          Shall we move these out of 0.20.0 Andrew? You think they'll be done in next week or two?

          Andrew Purtell added a comment -

          This is already moved to 0.21?

          Alex Newman added a comment -

          I am not exactly sure I understand this issue. We had similar worries as our shop is very C++ biased, and we went with the Thrift client. We now write solely C++ based code, and tbh hitting a Thrift server local to the data is faster than falling back to the RPC mechanism anyway. Would it be enough to write an efficient C++ based Thrift server? I would love to see the Thrift API be the focus of API development, as there are still numerous features which haven't been moved out of the Java API. Anyway, just my two cents; I will totally help out with any C++ API.

          Andrew Purtell added a comment -

          Unassigning this issue. This will be a big deal for 0.21 and a group effort.

          Andrew Purtell added a comment -

          Hosting the region assignment table in ZK will simplify the implementation of a C/C++ client. We can use the ZK C API to look up region locations independent of the master so would only have to talk with regionservers. Can start then as an async RPC engine mediating client requests to region servers and not much more (low level C API). Incrementally add smarts from there (higher level C++ API).
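As a rough illustration of the client-side lookup this would enable, the sketch below keeps a cache of region start keys and resolves a row to the server hosting its region. It assumes the entries have already been fetched (for example through the ZK C API, which is not shown); the `RegionLocator` class and the host:port strings are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical client-side cache of region locations, keyed by region
// start key. A region covers [start_key, next region's start_key).
class RegionLocator {
public:
    void add_region(const std::string& start_key, const std::string& server) {
        by_start_key_[start_key] = server;
    }

    // Returns the server for the region containing `row`,
    // or an empty string if no cached region covers it.
    std::string locate(const std::string& row) const {
        // First region whose start key is strictly greater than the row...
        auto it = by_start_key_.upper_bound(row);
        if (it == by_start_key_.begin()) return "";
        // ...so the region just before it is the one containing the row.
        --it;
        return it->second;
    }

private:
    std::map<std::string, std::string> by_start_key_;
};
```

With regions starting at `""` and `"m"`, a row `"apple"` resolves to the first region's server and `"zebra"` to the second. The low-level client then only needs an RPC path to that regionserver.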

          Andrew Purtell added a comment -

          Push to 0.22 timeframe and look at the state of Avro's C/C++ bindings then.

          Lars Francke added a comment -

          I have looked into Avro quite a bit over the last few weeks, so I was thinking that I could probably easily provide an Avro interface alongside the Thrift interface.

          What I don't quite understand is how this issue fits into all that. Thrift and Avro can be used with C/C++, but after reading this I have the feeling you mean something other than just a Thrift-like client interface. If those turn out to be separate things, I'll open a new issue and discuss it further there.

          Andrew Purtell added a comment -

          The intent of this issue is to build a fat client in C, wrap it in C++, and have it talk directly to the master and regionservers without any gateway/connector process as intermediary. The C++ wrapper would have a similar class structure and API to o.a.h.h.client. No need for any Java except on the servers. No intermediary to be a potential bottleneck.

          The notion has a reasonable argument but it's a lot of work. The rationale for taking it on has become less convincing over time as the Thrift and REST connectors have been satisfying enough for users. There was a fair amount of interest in the 0.19 days but that has waned as far as I can see.

          stack added a comment -

          @skyhyc Did you put a patch up? I don't see it.

          skyhyc added a comment -

          How do I use C++ with HBase? Would anybody like to give an example?

          stack added a comment -

          @skyhyc Have you tried thrift?

          skyhyc added a comment -

          No, I just know HBase provides an API for Thrift, but I don't know how to use it.

          Abhimanyu added a comment -

          I just started using Thrift and HBase. This seems like a typical question, but I can't seem to find an answer. Our "number-crunching" code is all C++. I am planning on using Thrift to load data from HBase into the client. The problem is that we're talking about a LOT of data. In a typical scenario I spawn 100 processes and each loads about 10GB of data, roughly 1TB in total. So I'm not sure whether Thrift will be fast enough. For management tasks we're going to write everything in Java, so that is not a problem. The question then is: is it better to write custom wrappers in JNI and bypass Thrift completely, purely for performance reasons?

          Also, what does the timeline for the C client look like? From the discussion here, my guess is that HBase will provide C++ wrappers using JNI, like libhdfs.

          If Thrift is the way to go, then we are looking at creating a tool that takes an ODBC data source and loads all the data from one table into an HBase table. Again, this will be in C++. Only if we find that the overhead of Thrift is too much will we shift to Java, but that would mean double work writing clients for Java and C++. Anyway, we could provide this code to the community.

          skyhyc added a comment -

          Up!!!

          stack added a comment -

          Moving out of 0.92. Move it back in if you think differently.

          stack added a comment -

          Moving out of 0.92. Move it back in if you think differently.

          Mikhail Bautin added a comment -

          I think this is now resolved. Here is the C++ HBase client written by Chip Turner at Facebook:

          https://github.com/facebook/native-cpp-hbase-client

          stack added a comment -

          Mikhail, I think this is worth an announcement on the user mailing list? It's great stuff. If you don't want to do it, I will (better if you do it). I added a note to the refguide and pushed it out.

          Mikhail Bautin added a comment -

          Actually Chip should probably announce this feature himself. I will ask him about it.

          stack added a comment -

          @Mikhail That'd be best for sure.

          Andrew Purtell added a comment -

          I'd be delighted to resolve this issue (excellent!) but just to be sure: Do we want to hold it open as a vehicle for moving the native-cpp-hbase-client code into the HBase tree proper, or no? If the latter, let's resolve.

          Andrew Purtell added a comment -

          Mikhail Bautin: Perhaps someone more versed in Thrift and its C++ language support in particular could say, but can we plug in Thrift's TSasl{Client,Server}Transport here for authenticated opens and optional wire encryption?

          Todd Lipcon added a comment -

          Hey Andrew. Someone here at Cloudera is working on SASL support for the Thrift C++ bindings, I believe – at least the client side – which should be compatible with the Java server. Hopefully we'll post it to THRIFT-1620 in the coming weeks.

          Nick Dimiduk added a comment -

          What's happened with this ticket? I don't think the thrift core makes sense anymore, considering protobuf. I think an HBase client library implemented in C is a mandatory feature for a database approaching 1.0 release.

          Andrew Purtell added a comment - - edited

          "I don't think the thrift core makes sense anymore, considering protobuf."

          I would agree. The embedded thrift servers in the regionservers were an experiment at FB that they've backed away from. THRIFT-1620 is open with no implementation available.

          "I think an HBase client library implemented in C is a mandatory feature for a database approaching 1.0 release."

          The PB work is not finished.

          The scope of building a C client is not just the transport, it's also duplicating or replacing all of the functionality of the fat Java client.

          Various discussions about "native client" usually end with the notion of a Grand Unified Client Project: lighter weight async client, perhaps asynchbase itself or in the mold of it, talking PB to the cluster, with a sync API layered on top. It might be straightforward to build a C++ analogue to asynchbase with std::async (don't know enough about C++11 to say for sure). That does not provide an answer for C folks though.
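As a rough sketch of the layering described here (an async client with a sync API on top), the following uses `std::async` and `std::future` from C++11. It is an assumption-laden toy: `AsyncClient` and `SyncClient` are hypothetical names, and an in-memory map stands in for the PB RPCs a real client would issue. It says nothing about how asynchbase itself is built.

```cpp
#include <cassert>
#include <future>
#include <map>
#include <mutex>
#include <string>

// Hypothetical async core: every call returns a future immediately.
class AsyncClient {
public:
    std::future<void> put(std::string row, std::string value) {
        return std::async(std::launch::async, [this, row, value]() -> void {
            std::lock_guard<std::mutex> lock(mu_);
            store_[row] = value;  // stand-in for an RPC to a regionserver
        });
    }

    // Resolves to the stored value, or an empty string if the row is missing.
    std::future<std::string> get(std::string row) {
        return std::async(std::launch::async, [this, row]() -> std::string {
            std::lock_guard<std::mutex> lock(mu_);
            auto it = store_.find(row);
            return it == store_.end() ? std::string() : it->second;
        });
    }

private:
    std::mutex mu_;
    std::map<std::string, std::string> store_;
};

// Sync API layered on top: each call blocks on the corresponding future.
class SyncClient {
public:
    explicit SyncClient(AsyncClient& async) : async_(async) {}

    void put(const std::string& row, const std::string& value) {
        async_.put(row, value).get();
    }
    std::string get(const std::string& row) { return async_.get(row).get(); }

private:
    AsyncClient& async_;
};
```

One thread per request, as `std::launch::async` implies, would not scale to a real workload; an event loop or thread pool would replace it. The point of the sketch is only that the sync layer can be a few lines once the async core exists.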

          Nick Dimiduk added a comment -

          "The scope of building a C client is not just the transport, it's also duplicating or replacing all of the functionality of the fat Java client."

          Agreed; I mean the construction of a fully-featured client implementation available via C, not just transport. I've been out of C/C++ for a number of years and I'm entirely ignorant of C++11, so I cannot comment on implementation details. I do know that it's fairly commonplace to wrap a C++ library with C bindings, so that decision can be left up to the implementor.
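For readers unfamiliar with the pattern, wrapping a C++ library with C bindings usually means an opaque handle plus `extern "C"` functions that catch exceptions and return error codes. The sketch below shows the shape under stated assumptions: `demo::Client` and the `demo_client_*` functions are hypothetical and not part of any HBase client.

```cpp
#include <cassert>
#include <cstring>
#include <map>
#include <string>

namespace demo {
// Hypothetical C++ client; an in-memory map stands in for the cluster.
class Client {
public:
    void put(const std::string& row, const std::string& value) { rows_[row] = value; }
    bool get(const std::string& row, std::string* out) const {
        auto it = rows_.find(row);
        if (it == rows_.end()) return false;
        *out = it->second;
        return true;
    }
private:
    std::map<std::string, std::string> rows_;
};
}  // namespace demo

// C callers only ever see a pointer to this struct, never its contents.
struct demo_client { demo::Client impl; };

extern "C" {
// Exceptions must not cross the C boundary, so each function catches them.
demo_client* demo_client_new(void) {
    try { return new demo_client(); } catch (...) { return nullptr; }
}
void demo_client_free(demo_client* c) { delete c; }

int demo_client_put(demo_client* c, const char* row, const char* value) {
    try { c->impl.put(row, value); return 0; } catch (...) { return -1; }
}

// Copies the NUL-terminated value into buf.
// Returns 0 on success, 1 if the row is missing, -1 on error or short buffer.
int demo_client_get(const demo_client* c, const char* row,
                    char* buf, std::size_t buf_len) {
    try {
        std::string value;
        if (!c->impl.get(row, &value)) return 1;
        if (value.size() + 1 > buf_len) return -1;
        std::memcpy(buf, value.c_str(), value.size() + 1);
        return 0;
    } catch (...) { return -1; }
}
}
```

In a real library the declarations would live in a header guarded by `#ifdef __cplusplus` so that C programs can include it. Values containing NUL bytes would also need a length-based signature rather than C strings.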

          Andrew Purtell added a comment -

          I will throw out there that libhdfs "cheats" by linking to libjvm.so and pulling in the HDFS client bytecode as engine. I presume we don't want this, but it would be a half measure that stands in for something comprehensive.

          Nick Dimiduk added a comment -

          My previous experience with JNI makes me cringe at this prospect. Perhaps a painful baby step, but it would be a baby step nonetheless.

          Cosmin Lehene added a comment -

          HBASE-9977 suggests a C++ async client and C sync/async wrappers.
          Given that HBase talks protobuf natively, is a native wrapper around Thrift still a goal?

          Ted Dunning added a comment -

          Another way to put this is that if nobody cares enough to even put up a patch after 5 years, is this issue simply moot?

          Shouldn't reality be recognized? Shouldn't this be closed as WONT_FIX?

          Andrew Purtell added a comment -

          "Another way to put this is that if nobody cares enough to even put up a patch after 5 years, is this issue simply moot?"

          This issue has been superseded by the use of protobuf in RPCs instead of Thrift and the commit of the start of a C/C++ client library; see HBASE-9977. Closing this issue in favor of something else is fine, but WONTFIX is the incorrect resolution.


            People

            • Assignee: Unassigned
            • Reporter: Andrew Purtell
            • Votes: 2
            • Watchers: 32
