I cannot reopen this issue, so I'll just comment.
As suggested by Jonathan in HIVE-1434, a Hive/Cassandra bridge may fit better here.
I have finally found the source of Brisk's implementation (https://github.com/riptano/hive). The patch I am submitting here (CASSANDRA-913-r1199213.patch) is based on their work, so I cannot grant any license here.
What I did to the original source:
- changed the package names (some classes then needed package-level access)
- added ASL2 headers for the ASF
- formatted the code according to the Cassandra standard
- changed some loggers from log4j and commons-logging to slf4j
- fixed the handling of nulls in Hive tables, which did not work well; the fix holds for the few tests I ran.
About the build: it needs the Hive jars in contrib/hive/lib. I don't know how to set this up better, since those jars are not available in the Maven repo.
About runtime: I had a lot of trouble due to a conflict between the Thrift library used by Hive and the one used by Cassandra. Hive 0.7 uses Thrift 0.5, Cassandra uses 0.6. A Cassandra external table could not be declared in Hive because of some NoSuchMethodException.
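To see where the conflicting Thrift classes come from, something like the following could help. It is only a minimal sketch: the function name `jars_bundling` is made up for illustration, and it assumes all the jars of interest sit in a single directory. It simply counts the class files each jar carries under a given package path.

```python
import zipfile
from pathlib import Path


def jars_bundling(lib_dir, package_prefix="org/apache/thrift/"):
    """Return {jar file name: number of .class entries under package_prefix}.

    Hypothetical helper: jars that bundle nothing under the prefix are omitted.
    """
    hits = {}
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        with zipfile.ZipFile(jar) as zf:
            count = sum(
                1
                for name in zf.namelist()
                if name.startswith(package_prefix) and name.endswith(".class")
            )
        if count:
            hits[jar.name] = count
    return hits
```

If more than one jar shows up in the result, the same package is being loaded from several places, which is the kind of situation that can lead to the error above.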
As far as I understand Hive, it needs Thrift at job runtime only to handle dynamic column serialization. I didn't need that in my use case, so I did a hack: I removed every org.apache.thrift class from hive-exec.jar. After that it works nicely (for my use case).
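For anyone who wants to reproduce the hack, here is one possible way to do the removal, as a minimal sketch rather than the exact procedure I followed. The function name `strip_package` and the output file name are made up for illustration. It writes a copy of the jar without the entries under the given package path and leaves the original file untouched.

```python
import zipfile


def strip_package(src_jar, dst_jar, package_prefix="org/apache/thrift/"):
    """Copy src_jar to dst_jar, skipping every entry under package_prefix.

    Hypothetical helper. Returns the number of skipped entries
    (directory entries under the prefix are counted as well).
    """
    removed = 0
    with zipfile.ZipFile(src_jar) as src, \
            zipfile.ZipFile(dst_jar, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename.startswith(package_prefix):
                removed += 1
                continue
            # Passing the ZipInfo keeps the entry name and metadata.
            dst.writestr(info, src.read(info))
    return removed


if __name__ == "__main__":
    # Assumed file names; adjust to the jar actually in use.
    print(strip_package("hive-exec.jar", "hive-exec-nothrift.jar"))
```

Keep in mind this only suits setups that, like mine, do not need what Hive uses Thrift for at job runtime.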
There were some tests in the GitHub repo. They are Hive-oriented. I'm too lazy to try to make them work in Cassandra's source tree.
Hive 0.8 will use Thrift 0.7 (hopefully backward compatible with 0.6), and the Hive artifacts will be published to the Maven repository (HIVE-1095). So it is probably best to wait for that, for an easier integration into Cassandra?