Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.7.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Add a cassandra storage handler.

      1. hive-cassandra.2011-02-25.txt
        146 kB
        Edward Capriolo
      2. HIVE-1434-r1182878.patch
        161 kB
        Nicolas Lalevée
      3. hive-1434-5.patch.txt
        126 kB
        Edward Capriolo
      4. hive-1434-4-patch.txt
        148 kB
        Edward Capriolo
      5. hive-1434-3-patch.txt
        128 kB
        Edward Capriolo
      6. hive-1434-2-patch.txt
        119 kB
        Edward Capriolo
      7. hive-1434-2011-03-14.patch.txt
        147 kB
        Edward Capriolo
      8. hive-1434-2011-03-07.patch.txt
        146 kB
        Edward Capriolo
      9. hive-1434-2011-03-07.patch.txt
        146 kB
        Edward Capriolo
      10. hive-1434-2011-02-26.patch.txt
        146 kB
        Edward Capriolo
      11. hive-1434-1.txt
        18 kB
        Edward Capriolo
      12. hive-1434.2011-02-27.diff.txt
        146 kB
        Edward Capriolo
      13. hive.diff
        83 kB
        Edward Capriolo
      14. cass_handler.diff
        137 kB
        Edward Capriolo
      15. cas-handle.tar.gz
        8.92 MB
        Edward Capriolo

        Issue Links

          Activity

          Jeremy Hanna added a comment -

          I guess this is the hive version of CASSANDRA-913. I saw hammer in the hall at the hadoop summit and he said there was a hive ticket on this now.

          Edward Capriolo added a comment -

          Just a start. (To prove that I am doing something with this ticket)

          Edward Capriolo added a comment -

          I actually got pretty far with this simply by duplicating the logic in the HBase storage handler. Unfortunately I hit a snafu. Cassandra is not using the deprecated mapred package; its input format uses the mapreduce package. I have seen a few tickets for this, and as far as I know Hive is 100% mapred. So to get this done we either have to wait until Hive is converted to mapreduce, or I have to write an "old school" mapred-based input format for Cassandra.

          @John am I wrong? Is there a way to work with mapreduce input formats that I am not understanding?

          John Sichi added a comment -

          Hey Ed,

          If you take a look at HIVE-1229, Basab has been helping us clean up the API dependencies, and we have been successful in moving some stuff over to mapreduce from mapred. (I had done some of that already in HiveHFileOutputFormat in order to get it to work, e.g. by making up my own TaskAttemptContext instance wrapping a Progressable.) I think you may be able to do the same.

          As a whole, we can't drop the pre-0.20 dependencies from Hive yet, but for the HBase Handler, we made the restriction that it only builds with Hadoop 0.20 and later, so you can do the same for Cassandra.

          Edward Capriolo added a comment -

          This is not a quality patch yet. I am still experimenting with some ideas. Everything is free-form and will likely change before the final patch. There are a few junk files (HiveIColumn, etc.) which will not be part of the release.
          Thus far:
          CassandraSplit.java
          HiveCassandraTableInputFormat.java
          CassandraSerDe.java
          TestColumnFamilyInputFormat.java
          TestCassandraPut.java
          TestColumnFamilyInputFormat.java

          Are working and can give you an idea of where the code is going.

          Edward Capriolo added a comment -

          Closing in on this one. This patch sets up the build environment correctly and adds proper test infrastructure. The patch is much cleaner. Still working on serializing/deserializing correctly, so it is not very functional yet. 80% done, I think.

          Edward Capriolo added a comment -

          This patch has full read/write functionality. I am going to do another patch later today with xdocs, but do not expect any code changes.

          Edward Capriolo added a comment -

          Refactored the code, added xdoc, more extensive testing.

          Amr Awadallah added a comment -

          I am out of office on vacation and will be slower than usual in
          responding to emails. If this is urgent then please call my cell phone
          (or send an sms), otherwise I will reply to your email when I get
          back.

          Thanks for your patience,

          – amr

          Ali added a comment -

          Tested the patch hive-1434-4.patch against Hive trunk and Cassandra 0.6.3.
          Select queries using *, where clauses, count, and group by work fine.

          John Sichi added a comment -

          I'll start taking a closer look at this one...may take me a few days.

          John Sichi added a comment -

          Some points to be resolved.

          • I'd like to avoid checking all of the dependency jars into cassandra-handler/lib. From googling around, it sounds like an official Cassandra maven repo is not going to happen any time soon, and I'm not sure if we can use the unofficial ones. Would it make sense to just do what we've been doing with the Hadoop dependencies, i.e. fetch the tarball via ivy and then unpack it? If so, I can get it added to mirror.facebook.net/facebook/hive-deps.
          • Should we attempt to factor out the HBase commonality immediately, or commit the overlapping code and then do refactoring as a followup? I'm fine either way; I can give suggestions on how to create the reusable abstract bases and where to package+name them.
          • Need a checkstyle run to bring the code into conformance there.
          • The tests are very skimpy currently; it would be good to add some joins, unions, etc.
          • There are some minor code cleanups needed; I'll create a review board entry and post them there.
          Edward Capriolo added a comment -

          Maven, I am on the fence about it. We actually do not need all the libs I included. Having them in a tarball sounds good, but making a maven repo for only this purpose seems to be a lot of work.


          Should we attempt to factor out the HBase commonality immediately, or commit the overlapping code and then do refactoring as a followup? I'm fine either way; I can give suggestions on how to create the reusable abstract bases and where to package+name them.

          If you can specify specific instances then sure. The code may be 99% the same, but that one nuance is going to make the abstractions confusing and useless.

          I await further review.

          John Sichi added a comment -

          Regarding the dependencies: if we use the same mechanism as Hadoop, then we don't need a Maven repo. We just point ivy at the tarball location. See target ivy-retrieve-hadoop-source in build-common.xml, and the various ivy.xml files in subdirs. If you can get this working against a standard Apache mirror download, I can start working on getting the files hosted on mirror.facebook.net, which has had better availability in the past.

          For the refactor, let's do it in a followup and also talk with the Hypertable folks to plan it out, since I think they had to copy a lot of code also. I think it will be possible to do it in a way that is useful and understandable since we now have three instances to work from.

          Basab Maulik added a comment -

          Re: Should we attempt to factor out the HBase commonality immediately, or commit the overlapping code and then do refactoring as a followup? I'm fine either way; I can give suggestions on how to create the reusable abstract bases and where to package+name them.

          and Re: For the refactor, let's do it in a followup and also talk with the Hypertable folks to plan it out, since I think they had to copy a lot of code also. I think it will be possible to do it in a way that is useful and understandable since we now have three instances to work from.

          Let us refactor as a follow up. It will be good for these pieces to stabilize independently initially.

          John Sichi added a comment -

          @Ed: to clarify about the tarball; we would just use a standard Cassandra distribution, e.g.

          http://apache.opensourceresources.org/cassandra/0.6.4/apache-cassandra-0.6.4-bin.tar.gz

          HBase Review Board added a comment -

          Message from: "John Sichi" <jsichi@facebook.com>

          -----------------------------------------------------------
          This is an automatically generated e-mail. To reply, visit:
          http://review.cloudera.org/r/721/
          -----------------------------------------------------------

          Review request for Hive Developers.

          Summary
          -------

          review by JVS

          This addresses bug HIVE-1434.
          http://issues.apache.org/jira/browse/HIVE-1434

          Diffs


          http://svn.apache.org/repos/asf/hadoop/hive/trunk/build-common.xml 981263
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/build.xml 981263
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/build.xml PRE-CREATION
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/ivy.xml PRE-CREATION
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/antlr-3.1.3.jar UNKNOWN
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/apache-cassandra-0.6.3.jar UNKNOWN
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/avro-1.2.0-dev.jar UNKNOWN
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/clhm-production.jar UNKNOWN
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/commons-cli-1.1.jar UNKNOWN
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/commons-codec-1.2.jar UNKNOWN
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/commons-collections-3.2.1.jar UNKNOWN
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/commons-lang-2.4.jar UNKNOWN
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/google-collections-1.0.jar UNKNOWN
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/high-scale-lib.jar UNKNOWN
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/ivy-2.1.0.jar UNKNOWN
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/jackson-core-asl-1.4.0.jar UNKNOWN
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/jackson-mapper-asl-1.4.0.jar UNKNOWN
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/jline-0.9.94.jar UNKNOWN
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/json-simple-1.1.jar UNKNOWN
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/libthrift-r917130.jar UNKNOWN
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/log4j-1.2.14.jar UNKNOWN
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/slf4j-api-1.5.8.jar UNKNOWN
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/slf4j-log4j12-1.5.8.jar UNKNOWN
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/lib/storage-conf.xml PRE-CREATION
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/java/org/apache/hadoop/hive/cassandra/CassandraSerDe.java PRE-CREATION
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/java/org/apache/hadoop/hive/cassandra/CassandraStorageHandler.java PRE-CREATION
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/java/org/apache/hadoop/hive/cassandra/input/CassandraRowResult.java PRE-CREATION
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/java/org/apache/hadoop/hive/cassandra/input/CassandraSplit.java PRE-CREATION
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/java/org/apache/hadoop/hive/cassandra/input/HiveCassandraTableInputFormat.java PRE-CREATION
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/java/org/apache/hadoop/hive/cassandra/input/HiveIColumn.java PRE-CREATION
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/java/org/apache/hadoop/hive/cassandra/input/LazyCassandraCellMap.java PRE-CREATION
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/java/org/apache/hadoop/hive/cassandra/input/LazyCassandraRow.java PRE-CREATION
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/java/org/apache/hadoop/hive/cassandra/output/CassandraColumn.java PRE-CREATION
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/java/org/apache/hadoop/hive/cassandra/output/CassandraPut.java PRE-CREATION
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/java/org/apache/hadoop/hive/cassandra/output/HiveCassandraOutputFormat.java PRE-CREATION
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/java/org/apache/hadoop/hive/cassandra/udf/GetCassandraColumn.java PRE-CREATION
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/java/org/apache/hadoop/hive/cassandra/udf/SetCassandraColumn.java PRE-CREATION
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/test/org/apache/cassandra/contrib/utils/service/CassandraServiceDataCleaner.java PRE-CREATION
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/test/org/apache/hadoop/hive/cassandra/CassandraQTestUtil.java PRE-CREATION
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/test/org/apache/hadoop/hive/cassandra/CassandraTestSetup.java PRE-CREATION
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/test/queries/cassandra_queries.q PRE-CREATION
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/test/resources/access.properties PRE-CREATION
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/test/resources/log4j-tools.properties PRE-CREATION
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/test/resources/log4j.properties PRE-CREATION
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/test/resources/passwd.properties PRE-CREATION
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/test/resources/storage-conf.xml PRE-CREATION
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/test/results/cassandra_queries.q.out PRE-CREATION
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/cassandra-handler/src/test/templates/TestCassandraCliDriver.vm PRE-CREATION
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/docs/stylesheets/project.xml 981263
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/docs/xdocs/developer_docs/cassandra_storage_handler.xml PRE-CREATION
          http://svn.apache.org/repos/asf/hadoop/hive/trunk/docs/xdocs/user_manual/cassandra_storage_handler.xml PRE-CREATION

          Diff: http://review.cloudera.org/r/721/diff

          Testing
          -------

          Thanks,

          John

          Jeremy Hanna added a comment -

          Any update to the status of this ticket?

          Edward Capriolo added a comment -

          Started rebasing for 7.0. Also working on using class names to differentiate between ColumnFamily and SuperColumn family support.

          Jeremy Hanna added a comment -

          Trying to help get the unit tests working. Also Cassandra is now in maven central, so that may be of use as well.

          Edward Capriolo added a comment -

          Got a large portion of the handler working; I had a number of magic cookies. During the refactor I managed to break it somehow, but I am very close to a working final version.

          Edward Capriolo added a comment -

          A working patch! We just need to remove some debug statements. Thanks to Brandon Williams and Jeremy Hanna for debugging. Going to clean up the code ASAP.

          Edward Capriolo added a comment -

          Ivy is unable to fetch the Cassandra jars. I think our private repos are missing them. Tests work when the Cassandra jars are in the lib directory.

          Jeremy Hanna added a comment -

          Not being very familiar with the hive build setup, it looks like it's something that just needs to be tweaked in the cassandra-handler's build.xml/ivy.xml to download the cassandra jars. They are in maven central in http://repo1.maven.org/maven2/org/apache/cassandra/apache-cassandra/0.7.2/ so maven groupId="org.apache.cassandra" and maven artifactId="apache-cassandra".

          I tried a few things but, like I said, I'm not as familiar with the hive build setup (directories, xml includes, etc). It would help if someone reviewing the patch could advise on that aspect.

          John Sichi added a comment -

          Based on this:

          http://mail-archives.apache.org/mod_mbox/cassandra-client-dev/201101.mbox/%3CAANLkTinnxA=mmb3d5J_hyn_Uq0StTWEWhr2W+Y5HYyA1@mail.gmail.com%3E

          I was able to get it to work by changing the dependency to reference name="cassandra-all" rather than "apache-cassandra".

          I also had to add <exclude module="jug"/> since ivy was failing to pull down some org.safehaus.jug transitive dependency.

          Edward Capriolo added a comment -

          Now working with ivy maven.

          Jeremy Hanna added a comment -

          Thanks John - sounds like I should have tried a few more things - yesterday was busy with our baby and I'm not as familiar with ivy as maven + the difference in build setups... but it turned out to be something small - thanks!

          John Sichi added a comment -

          After applying patch:

          ant clean package
          ... builds fine ...
          ant test -Dtestcase=TestCassandraCliDriver
          ...
          BUILD FAILED
          /data/users/jsichi/open/hive-trunk/build-common.xml:317: /data/users/jsichi/open/hive-trunk/cassandra-handler/lib does not exist.
          
          John Sichi added a comment -

          After manual mkdir cassandra-handler/lib, I tried again and got

          ...
          test:
              [junit] Running org.apache.hadoop.hive.cli.TestCassandraCliDriver
              [junit] SLF4J: Class path contains multiple SLF4J bindings.
              [junit] SLF4J: Found binding in [jar:file:/data/users/jsichi/open/hive-trunk/build/ivy/lib/default/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
              [junit] SLF4J: Found binding in [jar:file:/data/users/jsichi/open/hive-trunk/build/hadoopcore/hadoop-0.20.1/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
              [junit] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
              [junit] org.apache.thrift.TApplicationException: Internal error processing system_add_keyspace
              [junit] 	at org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
              [junit] 	at org.apache.cassandra.thrift.Cassandra$Client.recv_system_add_keyspace(Cassandra.java:1403)
              [junit] 	at org.apache.cassandra.thrift.Cassandra$Client.system_add_keyspace(Cassandra.java:1386)
              [junit] 	at org.apache.hadoop.hive.cassandra.CassandraTestSetup.preTest(CassandraTestSetup.java:56)
              [junit] 	at org.apache.hadoop.hive.cassandra.CassandraQTestUtil.<init>(CassandraQTestUtil.java:14)
              [junit] 	at org.apache.hadoop.hive.cli.TestCassandraCliDriver.setUp(TestCassandraCliDriver.java:41)
              [junit] 	at junit.framework.TestCase.runBare(TestCase.java:125)
              [junit] 	at junit.framework.TestResult$1.protect(TestResult.java:106)
              [junit] 	at junit.framework.TestResult.runProtected(TestResult.java:124)
              [junit] 	at junit.framework.TestResult.run(TestResult.java:109)
              [junit] 	at junit.framework.TestCase.run(TestCase.java:118)
              [junit] 	at junit.framework.TestSuite.runTest(TestSuite.java:208)
              [junit] 	at junit.framework.TestSuite.run(TestSuite.java:203)
              [junit] 	at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22)
              [junit] 	at junit.extensions.TestSetup$1.protect(TestSetup.java:19)
              [junit] 	at junit.framework.TestResult.runProtected(TestResult.java:124)
              [junit] 	at junit.extensions.TestSetup.run(TestSetup.java:23)
              [junit] 	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:422)
              [junit] 	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:931)
              [junit] 	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:785)
              [junit] Exception: Internal error processing system_add_keyspace
              [junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 1.329 sec
              [junit] Test org.apache.hadoop.hive.cli.TestCassandraCliDriver FAILED
                [for] /data/users/jsichi/open/hive-trunk/cassandra-handler/build.xml: The following error occurred while executing this line:
                [for] /data/users/jsichi/open/hive-trunk/build.xml:214: The following error occurred while executing this line:
                [for] /data/users/jsichi/open/hive-trunk/build-common.xml:455: Tests failed!
          ...
          
          Edward Capriolo added a comment -

          Fixes the missing lib dir.

          John Sichi added a comment -

          I looked in hive.log after the test failure and noticed the exception below. So maybe we need to tweak ivy some more to include org.safehaus.uuid?

          2011-03-14 12:45:49,370 ERROR thrift.Cassandra$Processor (Cassandra.java:process(3482)) - Internal error processing system_add_keyspace
          java.lang.NoClassDefFoundError: org/safehaus/uuid/UUIDGenerator
                  at org.apache.cassandra.utils.UUIDGen.makeType1UUIDFromHost(UUIDGen.java:48)
          
          Edward Capriolo added a comment -

          It looks like this is only used when first generating system tables. I excluded JUG in ivy because it was barfing that day. Let me look at this quickly.

          Edward Capriolo added a comment -

          We cannot pull jug down:

          [ivy:resolve] :: Ivy 2.1.0 - 20090925235825 :: http://ant.apache.org/ivy/ ::
          [ivy:resolve] :: loading settings :: file = /home/edward/hive-dev/hive-trunk-cassandra/ivy/ivysettings.xml
          [ivy:resolve] 
          [ivy:resolve] :: problems summary ::
          [ivy:resolve] :::: WARNINGS
          [ivy:resolve] 		[FAILED     ] org.safehaus.jug#jug;2.0.0!jug.jar:  (0ms)
          [ivy:resolve] 	==== hadoop-source: tried
          [ivy:resolve] 	  http://mirror.facebook.net/facebook/hive-deps/hadoop/core/jug-2.0.0/jug-2.0.0.jar
          [ivy:resolve] 	==== apache-snapshot: tried
          [ivy:resolve] 	  https://repository.apache.org/content/repositories/snapshots/org/safehaus/jug/jug/2.0.0/jug-2.0.0-asl.jar
          [ivy:resolve] 	==== maven2: tried
          [ivy:resolve] 	  http://repo1.maven.org/maven2/org/safehaus/jug/jug/2.0.0/jug-2.0.0.jar
          [ivy:resolve] 	==== datanucleus-repo: tried
          [ivy:resolve] 	  http://www.datanucleus.org/downloads/maven2/org/safehaus/jug/jug/2.0.0/jug-2.0.0.jar
          [ivy:resolve] 		::::::::::::::::::::::::::::::::::::::::::::::
          [ivy:resolve] 		::              FAILED DOWNLOADS            ::
          [ivy:resolve] 		:: ^ see resolution messages for details  ^ ::
          [ivy:resolve] 		::::::::::::::::::::::::::::::::::::::::::::::
          [ivy:resolve] 		:: org.safehaus.jug#jug;2.0.0!jug.jar
          [ivy:resolve] 		::::::::::::::::::::::::::::::::::::::::::::::
          [ivy:resolve] 
          

          Do we need to add a repo?

          Carl Steinbach added a comment -

          I was able to get it to work by modifying ivy/ivysettings.xml as follows:

          -  <property name="maven2.pattern" 
          -   value="[organisation]/[module]/[revision]/[module]-[revision]"/>
          +  <property name="maven2.pattern" 
          +   value="[organisation]/[module]/[revision]/[module]-[revision](-[classifier])"/>
          

          This is related to IVY-418.

          Why is the cassandra dependency listed in the ql/ivy.xml file? Shouldn't this go in cassandra-handler/ivy.xml instead? Also, can you please add an ASF header to cassandra-handler/ivy.xml?

          Edward Capriolo added a comment -

          Adds repo to grab jug

          Edward Capriolo added a comment -

          I put this in ql/ivy.xml because that is where the hbase stuff is.

          As to jug my latest patch has another approach. Which way do you think is best?

          John Sichi added a comment -

          The hbase stuff was originally in hbase-handler, but it had to move to ql when we started using zookeeper for locking because the zookeeper mini test cluster is part of HBase test infra.

          For Cassandra, it should be in cassandra-handler.

          Edward Capriolo added a comment -

          That makes sense. I was thinking about this some more and someone might want to try doing the locking with C*. Along that thinking I figure why put it one place now just to move it later.

          John Sichi added a comment -

          If that happened we would want to go the other direction (make the lock manager harness used by the unit test framework pluggable, and pull zk out of ql) rather than dragging more stuff into ql.

          Let's try to keep cassandra-handler as low-impact as possible.

          Edward Capriolo added a comment -

          OK, so be it; moving this is easy. Is this the only comment? (I am not going to regen the patch if more review is pending.)

          John Sichi added a comment -

          The test is passing for me now with the latest patch. I haven't looked at the latest code much yet.

          In CassandraTestSetup.java, I see

              FramedConnWrapper wrap = new FramedConnWrapper("127.0.0.1",9170,5000);
          

          Does that mean a listening port is being used? If so, please change it to use a dynamic port like I did for the HBase tests; otherwise we'll get sporadic conflicts with other services.

          Also, what's up with adding org/apache/cassandra/contrib/utils/service/CassandraServiceDataCleaner.java into the Hive codebase? I don't think we want to do that.

          Edward Capriolo added a comment -

          CassandraServiceDataCleaner.java is a glorified 'rm -rf' that is in contrib and does not get packaged in maven (I do not think)

          As for dynamic listening ports, this is not as "easy" as it is for hbase. How Cassandra reads its configuration is more of a black box. You can use properties to point at different folders, but when Cassandra initializes, the first thing that happens is that the configuration file is read.

          AFAIK the only way to do this is to dynamically generate its yaml file. This is going to be ugly.
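
          A minimal sketch of what generating the yaml could look like, assuming the test starts from a template that contains the usual rpc_port and storage_port keys. The class name, the template text, and the port numbers are illustrative only and are not taken from the patch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: rewrites "<key>: <number>" lines in a yaml template.
final class YamlPortRewriteSketch {

    // Returns a copy of the yaml text in which the line starting with
    // "<key>:" carries the given port; all other lines are left untouched.
    static String withPort(String yaml, String key, int port) {
        Pattern line = Pattern.compile(
                "(?m)^" + Pattern.quote(key) + ":[ \\t]*\\d+[ \\t]*$");
        return line.matcher(yaml)
                .replaceAll(Matcher.quoteReplacement(key + ": " + port));
    }

    public static void main(String[] args) {
        // Assumed template fragment; a real test would read the full file.
        String template = "cluster_name: 'Test Cluster'\n"
                + "storage_port: 7000\n"
                + "rpc_port: 9160\n";
        String generated = withPort(
                withPort(template, "rpc_port", 19170), "storage_port", 17000);
        System.out.print(generated);
    }
}
```

          The generated text would still have to be written to a file that the embedded server is pointed at before it initializes, which is the ugly part.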

          Jonathan Ellis added a comment -

          It's probably simpler to patch Cassandra to allow specifying ports via properties, than dynamically generate yaml. That's fine if this is a blocker.

          But IMO it's reasonable to expect someone who wants to run Cassandra, to configure the ports the standard Cassandra way in yaml.

          Jonathan Ellis added a comment -

          But IMO it's reasonable to expect someone who wants to run Cassandra, to configure the ports the standard Cassandra way

          Realized 2s after submitting that we're talking about test code. Back to my first suggestion of patching Cassandra to allow that to be dynamic.

          Edward Capriolo added a comment -

          Older versions of the Cassandra embedded server had init() then start(). If we went to a model like that some code could change the loaded configuration after the load.

          I do not think the port is a serious blocker. What, one out of every 65K tests will fail with a "port already in use" exception?

          John Sichi added a comment -

          It is a blocker. The HBase problems were part of what caused Hive continuous integration to go broken for many weeks. The failure frequency was very high due to conflicts from unrelated port-hungry services being run on committer dev boxes.

          Brandon Williams added a comment -

          Created CASSANDRA-2343 to address this.

          Edward Capriolo added a comment -

          @Brandon: Well done. Love how C* gets stuff through the pipe fast.

          In the test setup I will find a free port, or choose a random one, and then set it as a prop. There could be a failure between the time we check for a port being free and the startup, but that is unlikely.

          We should also maybe contact infra about each project having a port range. I know this can be enforced with something like SELinux.
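
          The "find a free port, then set it as a prop" idea could look roughly like the sketch below. The property name cassandra.rpc_port is an assumption (whatever CASSANDRA-2343 ends up exposing would be used instead), and the class name is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

// Hypothetical test helper: picks a port that is free at the time of the call.
final class FreePortSketch {

    // Binding to port 0 asks the OS for any unused port; the socket is closed
    // again right away, so another process could still grab the port before
    // the embedded server binds to it (the small race mentioned above).
    static int findFreePort() {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int port = findFreePort();
        // Assumed property name, shown for illustration only.
        System.setProperty("cassandra.rpc_port", Integer.toString(port));
        System.out.println("Chosen test port: " + port);
    }
}
```

          The test client would then read the same property instead of hard-coding 9170.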

          Edward Capriolo added a comment -

          Also just an off the topic type of comment on our unit testing. This end to end testing is great, but we might be better off using mockito for this type of testing. We are writing a lot of code to bring up embedded instances to return known values.

          Edward Capriolo added a comment -

          I am not at all happy with this being a blocker. First off, let me point out that I seem to ALWAYS get held to a higher standard than everyone else. For example, in https://issues.apache.org/jira/browse/HIVE-1335 it was insisted that I use Ivy. Yet with https://issues.apache.org/jira/browse/HIVE-1235 obviously no such rule was enforced.

          As you pointed out:

          The HBase problems were part of what caused Hive continuous integration to go broken for many weeks. The failure frequency was very high due to conflicts from unrelated port-hungry services being run on committer dev boxes.

          So in other words, the hbase-handler was allowed to go +1 and break hudson/jenkins for weeks, until someone got around to finding a solution, but now for this same reason this will go -1 and not get committed.

          Upstream Hadoop NEVER completes a full 'ant test' successfully. While this is not an ideal situation, it is a fact.

          Also the one week lag time between patch-available and review just to find one new blocker at a time is getting old. I think I have seen 15 issues get reviewed and committed since this went patch_available.

          John Sichi added a comment -

          Hey Ed, I'm out on vacation (so just saw this). A couple of corrections:

          • When the HBase handler was originally committed, tests were running fine. We hadn't yet realized the dynamic ports problem because the test machines used by committers didn't have a lot of random ports open. Only recently, those machines (at Facebook) started getting some service changes which caused the port problem to show up. So the problem wasn't actually the HBase handler; it was that people started seeing test failures and then committing anyway because they assumed it was just the HBase test flaking. Once we finally tracked it down, we fixed the dynamic ports problem. Now that we're aware of the problem, it would be a bad idea to repeat it.
          • Regarding ivy and HIVE-1235: when the HBase Handler was committed, HBase and its dependencies weren't yet available in ivy. We got that kicked off and then started using it once available.

          Keep up the good work with this one; we'll get it in.

          John Sichi added a comment -

          Some other things which need to be addressed:

          • Apache headers are missing on many new files
          • all commented-out code should be removed
          • new classes (e.g. CassandraStorageHandler) should have Javadoc (and for ones that have it, like CassandraQTestUtil, eliminate copy-and-paste evidence)
          • there is a file in the patch with the name cassandra-handler/src/test/results/cassandra_queries; I don't think it's supposed to be there (there should only be the .q.out file)

          For the HBase handler, there's a wiki page; it would be good to have one here too.

          Also, for HBase, we originally had some bugs with joins against tables with different schemas (and for joining HBase vs non-HBase tables), so you probably want to add some tests for those similar to the ones in hbase_queries.q and hbase_joins.q.

          Edward Capriolo added a comment -

          For those interested, DataStax has included this as a feature of Brisk http://www.datastax.com/products/brisk . They have added one feature that does away with the mapping and just allows all returned columns to be treated as tuples.

          Edward Capriolo added a comment -

          It is now pretty easy to take the Brisk jar and drop it into hive:

          https://github.com/riptano/hive/wiki/Cassandra-Handler-usage-in-Hive-0.7-with-Cassandra-0.7

          Also, the Brisk version of the handler has more features than this one, as it can transpose wide rows into long columns. I think at this point we might as well abandon trying to get this code into hive. It is much easier to code/innovate on it as an external project with git than inside hadoop-hive.
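One way to picture the transposition mentioned above is sketched below. It assumes "transposing" means that each column of a wide Cassandra row becomes its own (row key, column name, value) record; this is an illustration only, not the Brisk code, and every class and method name in it is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: not taken from Brisk or from any patch on this ticket.
final class WideRowTranspose {

    /** One output record: (row key, column name, column value). */
    static final class Triple {
        final String rowKey;
        final String columnName;
        final String value;

        Triple(String rowKey, String columnName, String value) {
            this.rowKey = rowKey;
            this.columnName = columnName;
            this.value = value;
        }

        @Override
        public String toString() {
            return "(" + rowKey + ", " + columnName + ", " + value + ")";
        }
    }

    /** Emits one Triple per column of every wide row, in iteration order. */
    static List<Triple> transpose(Map<String, Map<String, String>> wideRows) {
        List<Triple> out = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : wideRows.entrySet()) {
            for (Map.Entry<String, String> col : row.getValue().entrySet()) {
                out.add(new Triple(row.getKey(), col.getKey(), col.getValue()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // A single wide row whose column names are dates (made-up sample data).
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("2011-03-01", "12");
        columns.put("2011-03-02", "17");

        Map<String, Map<String, String>> rows = new LinkedHashMap<>();
        rows.put("user42", columns);

        for (Triple t : transpose(rows)) {
            System.out.println(t);
        }
    }
}
```

With this shape, a table with a fixed three-column schema can expose rows that have an arbitrary number of columns, which is why no per-column mapping would be needed.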

          Ashutosh Chauhan added a comment -

          Is there any interest in reviving this? I guess it will be good to get it into hive since it mostly implements Hive's interfaces.

          Nicolas Lalevée added a comment -

          I have refreshed the patch, see HIVE-1434-r1182878.patch

          • upgraded to use cassandra 0.8.7
          • the dependencies of cassandra may conflict with the dependencies of hive. For instance commons-cli 1.2 is "required" by cassandra, and hive doesn't compile against it; hence the "exclude" in the ivy.xml. I've also listed the other cassandra dependencies that may conflict with hive's, but commented out.
          • add ASF headers
          • commented code removed
          • formatted code
          • the ivy file of hive-cassandra-handler is made to look like the one of hive-hbase-handler
          • in the build, the build of cassandra-handler is now called everywhere the build of hbase-handler is also being called
          • in order to make it compile, I had to comment out line 93 of /hive/cassandra-handler/src/test/org/apache/hadoop/hive/cassandra/CassandraTestSetup.java : it required com.google.common.collect.AbstractIterator, not sure why.

          not actually related:

          • in the build.xml files, includeantruntime="false" added to every javac, to avoid a weird build classpath (as per the warning printed by ant). Apart from the 'ant' module, obviously.
          • in build-common.xml, the pattern used caused some retrieve conflicts, so I have fixed the pattern
          • in build-common.xml, add a description to the targets test and jar to be listed into the target listing (ant -p)

          But then, tests didn't pass locally; the classpath seems to be broken. I got:

            <testcase classname="org.apache.hadoop.hive.cli.TestCassandraCliDriver" name="testCliDriver_cassandra_queries" time="1.579">
              <error message="Implementing class" type="java.lang.IncompatibleClassChangeError">java.lang.IncompatibleClassChangeError: Implementing class
          	at java.lang.ClassLoader.defineClass1(Native Method)
          	at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
          	at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
          	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
          	at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
          	at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
          	at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
          	at java.security.AccessController.doPrivileged(Native Method)
          	at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
          	at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
          	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
          	at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
          	at org.apache.hadoop.hive.cassandra.FramedConnWrapper.getClient(FramedConnWrapper.java:50)
          	at org.apache.hadoop.hive.cassandra.CassandraTestSetup.preTest(CassandraTestSetup.java:69)
          	at org.apache.hadoop.hive.cassandra.CassandraQTestUtil.&lt;init&gt;(CassandraQTestUtil.java:31)
          	at org.apache.hadoop.hive.cli.TestCassandraCliDriver.setUp(TestCassandraCliDriver.java:41)
          	at junit.framework.TestCase.runBare(TestCase.java:132)
          	at junit.framework.TestResult$1.protect(TestResult.java:110)
          	at junit.framework.TestResult.runProtected(TestResult.java:128)
          	at junit.framework.TestResult.run(TestResult.java:113)
          	at junit.framework.TestCase.run(TestCase.java:124)
          	at junit.framework.TestSuite.runTest(TestSuite.java:232)
          	at junit.framework.TestSuite.run(TestSuite.java:227)
          	at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
          	at junit.extensions.TestSetup$1.protect(TestSetup.java:23)
          	at junit.framework.TestResult.runProtected(TestResult.java:128)
          	at junit.extensions.TestSetup.run(TestSetup.java:27)
          	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:518)
          	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1052)
          	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:906)
          </error>
          

          And the embedded cassandra is relying on files on /tmp. In the patch, see the file cassandra-handler/conf/cassandra.yaml. I don't know if we can make the paths relative.
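An IncompatibleClassChangeError like the one in the trace above is often a sign that two incompatible versions of a library ended up on the same classpath. The sketch below is one way to check which jar a given class was actually loaded from. The trace does not say which class failed to load, so the class name is left as a command-line argument; `WhichJar` and `locate` are hypothetical names.

```java
import java.net.URL;
import java.security.CodeSource;

// Diagnostic sketch: prints where the JVM loaded a class from.
final class WhichJar {

    /**
     * Returns the location (jar or directory URL) a class was loaded from,
     * or "<bootstrap>" when the JVM reports no code source for it.
     */
    static String locate(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null) {
            return "<bootstrap>";
        }
        URL location = source.getLocation();
        return location == null ? "<unknown>" : location.toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass the fully qualified name of the suspect class as the first argument.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(name + " <- " + locate(Class.forName(name)));
    }
}
```

Running it with the same classpath as the failing test, once for the suspect class and once for the interface or superclass it implements, would show whether the two come from different jars.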

          Edward Capriolo added a comment -

          @Nicolas Lalevée
          Nice work. Did you base this off the Brisk version, or did you just clean up my latest patch? The Brisk version has transposed row support, which is very interesting.

          Edward Capriolo added a comment -

          We can actually set relative paths in the cassandra.yaml. They end up being relative to the working directory.
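As a small illustration of what "relative to the working directory" means on the JVM, the sketch below resolves a relative path the same way a Java process normally would. The directory name `build/cassandra/data` is a made-up example, not a value from the patch's cassandra.yaml.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch: shows how a relative path is resolved by the JVM.
final class RelativePathDemo {

    /** Turns a possibly-relative path into an absolute one, resolved against the working directory. */
    static Path resolve(String configured) {
        return Paths.get(configured).toAbsolutePath().normalize();
    }

    public static void main(String[] args) {
        // The working directory is the directory the JVM (here: the test run) was started from.
        System.out.println("user.dir = " + System.getProperty("user.dir"));
        // Hypothetical relative data directory, standing in for a /tmp path.
        System.out.println(resolve("build/cassandra/data"));
    }
}
```

The practical consequence is that the data files land wherever the test JVM is launched from, so the build would have to start the tests from a predictable directory.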

          Nicolas Lalevée added a comment -

          I refreshed hive-1434-2011-03-14.patch.txt. I never found the source of the Brisk version; do you have a pointer?

          Jonathan Ellis added a comment -

          It might be better to have this live in the Cassandra tree (with our m/r and Pig handlers) since we're planning to add features like the transposition support Ed mentioned. My impression is that patch turnaround time is significantly lower there.

          Nicolas Lalevée added a comment -

          I finally found the source of the brisk version. As suggested by Jonathan, I made it a patch there: CASSANDRA-913

          Edward Capriolo added a comment -

          This feature is a complete, utter failure. It was never committed to hive. It was never committed to cassandra. I find ~40 forks of the code that are likely derivative works, which make no reference to me or hive, and all types of people are now asserting copyright over it. I am closing this issue and making a clean-room implementation of a new handler.


            People

            • Assignee:
              Edward Capriolo
              Reporter:
              Edward Capriolo
            • Votes:
              17
              Watchers:
              34
