Hive / HIVE-895

Add SerDe for Avro serialized data

    Details

      Description

      As Avro continues to mature, having a SerDe to allow HiveQL queries over Avro data seems like a solid win.

      1. doctors.avro
        0.5 kB
        Jakob Homan
      2. episodes.avro
        0.6 kB
        Jakob Homan
      3. HIVE-895.patch
        189 kB
        Jakob Homan
      4. hive-895.patch.1.txt
        190 kB
        Edward Capriolo
      5. HIVE-895-draft.patch
        160 kB
        Jakob Homan

          Activity

          Alex Rovner added a comment -

          Can someone please explain how this SerDe would work? Specifically, how would it deserialize the data?

          From what I understand, an Avro file has a header that defines the data stored in the file. To deserialize the data you need to read that header, which is a challenge with Hive's Deserializer interface because the initialize() method does not know anything about the input file. (Note: there is a hack that gets you the file by reading the map.input Hadoop property. That hack is not good enough in Hive, however, because someone might be using the CLI to query, which will not trigger a MapReduce job.)

          Does anyone know a good solution to this issue?

          I am actually trying to implement a different file format, but the idea of our format is similar to Avro: each file has a header that contains a "schema".

          Thanks
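
          For readers unfamiliar with the header being discussed: an Avro object container file starts with the magic bytes "Obj" followed by 1, then a metadata map whose "avro.schema" entry holds the writer schema as JSON. The sketch below parses just that part using only the JDK. It is an illustration of the file layout, not code from the patch; the class name, the helper names and the tiny sample schema are all made up for this example, and a real reader would normally use the Avro library instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hive or Avro.
final class AvroHeaderSketch {
    private static final byte[] MAGIC = {'O', 'b', 'j', 1};

    // Decodes one Avro long: base-128 varint, then zig-zag to signed.
    static long readLong(InputStream in) throws IOException {
        long raw = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            int b = in.read();
            if (b < 0) throw new EOFException("truncated varint");
            raw |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return (raw >>> 1) ^ -(raw & 1);
        }
        throw new IOException("varint too long");
    }

    // Reads a length-prefixed byte string (Avro strings use the same layout).
    static byte[] readBytes(DataInputStream in) throws IOException {
        long len = readLong(in);
        if (len < 0 || len > Integer.MAX_VALUE) throw new IOException("bad length " + len);
        byte[] buf = new byte[(int) len];
        in.readFully(buf);
        return buf;
    }

    // Returns the header metadata, e.g. "avro.schema" -> schema JSON.
    static Map<String, String> readMetadata(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] magic = new byte[MAGIC.length];
        in.readFully(magic);
        if (!Arrays.equals(magic, MAGIC)) throw new IOException("not an Avro container file");
        Map<String, String> meta = new LinkedHashMap<>();
        // The map is written as blocks of entries, terminated by a zero count.
        for (long count = readLong(in); count != 0; count = readLong(in)) {
            if (count < 0) {   // a negative count is followed by the block's byte size
                count = -count;
                readLong(in);  // size is not needed here, so it is skipped
            }
            for (long i = 0; i < count; i++) {
                String key = new String(readBytes(in), StandardCharsets.UTF_8);
                String value = new String(readBytes(in), StandardCharsets.UTF_8);
                meta.put(key, value);
            }
        }
        return meta;
    }

    // Zig-zag + varint encoder, used only to build the demo header below.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, b.length);
        out.write(b, 0, b.length);
    }

    // Builds a tiny in-memory header with a made-up schema (demo data only).
    static byte[] sampleHeader() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(MAGIC, 0, MAGIC.length);
        writeLong(out, 2);                      // one block with two entries
        writeString(out, "avro.schema");
        writeString(out, "{\"type\":\"string\"}");
        writeString(out, "avro.codec");
        writeString(out, "null");
        writeLong(out, 0);                      // end of map
        out.write(new byte[16], 0, 16);         // placeholder sync marker
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> meta = readMetadata(new ByteArrayInputStream(sampleHeader()));
        System.out.println(meta.get("avro.schema"));
    }
}
```

          The point for this discussion is that the schema is only reachable once a file is open, which is exactly what initialize() cannot rely on.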

          Zheng Shao added a comment -

          We should just copy the schema information from the file header to the Hive metastore.

          Doug Cutting added a comment -

          This may be similar to PIG-1748.

          Ron Bodkin added a comment -

          Properly supporting the Avro file format will also require adding a STORED AS format (as was done for RCFile). It looks like that's not yet a pluggable interface.

          Jakob Homan added a comment -

          We've written a library for this that we'll be open sourcing in a short timeframe (6-8 weeks). If there is interest in that, I can take a look at reformatting it as a patch.

          Carl Steinbach added a comment -

          @Jakob: There's lots of interest. Please post the patch, even if it's a WIP.

          Jakob Homan added a comment -

          We've released the SerDe (we're calling it haivvreo) as open source at http://bit.ly/iwEQzJ and have been testing it in our production ETL for a while; it's working well. I'd like to hold off on making a Hive patch until we've iterated a bit more and ironed out any bugs that are in hiding. I plan on getting a patch in before 0.8 is released. In the meantime, anyone who wants to kick the tires is welcome to.

          Carl Steinbach added a comment -

          @jakob: The code on github looks really good.

          The release branch for 0.8.0 is going to get created sometime in the next couple of weeks. Do you think it will be possible to get a patch ready for review before then?

          Jakob Homan added a comment -

          A couple weeks is probably not feasible. Assuming 0.9 comes out in a few months after that, that's probably a better bet.

          Carl Steinbach added a comment -

          @Jakob: Just wanted to check in and see if you're ready to get this
          patch committed to trunk. The 0.8.0 release branch was created
          yesterday, so you wouldn't have to worry about this work immediately
          appearing in a release.

          Lianhui Wang added a comment -

          @Jakob: I read the haivvreo code, and I think the same approach could work for Protocol Buffers.
          Google's paper on Tenzing (a system similar to Hive) says it supports Protocol Buffers and ColumnIO.

          Carl Steinbach added a comment -

          @Jakob: Do you think we can get this committed in time for the 0.9.0 release? Are you willing to attach a patch? Thanks.

          Edward Capriolo added a comment -

          This has an Apache v2 license. If anyone wants, we should be able to just git clone it and patch it in, as long as we keep the license file.

          Jakob Homan added a comment -

          Yes, I've set aside some time early next week to get it into patch form.

          Jakob Homan added a comment -

          Here's a first draft of the port to ASF. It corresponds to the mergeHive8ToMaster branch on github, which has all the latest fixes and is compatible with Hive 0.8. It still needs to be reformatted to Hive style and run through the full unit tests.

          One thing of concern is that the avroserde relies on the ql package, which required a change to the build script to build serde afterwards. Is there a defined dependency for Hive's modules, and if so does this break that? If so, the other option would be to move this to the contrib package, but to me contrib is a dirty word and I'd like to avoid that.

          Also, this bundles the avro serde into the serde jar. It'd be nice for those not using Avro to not require it, but Avro is already a build-time dependency so it's not a new problem. Eventually it'd be nice to have a separate jar with just the serde in it to make the code more modular.

          I'll finish the port in the next couple of days, but take a glance and comment if you'd like.

          Carl Steinbach added a comment -

          > One thing of concern is that the avroserde relies on the ql package, which required a change to the build script to build serde afterwards. Is there a defined dependency for Hive's modules, and if so does this break that? If so, the other option would be to move this to the contrib package, but to me contrib is a dirty word and I'd like to avoid that.

          Right now ql depends on serde, but not vice-versa, so it sounds like this patch will add a circular dependency which is something we should probably try to avoid. I also agree with you that this belongs in the serde package as opposed to contrib. Do you think it's possible to move the ql code that the avro serde depends on to common?

          Also, I apologize in advance for this request, but would you mind posting a review for this on phabricator? Directions are located here:
          https://cwiki.apache.org/Hive/phabricatorcodereview.html

          If this looks like too much of a pain, I can post the review request for you, but I may need some help applying the patch to trunk.

          Thanks!

          Jakob Homan added a comment -

          > Do you think it's possible to move the ql code that the avro serde depends on to common?

          Should be fine. Will do this week.

          > Also, I apologize in advance for this request, but would you mind posting a review for this on phabricator?

          Due to its reliance on facebook.com, this site still doesn't display correctly for me, but I'll use a different browser just for this request.

          Jakob Homan added a comment -

          btw, moving stuff to common doesn't work, so I'm doing a bit of refactoring. New patch shortly... (schedule permitting)

          Jakob Homan added a comment -

          Final patch. Switching to TBLPROPERTIES rather than SERDEPROPERTIES obviated the need for the earlier ql calls. Spent a lot of frustrating time trying to get phabricator to work (quite surprised that this FB-specific framework is kosher in the ASF). It's up at https://reviews.facebook.net/D3321
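
          This also bears on the question at the top of the thread. A Hive deserializer's initialize() is handed the table's properties, so a schema carried in TBLPROPERTIES is available without opening any data file. A minimal JDK-only sketch of that lookup follows. The class and method names are hypothetical and this is not the code in the patch; the two property keys are the ones I understand the Avro SerDe to read, but the thread does not name them, so treat them as assumptions.

```java
import java.util.Properties;

// Hypothetical helper, not the patch's AvroSerdeUtils.
final class TablePropertySchema {
    // Assumed property keys; the thread itself does not spell them out.
    static final String SCHEMA_LITERAL = "avro.schema.literal";
    static final String SCHEMA_URL = "avro.schema.url";

    // Says whether the schema JSON is inline or must be fetched from a URL.
    static final class SchemaSource {
        final boolean inline;
        final String value;

        SchemaSource(boolean inline, String value) {
            this.inline = inline;
            this.value = value;
        }
    }

    // Prefers an inline schema, falls back to a URL, and fails loudly otherwise.
    static SchemaSource fromTableProperties(Properties tbl) {
        String literal = tbl.getProperty(SCHEMA_LITERAL);
        if (literal != null && !literal.trim().isEmpty()) {
            return new SchemaSource(true, literal);
        }
        String url = tbl.getProperty(SCHEMA_URL);
        if (url != null && !url.trim().isEmpty()) {
            return new SchemaSource(false, url);
        }
        throw new IllegalArgumentException(
            "Set either " + SCHEMA_LITERAL + " or " + SCHEMA_URL + " on the table");
    }

    public static void main(String[] args) {
        Properties tbl = new Properties();
        tbl.setProperty(SCHEMA_LITERAL, "{\"type\":\"string\"}");
        SchemaSource s = fromTableProperties(tbl);
        System.out.println((s.inline ? "inline schema: " : "schema URL: ") + s.value);
    }
}
```

          Failing when neither key is set is a choice made for this sketch; a table definition without a schema would otherwise only surface as an error at query time.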

          Jakob Homan added a comment -

          Forgot to grant ASF license. Re-upping.

          Edward Capriolo added a comment -

          This is a great addition to hive. I will start reviewing to wrap my head around it but since you only made a tiny upstream change to hive (an ivy file) and you have extensive unit tests this looks good to go. +1 Will commit if tests pass.

          Jakob Homan added a comment -

          Did the tests pass? Anything else I can do to help?

          Carl Steinbach added a comment -

          @Jakob: I'll take a look at this over the weekend. Sorry for the delay, and thanks again for putting this together!

          Edward Capriolo added a comment -

          @Carl you don't have to worry. @JAKOB I will get it done probably later tonight. You did a solid job, I had to familiarize myself with avro a bit before I did the review.

          Edward Capriolo added a comment -

          Running this.
          ant test -Dtestcase=TestCliDriver -Dqfile=avro_joins.q -Dtest.silent=false
          Throws this.

          org.apache.hadoop.hive.ql.metadata.HiveException: Cannot validate serde: org.apache.hadoop.hive.serde2.avro.AvroSerDe
          	at org.apache.hadoop.hive.ql.exec.DDLTask.validateSerDe(DDLTask.java:3168)
          	at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:3290)
          	at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:243)
          	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:134)
          	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
          	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1322)
          	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1108)
          	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:943)
          	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
          	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
          	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
          	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:341)
          	at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:669)
          	at org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_joins(TestCliDriver.java:125)
          	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
          	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          	at java.lang.reflect.Method.invoke(Method.java:597)
          	at junit.framework.TestCase.runTest(TestCase.java:168)
          	at junit.framework.TestCase.runBare(TestCase.java:134)
          	at junit.framework.TestResult$1.protect(TestResult.java:110)
          	at junit.framework.TestResult.runProtected(TestResult.java:128)
          	at junit.framework.TestResult.run(TestResult.java:113)
          	at junit.framework.TestCase.run(TestCase.java:124)
          	at junit.framework.TestSuite.runTest(TestSuite.java:243)
          	at junit.framework.TestSuite.run(TestSuite.java:238)
          	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:420)
          	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:911)
          	at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:768)
          Caused by: org.apache.hadoop.hive.serde2.SerDeException: SerDe org.apache.hadoop.hive.serde2.avro.AvroSerDe does not exist
          	at org.apache.hadoop.hive.serde2.SerDeUtils.lookupDeserializer(SerDeUtils.java:85)
          	at org.apache.hadoop.hive.ql.exec.DDLTask.validateSerDe(DDLTask.java:3163)
          	... 28 more
          

          Any thoughts?

          Edward Capriolo added a comment -

          I think the issue is the avro jars are not on the compile classpath but either need to be on the runtime classpath or added with add jar.

          Jakob Homan added a comment -

          The problem is that the tests in ql load up the serde package from the local ivy rather than from the build path, unless you do a full very-clean. These jars don't have the new classes and hence fail. I could reproduce this by running a test without the patch, applying the patch, running a test and it would then fail from the local jars. Running very-clean, applying the patch and then running the test passes:

              [junit] Running org.apache.hadoop.hive.cli.TestCliDriver
              [junit] Begin query: avro_joins.q
              [junit] Copying file: file:/private/tmp/tp895/git/data/files/doctors.avro
              [junit] Copying file: file:/private/tmp/tp895/git/data/files/episodes.avro
              [junit] diff -a /private/tmp/tp895/git/build/ql/test/logs/clientpositive/avro_joins.q.out /private/tmp/tp895/git/ql/src/test/results/clientpositive/avro_joins.q.out
              [junit] Done query: avro_joins.q elapsedTime=16s
              [junit] Cleaning up TestCliDriver
              [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 24.91 sec
          

          I reproduced this on both my Mac and RHEL boxes and verified that if you go and blow away the ~./cache/org.apache.hive/hive-serde/jars/ directory and leave everything else constant, the test passes. This is a problem with how the test infrastructure loads classes, not with this patch itself...
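
          The "SerDe ... does not exist" error in the trace above is what a by-name class lookup produces when the jar actually on the classpath predates the new classes. The JDK-only sketch below shows that failure mode, plus a quick way to see which jar or directory a class was really loaded from. It is not Hive's SerDeUtils code, and the class and method names are made up for illustration.

```java
import java.security.CodeSource;
import java.util.Optional;

// Hypothetical diagnostic helper, not part of Hive.
final class SerDeLookupSketch {
    // Resolves a class by name, turning a miss into a descriptive error.
    static Class<?> lookup(String className) {
        try {
            return Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("SerDe " + className + " does not exist", e);
        }
    }

    // Reports the jar or directory a class came from, if the JVM knows it.
    static Optional<String> loadedFrom(Class<?> c) {
        CodeSource src = c.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            return Optional.empty(); // typical for JDK bootstrap classes
        }
        return Optional.of(src.getLocation().toString());
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "org.apache.hadoop.hive.serde2.avro.AvroSerDe";
        try {
            Class<?> c = lookup(name);
            System.out.println(name + " loaded from " + loadedFrom(c).orElse("(unknown)"));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

          Run with the test classpath, the second helper would show whether a class in the serde package is coming from build/ or from the local ivy cache.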

          Ashutosh Chauhan added a comment -

          Yup, HIVE-3035 needs to be fixed.

          Jakob Homan added a comment -

          Yeah, that should get fixed, but the bigger problem is that tests shouldn't be relying on ivy artifacts at all (for any of the Hive artifacts). The classes-under-test should be loaded directly from build/ either as classes or jars. Currently, all new patches that go between components and aren't very-clean'ed first are not getting tested correctly.

          Edward Capriolo added a comment -

          avro_sanity_test.q had a different comment; I patched it for you.

          Edward Capriolo added a comment -

          +1 committed. Thank you Jakob. nice contribution.

          Hudson added a comment -

          Integrated in Hive-trunk-h0.21 #1463 (See https://builds.apache.org/job/Hive-trunk-h0.21/1463/)
          HIVE-895 Add SerDe for Avro serialized data (Jakob Homan via egc) (Revision 1345420)

          Result = SUCCESS
          ecapriolo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1345420
          Files :

          • /hive/trunk/data/files/doctors.avro
          • /hive/trunk/data/files/episodes.avro
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/avro
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroContainerInputFormat.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroContainerOutputFormat.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroGenericRecordReader.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroGenericRecordWriter.java
          • /hive/trunk/ql/src/test/queries/clientpositive/avro_change_schema.q
          • /hive/trunk/ql/src/test/queries/clientpositive/avro_evolved_schemas.q
          • /hive/trunk/ql/src/test/queries/clientpositive/avro_joins.q
          • /hive/trunk/ql/src/test/queries/clientpositive/avro_sanity_test.q
          • /hive/trunk/ql/src/test/queries/clientpositive/avro_schema_error_message.q
          • /hive/trunk/ql/src/test/queries/clientpositive/avro_schema_literal.q
          • /hive/trunk/ql/src/test/results/clientpositive/avro_change_schema.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/avro_evolved_schemas.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/avro_joins.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/avro_sanity_test.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/avro_schema_error_message.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/avro_schema_literal.q.out
          • /hive/trunk/serde/ivy.xml
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroGenericRecordWritable.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroObjectInspectorGenerator.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerDe.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeException.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerializer.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/BadSchemaException.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/InstanceCache.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/ReaderWriterSchemaPair.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/SchemaResolutionProblem.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/SchemaToTypeInfo.java
          • /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro
          • /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java
          • /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroObjectInspectorGenerator.java
          • /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerde.java
          • /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java
          • /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerializer.java
          • /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestGenericAvroRecordWritable.java
          • /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestInstanceCache.java
          • /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestSchemaReEncoder.java
          • /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestThatEvolvedSchemasActAsWeWant.java
          • /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/Utils.java
          Ashutosh Chauhan added a comment -

          committed.

          Wuhoo. Awesome, Thanks Jakob for contributing, making Avro data easily accessible from Hive.

          Lars Francke added a comment -

          Thanks! This is awesome to have in Hive.

          Could I please ask you to document this feature on the Wiki? Or if you can list all the features it supports I'll happily write something up too.

          Carl Steinbach added a comment -

          @Jakob: Thanks for contributing this to Hive!
          @Ed: Thanks for getting this committed!

          Jakob Homan added a comment -

          Could I please ask you to document this feature on the Wiki?

          Sure thing. I'll transfer all the text from the github account shortly. I'm traveling so it may take a few days.

          Jakob Homan added a comment -

          I've moved all the relevant info from the github page to the Hive Wiki and linked to it from the main page: https://cwiki.apache.org/confluence/display/Hive/AvroSerDe+-+working+with+Avro+from+Hive

          Lars Francke added a comment -

          Thank you very much!

          Mithun Radhakrishnan added a comment -

          Jakob, a quick question: The JIRA indicates that the AvroSerde is available in 0.9.1 and in 0.10.0. (And that's in sync with https://cwiki.apache.org/confluence/display/Hive/AvroSerDe+-+working+with+Avro+from+Hive).

          But it looks like the branch-0.9/ doesn't have this code. Are there plans to port this over?

          (Thanks for writing this, by the way. :])

          Jakob Homan added a comment -

          The fix version is 0.9.1, but at least in the JIRA, I don't see it having been committed anywhere else. You can check that branch for the 895 commit, but I don't think it was.

          I don't plan to do any porting. You're welcome to try it or just go ahead and use haivvreo until 0.10 comes out.

          Ruslan Al-Fakikh added a comment -

          Mithun, what distro do you use? Cloudera patched an earlier version of Hive with this SerDe:
          https://ccp.cloudera.com/display/CDHDOC/New+Features+in+CDH3#NewFeaturesinCDH3-What%27sNewinCDH3Update5
          https://ccp.cloudera.com/display/DOC/CDH+Version+and+Packaging+Information#CDHVersionandPackagingInformation-CDH3Update5Packaging

          Mithun Radhakrishnan added a comment -

          @Jakob Homan: Thanks for clarifying. I'll see if this can't be merged into branch-0.9/.
          @Ruslan Al-Fakikh: Thank you for the heads-up. We at Yahoo are currently using our own builds instead of a commercial distro. :] It'd be great to have this included in 0.9.1.

          Hudson added a comment -

          Integrated in Hive-trunk-hadoop2 #54 (See https://builds.apache.org/job/Hive-trunk-hadoop2/54/)
          HIVE-895 Add SerDe for Avro serialized data (Jakob Homan via egc) (Revision 1345420)

          Result = ABORTED
          ecapriolo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1345420
          Files :

          • /hive/trunk/data/files/doctors.avro
          • /hive/trunk/data/files/episodes.avro
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/avro
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroContainerInputFormat.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroContainerOutputFormat.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroGenericRecordReader.java
          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroGenericRecordWriter.java
          • /hive/trunk/ql/src/test/queries/clientpositive/avro_change_schema.q
          • /hive/trunk/ql/src/test/queries/clientpositive/avro_evolved_schemas.q
          • /hive/trunk/ql/src/test/queries/clientpositive/avro_joins.q
          • /hive/trunk/ql/src/test/queries/clientpositive/avro_sanity_test.q
          • /hive/trunk/ql/src/test/queries/clientpositive/avro_schema_error_message.q
          • /hive/trunk/ql/src/test/queries/clientpositive/avro_schema_literal.q
          • /hive/trunk/ql/src/test/results/clientpositive/avro_change_schema.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/avro_evolved_schemas.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/avro_joins.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/avro_sanity_test.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/avro_schema_error_message.q.out
          • /hive/trunk/ql/src/test/results/clientpositive/avro_schema_literal.q.out
          • /hive/trunk/serde/ivy.xml
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroGenericRecordWritable.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroObjectInspectorGenerator.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerDe.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeException.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerializer.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/BadSchemaException.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/InstanceCache.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/ReaderWriterSchemaPair.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/SchemaResolutionProblem.java
          • /hive/trunk/serde/src/java/org/apache/hadoop/hive/serde2/avro/SchemaToTypeInfo.java
          • /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro
          • /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroDeserializer.java
          • /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroObjectInspectorGenerator.java
          • /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerde.java
          • /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java
          • /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerializer.java
          • /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestGenericAvroRecordWritable.java
          • /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestInstanceCache.java
          • /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestSchemaReEncoder.java
          • /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/TestThatEvolvedSchemasActAsWeWant.java
          • /hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/avro/Utils.java
          Ashutosh Chauhan added a comment -

          This issue is fixed and was released as part of the 0.10.0 release. If you find an issue that seems to be related to this one, please create a new JIRA and link it to this one.


            People

            • Assignee:
              Jakob Homan
              Reporter:
              Jeff Hammerbacher
            • Votes:
              15
              Watchers:
              50
