HIVE-6806

CREATE TABLE should support STORED AS AVRO

    Details

    • Hadoop Flags:
      Reviewed
    • Release Note:
      Add support for inferring the Avro schema from the Hive table schema. Avro-backed tables can be created simply by using "STORED AS AVRO" in a DDL statement. AvroSerDe takes care of creating the appropriate Avro schema from the Hive table schema, a big win in terms of Avro usability in Hive.

      Description

      Avro is well established and widely used within Hive; however, creating Avro-backed tables requires the messy listing of the SerDe, InputFormat and OutputFormat classes.

      Similarly to HIVE-5783 for Parquet, Hive would be easier to use if it had native Avro support.
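
      For illustration only, the difference can be sketched in HiveQL as follows. The table names, columns and schema URL are hypothetical; the class names are the standard Avro SerDe and container-format classes shipped with Hive. The second statement assumes Hive 0.14.0 or later, where this change is available.

```sql
-- Before this change: the SerDe, InputFormat and OutputFormat classes are
-- listed explicitly, and the Avro schema is supplied through a table property.
-- (Table name and schema URL are hypothetical.)
CREATE TABLE episodes_verbose
  ROW FORMAT SERDE
    'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  STORED AS
    INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  TBLPROPERTIES ('avro.schema.url' = 'hdfs:///schemas/episodes.avsc');

-- With this change: the columns are declared in the DDL and the Avro schema
-- is derived from them, so no class names or schema property are needed.
-- (Table name and columns are hypothetical.)
CREATE TABLE episodes_native (
    title    STRING,
    air_date STRING,
    doctor   INT
  )
  STORED AS AVRO;
```

      The verbose form remains valid, which matters for tables that must also work on releases before 0.14.0.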

      1. HIVE-6806.patch
        48 kB
        Ashish K Singh
      2. HIVE-6806.3.patch
        77 kB
        Ashish K Singh
      3. HIVE-6806.2.patch
        64 kB
        Ashish K Singh
      4. HIVE-6806.1.patch
        50 kB
        Ashish K Singh

          Activity

          Ashish K Singh added a comment -

          RB: https://reviews.apache.org/r/23387/

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12654935/HIVE-6806.patch

          ERROR: -1 due to 4 failed/errored test(s), 5723 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_native
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_avro_schema_error_message
          org.apache.hadoop.hive.serde2.avro.TestAvroSerde.noSchemaProvidedReturnsErrorSchema
          org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
          

          Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/730/testReport
          Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/730/console
          Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-730/

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 4 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12654935

          Damien Carol added a comment -

          Isn't this JIRA superseded by HIVE-5976?

          Brock Noland added a comment -

          I don't think this is superseded by HIVE-5976; rather, we have to figure out which one goes first, and the other will have to update their patch. Since HIVE-5976 is a larger patch and will make this patch much smaller, I am inclined to let that one go first.

          David Chen what are your thoughts?

          David Chen added a comment -

          Thanks for contributing this, Ashish K Singh!

          I agree with Brock Noland on this. Once HIVE-5976 goes in, then this patch will become much simpler since adding native support for Avro will no longer require the changes to the Hive grammar and parser.

          Ashish K Singh added a comment -

          Brock Noland and David Chen, thanks for reviewing the work.

          Sounds good to me. I have to address Brock Noland's review comments anyway.

          David Chen added a comment -

          By the way, it may also be good to add a qfile test for Avro schema evolution over different partitions. I remember we have had to fix some issues related to schema evolution, such as HIVE-6835.

          FYI, I also have a TypeInfo to Avro Schema converter in my patch for HIVE-7286 along with some unit tests for the converter. Feel free to go ahead and make use of it.

          Ashish K Singh added a comment -

          David Chen, thanks for the pointers here. I do have a test for Avro over partitions, avro_partitioned_native.q.

          Ashish K Singh added a comment -

          I have addressed the reviews and updated RB with the latest patch. I will attach it here once I have rebased it over HIVE-5976.

          Carl Steinbach added a comment -

          Does anyone object to changing the summary of this ticket to "CREATE TABLE should support STORED AS AVRO"? The current description can be misinterpreted to mean that this patch is adding the AvroSerDe.

          Jeremy Beard added a comment -

          Would that mean with this patch we still need to specify the SerDe when creating an Avro table?

          Brock Noland added a comment -

          That change sounds good to me.

          Jeremy, no, I believe this is a metadata change only.

          Ashish K Singh added a comment -

          Updated patch after rebase.

          David Chen added a comment -

          Thanks, Ashish. I saw that you have a test for partitioned tables. Can you also include one that covers schema evolution, i.e. when the schema changes over partitions, such as the case in HIVE-6835?

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12655626/HIVE-6806.1.patch

          ERROR: -1 due to 3 failed/errored test(s), 5749 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_in_file
          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_temp_table
          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
          

          Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/783/testReport
          Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/783/console
          Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-783/

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 3 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12655626

          Ashish K Singh added a comment -

          Added a qtest for Avro schema evolution.

          Ashish K Singh added a comment -

          David Chen, I addressed your concerns on RB and added a qtest with an Avro schema evolution scenario. Kindly take a look.

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12655976/HIVE-6806.2.patch

          ERROR: -1 due to 5 failed/errored test(s), 5752 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_temp_table
          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_join_hash
          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
          org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
          

          Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/807/testReport
          Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/807/console
          Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-807/

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 5 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12655976

          David Chen added a comment -

          Thanks for adding the qtest for schema evolution, Ashish!

          Aside from some minor formatting comments, this looks good to me.

          Carl Steinbach added a comment -

          Tom White, Lars Francke, Brock Noland: Are you guys satisfied with the current version of the patch? If so I'll plan to +1 it and get it committed after another round of automated tests. Thanks.

          Ashish K Singh added a comment -

          All reviews on RB are addressed. Once I get a "Ship it" there, I can post the patch here for automated tests.

          Ashish K Singh added a comment -

          Addressed review comments on RB.

          Lars Francke added a comment -

          Patch looks good. Ashish, thanks a lot for addressing all these minor comments. Ship it! (not a committer)

          David Chen added a comment -

          Looks good to me as well. +1 (also not a committer)

          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12657411/HIVE-6806.3.patch

          ERROR: -1 due to 4 failed/errored test(s), 5761 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
          org.apache.hadoop.hive.metastore.txn.TestCompactionTxnHandler.testRevokeTimedOutWorkers
          org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
          org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
          

          Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/28/testReport
          Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/28/console
          Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-28/

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 4 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12657411

          Brock Noland added a comment -

          +1

          this is an awesome contribution...

          David Chen added a comment -

          I have verified that the test failures do not appear to be caused by this patch.

          • The TestMinimrCliDriver failure also occurs on trunk.
          • The TestOrcHCatLoader failure is a known issue and also fails spuriously on trunk.
          • I was not able to reproduce the TestCompactionTxnHandler or TestHiveServer2 failures on either trunk or trunk with this patch applied.

          I think this patch is ready to be committed.

          Ashish K Singh added a comment -

          Yes, the failed tests are not related to this patch.

          Carl Steinbach added a comment -

          Committed to trunk. Thanks Ashish!

          Ashish K Singh added a comment -

          Thanks all for reviewing and committing.

          Lefty Leverenz added a comment -

          This will need documentation when 0.14.0 gets released (with version note and link to this JIRA ticket):

          • Avro SerDe (multiple sections)
          • DDL – Create Table (syntax: list of values for file_format)
          • DDL – Row Format, Storage Format, and SerDe

          Brock Noland added a comment -

          Great to see this committed. I cannot say enough how awesome this change is!

          Ashish K Singh added a comment -

          Lefty Leverenz, thanks for looking at this. I am not sure how documentation is handled. Could you please help me understand what needs to be done for the documentation here?

          Brock Noland thanks!

          Lefty Leverenz added a comment -

          Ashish K Singh, wikidoc updates are handled in various ways. Sometimes the developer takes care of it, sometimes I write it up and ask for review, and occasionally someone else writes it up. I usually edit what others write, making sure version information is included and cross-references get made.

          • In this case, I think I could revise the DDL sections adequately but you would probably do a better job revising the Avro SerDe wiki, keeping in mind that the old information needs to remain for users of previous releases.
          • If you don't want to do it yourself, or even if you do, a release note on this JIRA ticket would be an excellent start.
          • If you have an in-house tech writer who could do the job, their contribution would be most welcome — my backlog of doc tasks is daunting, I'm just doing this as a retirement hobby, and it's summertime.

          The links in my previous comment show where revisions should go. The Avro SerDe doc has several examples of CREATE TABLE — the simplest approach is to add a second example for each one, saying "In Hive 0.14.0 and later, this syntax can be used:" (or something similar). A general statement or discussion of the new syntax would be good too.

          Since the 0.14.0 release is a few months away, the documentation could wait, although it might be best done while still fresh in your mind. To gain write access to the wiki, follow the instructions here: About This Wiki

          Ashish K Singh added a comment -

          Lefty Leverenz, thanks for the detailed info here.

          I have updated the JIRA's release note and documentation on Avro's usage in Hive, https://cwiki.apache.org/confluence/display/Hive/AvroSerDe. Feel free to make it better.

          Lefty Leverenz added a comment -

          Thanks Ashish, your doc changes look good. I'm just making a few minor edits.

          This sentence in the Avro SerDe doc is out of date: "The AvroSerde has been built and tested against Hive 0.9.1 and Avro 1.5."

          1. Can I change it to "tested against Hive 0.9.1 and later"?
          2. What Avro versions have been tested? (Their latest is 1.7.7: http://avro.apache.org/releases.html.)

          Lefty Leverenz added a comment -

          Ashish K Singh, why did you outdent union1 to bytes1 in the examples? I aligned them with the rest of the data types, then indented all of them two more spaces to make STORED AS AVRO stand out – but if you wanted the outdent, please revert my changes or ask me to do it.

          Also, your example in "Hive 0.14 and later versions" under "Creating Avro-backed Hive tables" is identical to the one you added to the code block in "All Hive versions" just before it – was that deliberate, or an editing artifact? It seems to me the Hive 0.14 example in "All Hive versions" isn't necessary, but I left it in for now.

          Please review my changes, because I moved some information around.

          • Avro SerDe

          Lefty Leverenz added a comment -

          I added STORED AS AVRO to the DDL wikidoc:

          • Create Table – see file_format (at the end of the syntax)
              file_format:
                : SEQUENCEFILE
                ...
                | AVRO (Note: Only available starting with Hive 0.14.0)
          • Row Format, Storage Format, and SerDe
              "Use STORED AS AVRO for Avro files in Hive 0.14.0 and later (see Avro SerDe)."

          Ashish K Singh added a comment -

          Lefty Leverenz, thanks for taking a look at the changes I made. Based on your feedback I have made the following changes:

          • As Hive 0.14 uses Avro 1.7.5, updated the Hive and Avro version text accordingly.
          • Removed the Hive 0.14 example in "All Hive versions" under "Creating Avro-backed Hive tables".

          Let me know if I missed anything.

          Lefty Leverenz added a comment -

          Looks good, thanks Ashish K Singh.

          But (ever the nitpicker) we probably need Hive versions matched with Avro versions, because the wiki covers all Hive versions. Does Hive 0.13 also use Avro 1.7.5? If the Avro version is somewhere in the code, I can compile a list of Avro versions for various Hive versions.

          Of course, this is beyond the scope of the jira. So if it's more than a moment's work, we should raise a new issue.

          Lefty Leverenz added a comment - - edited

          (I should grep before asking, not after.) Here's what I've found in the released branches, starting with 0.9:

          Hive 0.9 – ivy/libraries.properties:avro.version=1.5.3
          Hive 0.10 – ivy/libraries.properties:avro.version=1.7.1
          Hive 0.11 – ivy/libraries.properties:avro.version=1.7.1
          Hive 0.12 – ivy/libraries.properties:avro.version=1.7.1
          Hive 0.13 – pom.xml: <avro.version>1.7.5</avro.version>
          Hive 0.13.1 – pom.xml: <avro.version>1.7.5</avro.version>

          Documented here: Avro SerDe – Requirements

          Lefty Leverenz added a comment -

          Shouldn't AVRO be added as a possible value for the hive.default.fileformat parameter in HiveConf.java?

          • Configuration Properties – hive.default.fileformat

          Navis added a comment -

          Lefty Leverenz, right. I'll file that as a new issue.

          Lefty Leverenz added a comment -

          Navis created HIVE-8591 "hive.default.fileformat should accept all formats described by StorageFormatDescriptor" which also deals with the values for hive.query.result.fileformat – thanks!

          Thejas M Nair added a comment -

          This has been fixed in the 0.14 release. Please open a new JIRA if you see any issues.


            People

            • Assignee:
              Ashish K Singh
              Reporter:
              Jeremy Beard
            • Votes:
              1
              Watchers:
              15