Hive
  1. Hive
  2. HIVE-7446

Add support to ALTER TABLE .. ADD COLUMN to Avro backed tables

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.14.0
    • Component/s: None
    • Labels:
      None
    • Release Note:
      Support to add columns to an Avro backed Hive table using Alter Table statement.

      Description

      HIVE-6806 adds native support for creating hive table stored as Avro. It would be good to add support to ALTER TABLE .. ADD COLUMN to Avro backed tables.

      1. HIVE-7446.1.patch
        18 kB
        Ashish K Singh
      2. HIVE-7446.patch
        7 kB
        Ashish K Singh

        Issue Links

          Activity

          Ashish K Singh created issue -
          Hide
          Ashish K Singh added a comment -

          Apparently, with the changes in HIVE-6806, alter table.. add column is also working for avro backed tables. Adding patch with a couple of qtests to verify the same.

          Show
          Ashish K Singh added a comment - Apparently, with the changes in HIVE-6806 , alter table.. add column is also working for avro backed tables. Adding patch with a couple of qtests to verify the same.
          Hide
          Ashish K Singh added a comment -
          Show
          Ashish K Singh added a comment - RB: https://reviews.apache.org/r/24085/
          Ashish K Singh made changes -
          Field Original Value New Value
          Attachment HIVE-7446.patch [ 12658576 ]
          Ashish K Singh made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hide
          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12658576/HIVE-7446.patch

          ERROR: -1 due to 3 failed/errored test(s), 5840 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
          org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
          

          Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/104/testReport
          Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/104/console
          Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-104/

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 3 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12658576

          Show
          Hive QA added a comment - Overall : -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12658576/HIVE-7446.patch ERROR: -1 due to 3 failed/errored test(s), 5840 tests executed Failed tests: org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/104/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/104/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-104/ Messages: Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed This message is automatically generated. ATTACHMENT ID: 12658576
          Hide
          Ashish K Singh added a comment -

          Test errors are not related to this patch.

          Tom White could you take a look at this trivial patch.

          Show
          Ashish K Singh added a comment - Test errors are not related to this patch. Tom White could you take a look at this trivial patch.
          Hide
          Ashish K Singh added a comment -

          Szehon Ho would you have some time to look at this?

          Show
          Ashish K Singh added a comment - Szehon Ho would you have some time to look at this?
          Hide
          Ashish K Singh added a comment -

          Thanks Tom White for reviewing the patch!

          Show
          Ashish K Singh added a comment - Thanks Tom White for reviewing the patch!
          Hide
          Szehon Ho added a comment -

          This test looks fine, I'm ok to commit it, but a more common use case is alter table with existing data, load more data, then query results. In RCFile , data is corrupted. Can you see if it works for avro?

          Show
          Szehon Ho added a comment - This test looks fine, I'm ok to commit it, but a more common use case is alter table with existing data, load more data, then query results. In RCFile , data is corrupted. Can you see if it works for avro?
          Hide
          Ashish K Singh added a comment -

          Thats a good suggestion Szehon Ho. I could not understand what you meant by RCFile. Do you mean there is a RC file that I can have a test around or something else?

          Show
          Ashish K Singh added a comment - Thats a good suggestion Szehon Ho . I could not understand what you meant by RCFile. Do you mean there is a RC file that I can have a test around or something else?
          Hide
          Szehon Ho added a comment -

          You can look at a discussion in HIVE-6131 for RCFile, it would be great if the same issue does not exist with Avro (it may have been fixed in HIVE-6835, but good to test it)

          Show
          Szehon Ho added a comment - You can look at a discussion in HIVE-6131 for RCFile, it would be great if the same issue does not exist with Avro (it may have been fixed in HIVE-6835 , but good to test it)
          Hide
          Ashish K Singh added a comment -

          Szehon Ho Do you something like this will be a good test?

          <code>
          CREATE TABLE doctors (
          number int,
          first_name string,
          last_name string)
          STORED AS AVRO;

          LOAD DATA LOCAL INPATH '../../data/files/doctors.avro' INTO TABLE doctors;

          CREATE TABLE doctors_copy (
          number int,
          first_name string)
          STORED AS AVRO;

          INSERT INTO TABLE doctors_copy SELECT number, first_name FROM doctors;

          ALTER TABLE doctors_copy ADD COLUMNS (last_name string);

          LOAD DATA LOCAL INPATH '../../data/files/doctors.avro' OVERWRITE INTO TABLE doctors_copy;

          SELECT * FROM doctors_copy;
          <code>

          Show
          Ashish K Singh added a comment - Szehon Ho Do you something like this will be a good test? <code> CREATE TABLE doctors ( number int, first_name string, last_name string) STORED AS AVRO; LOAD DATA LOCAL INPATH '../../data/files/doctors.avro' INTO TABLE doctors; CREATE TABLE doctors_copy ( number int, first_name string) STORED AS AVRO; INSERT INTO TABLE doctors_copy SELECT number, first_name FROM doctors; ALTER TABLE doctors_copy ADD COLUMNS (last_name string); LOAD DATA LOCAL INPATH '../../data/files/doctors.avro' OVERWRITE INTO TABLE doctors_copy; SELECT * FROM doctors_copy; <code>
          Hide
          Szehon Ho added a comment -

          I think the value of 'alter table' scenario is not to override the first set of data, as in your last statement, but instead the user wants to use both set of data.

          Otherwise this example (and the one in your test case) can also be achieved by just dropping doctor table and recreating.

          create table doctors (number int, first_name string);
          load data inpath '.../doctors.avro' into table doctors;
          alter table doctors add columns (last_name string);
          load data inpath '../doctors_with_last_names' into table doctors;
          select * from doctors;  --should return both sets.
          
          Show
          Szehon Ho added a comment - I think the value of 'alter table' scenario is not to override the first set of data, as in your last statement, but instead the user wants to use both set of data. Otherwise this example (and the one in your test case) can also be achieved by just dropping doctor table and recreating. create table doctors (number int, first_name string); load data inpath '.../doctors.avro' into table doctors; alter table doctors add columns (last_name string); load data inpath '../doctors_with_last_names' into table doctors; select * from doctors; --should return both sets.
          Hide
          Ashish K Singh added a comment -

          Szehon Ho thanks for your valuable insight here. This helped me discover that as of now alter table is actually not working for avro backed tables in Hive. Alter table updates HMS with new schema, but schema in the avro file containing avro-backed table's data keeps the original schema. On trying to read from the avro-backed table after altering table, avro throws exception while trying to read the avro file. This exception is thrown because of difference in expected and actual schemas.

          Based on offline discussion with Tom White, Avro allows files written with the old schema to be read with the new schema as long as certain rules are followed, e.g. adding a new field has a default value. The full set of rules are at http://avro.apache.org/docs/current/spec.html#Schema+Resolution.

          I will upload a patch that should fix this with appropriate tests.

          Show
          Ashish K Singh added a comment - Szehon Ho thanks for your valuable insight here. This helped me discover that as of now alter table is actually not working for avro backed tables in Hive. Alter table updates HMS with new schema, but schema in the avro file containing avro-backed table's data keeps the original schema. On trying to read from the avro-backed table after altering table, avro throws exception while trying to read the avro file. This exception is thrown because of difference in expected and actual schemas. Based on offline discussion with Tom White , Avro allows files written with the old schema to be read with the new schema as long as certain rules are followed, e.g. adding a new field has a default value. The full set of rules are at http://avro.apache.org/docs/current/spec.html#Schema+Resolution . I will upload a patch that should fix this with appropriate tests.
          Hide
          Ashish K Singh added a comment -

          Updated RB. Kindly take a look.

          Show
          Ashish K Singh added a comment - Updated RB. Kindly take a look.
          Hide
          Szehon Ho added a comment -

          Glad we caught that, +1

          Show
          Szehon Ho added a comment - Glad we caught that, +1
          Hide
          Ashish K Singh added a comment -

          Address review comments, add tests and logic to handle added columns via alter table statement for avro-backed tables.

          Show
          Ashish K Singh added a comment - Address review comments, add tests and logic to handle added columns via alter table statement for avro-backed tables.
          Ashish K Singh made changes -
          Attachment HIVE-7446.1.patch [ 12660511 ]
          Hide
          Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12660511/HIVE-7446.1.patch

          ERROR: -1 due to 5 failed/errored test(s), 5889 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
          org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode
          org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
          

          Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/229/testReport
          Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/229/console
          Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-229/

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 5 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12660511

          Show
          Hive QA added a comment - Overall : -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12660511/HIVE-7446.1.patch ERROR: -1 due to 5 failed/errored test(s), 5889 tests executed Failed tests: org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hadoop.hive.ql.TestDDLWithRemoteMetastoreSecondNamenode.testCreateTableWithIndexAndPartitionsNonDefaultNameNode org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/229/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/229/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-229/ Messages: Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed This message is automatically generated. ATTACHMENT ID: 12660511
          Hide
          Ashish K Singh added a comment -

          The test errors above are not related to this patch.

          Show
          Ashish K Singh added a comment - The test errors above are not related to this patch.
          Hide
          Szehon Ho added a comment -

          Committed to trunk. Thanks Ashish for the contribution.

          Show
          Szehon Ho added a comment - Committed to trunk. Thanks Ashish for the contribution.
          Szehon Ho made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Fix Version/s 0.14.0 [ 12326450 ]
          Resolution Fixed [ 1 ]
          Lefty Leverenz made changes -
          Labels TODOC14
          Lefty Leverenz made changes -
          Link This issue relates to HIVE-6806 [ HIVE-6806 ]
          Hide
          Lefty Leverenz added a comment -

          Docs: This can be mentioned in a version box in the DDL's Add/ReplaceColumns section. It also belongs somewhere in the Avro SerDe doc. A release note on this JIRA would be nice too.

          Show
          Lefty Leverenz added a comment - Docs: This can be mentioned in a version box in the DDL's Add/ReplaceColumns section. It also belongs somewhere in the Avro SerDe doc. A release note on this JIRA would be nice too. DDL – Add/Replace Columns Avro SerDe
          Ashish K Singh made changes -
          Release Note Support to add columns to an Avro backed Hive table using Alter Table statement.
          Hide
          Ashish K Singh added a comment -

          Lefty Leverenz Thanks for taking a look at this. Added info to both, DDL – Add/Replace Columns and Avro SerDe.

          Show
          Ashish K Singh added a comment - Lefty Leverenz Thanks for taking a look at this. Added info to both, DDL – Add/Replace Columns and Avro SerDe.
          Hide
          Lefty Leverenz added a comment -

          Good release note and docs, thanks Ashish K Singh. I added links back to this jira.

          Show
          Lefty Leverenz added a comment - Good release note and docs, thanks Ashish K Singh . I added links back to this jira.
          Lefty Leverenz made changes -
          Labels TODOC14
          Hide
          Thejas M Nair added a comment -

          This has been fixed in 0.14 release. Please open new jira if you see any issues.

          Show
          Thejas M Nair added a comment - This has been fixed in 0.14 release. Please open new jira if you see any issues.
          Thejas M Nair made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Transition Time In Source Status Execution Times Last Executer Last Execution Date
          Open Open Patch Available Patch Available
          11d 9h 14m 1 Ashish K Singh 30/Jul/14 03:36
          Patch Available Patch Available Resolved Resolved
          14d 19h 19m 1 Szehon Ho 13/Aug/14 22:56
          Resolved Resolved Closed Closed
          91d 21h 46m 1 Thejas M Nair 13/Nov/14 19:42

            People

            • Assignee:
              Ashish K Singh
              Reporter:
              Ashish K Singh
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development