Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 0.13.0
    • 0.14.0
    • None

    Description

      Currently, ALTER TABLE ... SET FILEFORMAT does not change the table's serde. Customers find this unexpected, because serdes are largely file-format specific.

      Attachments

        1. HIVE-6756.1.patch
          25 kB
          Vasanth kumar RJ
        2. HIVE-6756.2.patch
          33 kB
          Chinna Rao Lalam
        3. HIVE-6756.3.patch
          34 kB
          Chinna Rao Lalam
        4. HIVE-6756.patch
          25 kB
          Chinna Rao Lalam

        Activity

          chinnalalam Chinna Rao Lalam added a comment -

          ALTER TABLE ... SET FILEFORMAT sets the corresponding serde for the ORC and RC file formats, but not for the remaining file formats.

          In CREATE TABLE, if no serde is specified, file formats other than ORC and RC are set to LazySimpleSerDe. This patch makes ALTER TABLE ... SET FILEFORMAT behave the same way as CREATE TABLE.

          ashutoshc Ashutosh Chauhan added a comment -

          I think that, instead of always defaulting to LazySimpleSerDe, it is better to set LazySimpleSerDe only for the TEXTFILE and SEQUENCEFILE formats and to throw an exception in the other cases where no serde is specified. We can't assume that other file formats use LazySimpleSerDe.

          chinnalalam Chinna Rao Lalam added a comment -

          Without the patch, the current code already takes care of the RC, ORC and PARQUET file formats (ALTER TABLE SET FILEFORMAT configures the proper serde for each of them).

          The TEXTFILE and SEQUENCEFILE formats are not handled. This patch addresses that by configuring LazySimpleSerDe for these file formats.

          Apart from this, ALTER TABLE SET FILEFORMAT can also be given INPUTFORMAT and OUTPUTFORMAT classes. In that scenario I am not sure which serde needs to be configured.

          If it throws an exception, the user cannot use INPUTFORMAT and OUTPUTFORMAT classes in ALTER TABLE SET FILEFORMAT.

          Any suggestions?

          ashutoshc Ashutosh Chauhan added a comment -

          Sorry Chinna Rao Lalam for being late on this. My suggestion is:
          1. Instead of setting LazySimpleSerDe as the default in all cases, set LazySimpleSerDe only for TEXTFILE and SEQUENCEFILE (similar to how RC, ORC etc. are handled).
          2. For the ALTER TABLE SET FILEFORMAT case with INPUTFORMAT (IF) and OUTPUTFORMAT (OF), throw an exception if the user doesn't specify a serde, i.e. a user who specifies IF and OF must also specify a serde.
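
          The two rules suggested above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the code from the attached patches: the class `FileFormatSerdeResolver`, its `resolve` method and the use of `IllegalArgumentException` are hypothetical, and the serde class names listed for RCFILE, ORC and PARQUET are illustrative (the thread only says that those formats already get their own serde).

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of the serde-selection rule discussed above; not Hive's actual code. */
final class FileFormatSerdeResolver {
    static final String LAZY_SIMPLE_SERDE =
        "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe";

    // Format keyword -> default serde class name. The RCFILE, ORC and PARQUET
    // entries are illustrative assumptions; the thread only states that each
    // of these formats already has a matching serde configured.
    private static final Map<String, String> DEFAULTS = new HashMap<>();
    static {
        DEFAULTS.put("TEXTFILE", LAZY_SIMPLE_SERDE);
        DEFAULTS.put("SEQUENCEFILE", LAZY_SIMPLE_SERDE);
        DEFAULTS.put("RCFILE", "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe");
        DEFAULTS.put("ORC", "org.apache.hadoop.hive.ql.io.orc.OrcSerde");
        DEFAULTS.put("PARQUET", "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe");
    }

    /**
     * @param formatKeyword named format such as "ORC", or null when the user
     *                      supplied INPUTFORMAT/OUTPUTFORMAT classes instead
     * @param explicitSerde serde class given by the user, or null
     * @return the serde class name to store on the table
     */
    static String resolve(String formatKeyword, String explicitSerde) {
        // An explicitly specified serde always wins.
        if (explicitSerde != null) {
            return explicitSerde;
        }
        // Rule 2: raw INPUTFORMAT/OUTPUTFORMAT classes without a serde are rejected.
        if (formatKeyword == null) {
            throw new IllegalArgumentException(
                "INPUTFORMAT and OUTPUTFORMAT require an explicit SERDE");
        }
        // Rule 1: each named format maps to its own default serde.
        String serde = DEFAULTS.get(formatKeyword.toUpperCase(Locale.ROOT));
        if (serde == null) {
            throw new IllegalArgumentException("Unknown file format: " + formatKeyword);
        }
        return serde;
    }
}
```

          With this sketch, `resolve("textfile", null)` yields the LazySimpleSerDe class name, while `resolve(null, null)` throws, which mirrors the requirement that INPUTFORMAT and OUTPUTFORMAT be accompanied by a serde.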

          vasanthkumar Vasanth kumar RJ added a comment -

          Hi Ashutosh Chauhan,
          Implemented as per your suggestion.
          Sorry Chinna Rao Lalam for taking over this JIRA.

          Kindly verify.

          Thanks,
          Vasanth kumar

          hiveqa Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12645402/HIVE-6756.1.patch

          ERROR: -1 due to 31 failed/errored test(s), 5526 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_11
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_04_evolved_parts
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapreduce6
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_serde
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_wise_fileformat
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_wise_fileformat2
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_wise_fileformat3
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_java_method
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_reflect
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_math_funcs
          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_script_pipe
          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_transform1
          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_transform_ppr1
          org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_transform_ppr2
          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
          org.apache.hadoop.hive.common.metrics.TestMetrics.testScopeConcurrency
          org.apache.hadoop.hive.ql.parse.TestParse.testParse_input20
          org.apache.hadoop.hive.ql.parse.TestParse.testParse_input4
          org.apache.hadoop.hive.ql.parse.TestParse.testParse_input5
          org.apache.hive.hcatalog.cli.TestUseDatabase.testAlterTablePass
          org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes
          org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal
          org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX
          org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY
          org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getHadoopVersion
          org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getHiveVersion
          org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getPigVersion
          org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getStatus
          org.apache.hive.hcatalog.templeton.TestWebHCatE2e.invalidPath
          org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
          

          Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/224/testReport
          Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/224/console

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 31 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12645402


          ashutoshc Ashutosh Chauhan added a comment -

          Some of the failures in the Hive QA run also occur on trunk, but some of the others look relevant.

          chinnalalam Chinna Rao Lalam added a comment -

          Hi Ashutosh, I am working on this. I will upload a patch today.

          chinnalalam Chinna Rao Lalam added a comment -

          Added positive and negative test cases for these scenarios.
          Corrected the parquet_serde.q test case; the remaining failed test cases are not related to this change.

          ashutoshc Ashutosh Chauhan added a comment -

          Patch looks good, except for textfile & seqfile:
          + serde = conf.getVar(HiveConf.ConfVars.HIVESCRIPTSERDE);

          The scriptserde config is used for other purposes. I think it's better just to do
          serde = LazySimpleSerDe.class.getName()
          since that's equivalent to the behavior of create table stored as textfile / sequencefile.

          Looks good otherwise.
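
          A note on the Java in the review comment above: `getClass()` is an instance method, so in compilable code the serde class name is obtained from the class literal, `LazySimpleSerDe.class.getName()`. The sketch below shows this with a nested stand-in class so that it compiles without Hive on the classpath. `ClassLiteralDemo` and `defaultTextSerde` are hypothetical names; in Hive the real class is `org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe`.

```java
/** Hypothetical demo; the nested class is a stand-in so the example compiles without Hive. */
final class ClassLiteralDemo {
    /** Stand-in for org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe. */
    static final class LazySimpleSerDe {}

    /** Returns the serde class name via the class literal, with no config lookup involved. */
    static String defaultTextSerde() {
        // LazySimpleSerDe.getClass() would not compile here: getClass() needs an instance.
        return LazySimpleSerDe.class.getName();
    }
}
```

          Hard-coding the class literal keeps the TEXTFILE and SEQUENCEFILE default independent of the script serde setting, which is the point of the review comment.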

          chinnalalam Chinna Rao Lalam added a comment -

          Hi Ashutosh, thanks for reviewing the patch. Reworked the patch.

          ashutoshc Ashutosh Chauhan added a comment -

          +1

          hiveqa Hive QA added a comment -

          Overall: -1 at least one tests failed

          Here are the results of testing the latest attachment:
          https://issues.apache.org/jira/secure/attachment/12646708/HIVE-6756.3.patch

          ERROR: -1 due to 12 failed/errored test(s), 5465 tests executed
          Failed tests:

          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_04_evolved_parts
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_decimal1
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_wise_fileformat
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_wise_fileformat2
          org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_wise_fileformat3
          org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
          org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
          org.apache.hive.hcatalog.cli.TestUseDatabase.testAlterTablePass
          org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal
          org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX
          org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY
          org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
          

          Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/298/testReport
          Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/298/console
          Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-298/

          Messages:

          Executing org.apache.hive.ptest.execution.PrepPhase
          Executing org.apache.hive.ptest.execution.ExecutionPhase
          Executing org.apache.hive.ptest.execution.ReportingPhase
          Tests exited with: TestsFailedException: 12 tests failed
          

          This message is automatically generated.

          ATTACHMENT ID: 12646708


          ashutoshc Ashutosh Chauhan added a comment -

          Committed to trunk. Thanks, Chinna!

          szehon Szehon Ho added a comment -

          Ashutosh Chauhan, the commit message has a typo: it says HIVE-3756, which might cause confusion when grepping the git log. Is it too late to fix?

          ashutoshc Ashutosh Chauhan added a comment -

          Thanks for catching that. Updated the svn commit message. It's correctly reflected in the svn log now. However, I think the svn-git bridge doesn't recognize commit message edits, so the git repo will still show the old one.

          leftyl Lefty Leverenz added a comment -

          Does this need any user doc?


          chinnalalam Chinna Rao Lalam added a comment -

          Hi Lefty Leverenz,

          This needs to be added to the Alter Table/Partition File Format section:

          When ALTER TABLE ... SET FILEFORMAT is used with INPUTFORMAT and OUTPUTFORMAT, an exception is thrown if the user doesn't specify a serde, i.e. a user who specifies INPUTFORMAT and OUTPUTFORMAT must also specify a serde.

          thejas Thejas Nair added a comment -

          This has been fixed in the 0.14 release. Please open a new JIRA if you see any issues.

          People

            chinnalalam Chinna Rao Lalam
            omalley Owen O'Malley