Description
Currently, running ALTER TABLE SET FILEFORMAT doesn't change the serde. Customers do not expect this, because serdes are largely file-format specific.
Attachments
- HIVE-6756.1.patch
- 25 kB
- Vasanth kumar RJ
- HIVE-6756.2.patch
- 33 kB
- Chinna Rao Lalam
- HIVE-6756.3.patch
- 34 kB
- Chinna Rao Lalam
- HIVE-6756.patch
- 25 kB
- Chinna Rao Lalam
Activity
I think that instead of always defaulting to LazySimpleSerDe, it is better to set LazySimpleSerDe for the TEXTFILE and SEQUENCEFILE formats only and to throw an exception in cases where the serde is not specified. We can't assume other file formats use LazySimpleSerDe.
Without the patch, the current code already takes care of the RC, ORC and PARQUET file formats (ALTER TABLE SET FILEFORMAT configures the proper serde for them).
The TEXTFILE and SEQUENCEFILE formats are not handled. This patch addresses that by configuring LazySimpleSerDe for these file formats.
Apart from this, ALTER TABLE SET FILEFORMAT can also take INPUTFORMAT and OUTPUTFORMAT classes. In this scenario I am not sure which serde needs to be configured.
If it throws an exception, the user cannot use INPUTFORMAT and OUTPUTFORMAT classes in ALTER TABLE SET FILEFORMAT.
Any suggestions?
Sorry Chinna Rao Lalam for being late on this. My suggestion is:
1. Instead of setting LazySimpleSerDe as the default for all cases, set LazySimpleSerDe only for TEXTFILE & SEQUENCEFILE (similar to RC, ORC etc.)
2. For the ALTER TABLE SET FILEFORMAT case with IF & OF (INPUTFORMAT & OUTPUTFORMAT), throw an exception if the user doesn't specify a serde, i.e. if the user is specifying IF & OF she must also specify the serde.
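The two rules suggested above can be sketched roughly as follows. This is only an illustration, not the code from the patch: the class `SerdeResolver` and both method names are hypothetical, the serde class names are plain strings so the sketch runs without Hive on the classpath, and only a few named formats are mapped.

```java
import java.util.Locale;

// Hypothetical helper; not part of Hive and not taken from the attached patches.
class SerdeResolver {
    // Serde class names written as strings so the sketch has no Hive dependency.
    static final String LAZY_SIMPLE = "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe";
    static final String ORC = "org.apache.hadoop.hive.ql.io.orc.OrcSerde";

    // Rule 1: a named file format implies its serde; TEXTFILE and SEQUENCEFILE
    // get LazySimpleSerDe, and unknown formats are rejected rather than defaulted.
    static String forNamedFormat(String format) {
        switch (format.toUpperCase(Locale.ROOT)) {
            case "TEXTFILE":
            case "SEQUENCEFILE":
                return LAZY_SIMPLE;
            case "ORC":
                return ORC;
            default:
                throw new IllegalArgumentException("No serde known for file format " + format);
        }
    }

    // Rule 2: with explicit INPUTFORMAT/OUTPUTFORMAT classes the serde cannot be
    // inferred, so the caller must supply one; otherwise an exception is thrown.
    static String forExplicitClasses(String inputFormat, String outputFormat, String serde) {
        if (serde == null || serde.isEmpty()) {
            throw new IllegalArgumentException(
                "SERDE must be specified together with INPUTFORMAT and OUTPUTFORMAT");
        }
        return serde;
    }

    public static void main(String[] args) {
        System.out.println(forNamedFormat("textfile"));
        System.out.println(forNamedFormat("ORC"));
        try {
            forExplicitClasses("com.example.MyInputFormat", "com.example.MyOutputFormat", null);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing on a missing serde, instead of silently falling back to LazySimpleSerDe, avoids pairing a custom input/output format with a serde that cannot read its records.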
Hi Ashutosh Chauhan,
Implemented as per your suggestion.
Sorry Chinna Rao Lalam for taking over this JIRA.
Kindly verify.
Thanks,
Vasanth kumar
Overall: -1 at least one tests failed
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12645402/HIVE-6756.1.patch
ERROR: -1 due to 31 failed/errored test(s), 5526 tests executed
Failed tests:
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_04_evolved_parts
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapreduce6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_serde
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_wise_fileformat
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_wise_fileformat2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_wise_fileformat3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_java_method
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_reflect
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_math_funcs
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_script_pipe
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_transform1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_transform_ppr1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_transform_ppr2
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.common.metrics.TestMetrics.testScopeConcurrency
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input20
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input4
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input5
org.apache.hive.hcatalog.cli.TestUseDatabase.testAlterTablePass
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getHadoopVersion
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getHiveVersion
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getPigVersion
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getStatus
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.invalidPath
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/224/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/224/console
Messages:
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 31 tests failed
This message is automatically generated.
ATTACHMENT ID: 12645402
While some of the failures in the Hive QA run also occur on trunk, some other failures look relevant.
Added positive and negative test cases for these scenarios.
Corrected the parquet_serde.q test case; the remaining failing test cases are not related to this change.
Patch looks good, except for textfile & seqfile:
+ serde = conf.getVar(HiveConf.ConfVars.HIVESCRIPTSERDE);
The scriptserde config is used for other purposes; I think it's better just to do
serde = LazySimpleSerDe.class.getName()
since that's equivalent behavior to create table stored as textfile / sequencefile
Looks good otherwise
Overall: -1 at least one tests failed
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12646708/HIVE-6756.3.patch
ERROR: -1 due to 12 failed/errored test(s), 5465 tests executed
Failed tests:
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_exim_04_evolved_parts
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_decimal1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_wise_fileformat
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_wise_fileformat2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_wise_fileformat3
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hive.hcatalog.cli.TestUseDatabase.testAlterTablePass
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/298/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/298/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-298/
Messages:
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
This message is automatically generated.
ATTACHMENT ID: 12646708
Ashutosh Chauhan, the commit message has a typo and says HIVE-3756, which might cause confusion when grepping git log. Is it too late to fix?
Thanks for catching that. Updated the svn commit message. It's correctly reflected in svn log now. However, I think the svn-git bridge doesn't recognize commit message edits, so the git repo will still show the old one.
Hi Lefty Leverenz,
This needs to be updated in the Alter Table/Partition File Format section:
For ALTER TABLE SET FILEFORMAT with INPUTFORMAT & OUTPUTFORMAT, an exception is thrown if the user doesn't specify a serde, i.e. if the user is specifying INPUTFORMAT & OUTPUTFORMAT she must also specify the serde.
This has been fixed in the 0.14 release. Please open a new JIRA if you see any issues.
In ALTER TABLE SET FILEFORMAT, the ORC and RC file formats set the corresponding serdes; the remaining file formats do not set a corresponding serde.
In CREATE TABLE, if the serde is not specified, file formats other than ORC and RC are set to LazySimpleSerDe. This change adds the same behavior to ALTER TABLE SET FILEFORMAT.