Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Allows direct creation of HFiles, and specification of their location, as part of an HBaseStorageHandler write if the following properties are specified in the HQL:
set hive.hbase.generatehfiles=true;
set hfile.family.path=/tmp/columnfamily_name;
hfile.family.path can also be set as a table property; the HQL value takes precedence.
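As a rough illustration of the release note above, the following HiveQL sketch shows the two properties in use. The table name hb_demo, the column family cf, the output path, and the source table src are hypothetical and not taken from this issue. The CLUSTER BY clause reflects the assumption that rows must reach the writer sorted by row key, as with earlier HFile generation from Hive.

```sql
-- Hypothetical HBase-backed table; names are illustrative only.
CREATE TABLE hb_demo (key INT, val STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:val');
-- Alternative (per the release note): set the path as a table property,
-- e.g. TBLPROPERTIES ('hfile.family.path' = '/tmp/hb_demo/cf').

-- Switch the storage handler from online writes to HFile generation.
SET hive.hbase.generatehfiles=true;
-- Assumption: the last path component names the column family (cf here).
SET hfile.family.path=/tmp/hb_demo/cf;

-- Assumption: sorting by row key is needed so the HFiles are written in order.
INSERT OVERWRITE TABLE hb_demo
SELECT DISTINCT key, value FROM src CLUSTER BY key;
```

With these settings the insert should write HFiles under the given path rather than writing to the live HBase table. Loading those files into HBase is a separate step that this patch does not appear to cover.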
Description
Generating HFiles for bulkload into HBase could be more convenient. Right now we require the user to register a new table with the appropriate output format. This patch allows the exact same functionality, but through an existing table managed by the HBaseStorageHandler.
Attachments
- HIVE-6473.0.patch.txt (18 kB, Nick Dimiduk)
- HIVE-6473.1.patch (25 kB, Nick Dimiduk)
- HIVE-6473.1.patch.txt (25 kB, Nick Dimiduk)
- HIVE-6473.2.patch (31 kB, Nick Dimiduk)
- HIVE-6473.3.patch (31 kB, Brock Noland)
- HIVE-6473.4.patch (25 kB, Nick Dimiduk)
- HIVE-6473.5.patch (26 kB, Nick Dimiduk)
- HIVE-6473.6.patch (14 kB, Nick Dimiduk)
Activity
When this is committed, it should be documented in the HBase Bulk Load design doc with a release note and link back to this JIRA.
Overall: -1 at least one tests failed
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12630117/HIVE-6473.0.patch.txt
ERROR: -1 due to 4 failed/errored test(s), 5178 tests executed
Failed tests:
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_bulk
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_handler_bulk
org.apache.hadoop.hive.cli.TestHBaseMinimrCliDriver.testCliDriver_hbase_bulk
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver_generatehfiles_require_family_path
Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1446/testReport
Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1446/console
Messages:
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
This message is automatically generated.
ATTACHMENT ID: 12630117
Thanks for looking into this Nick. This should be very helpful. Would you mind opening a RB review for detailed review of the patch?
Thanks,
ndimiduk The patch looks pretty good to me. I left minor comments on the RB. Also do we know the reason for the test failures here? They seem related.
swarnim thanks for the notes, I'll have a look. I presume the test failures have to do with the test infrastructure not running an online HBase instance. Perhaps this is why hbase_bulk.m was named thusly? That's just a guess, I'm entirely ignorant of the buildbot's capabilities.
Attaching same file with different extension, see which one buildbot picks up.
Patch looks good to me. I'll try to kick off some tests on this myself.
One more thing though - you remove hbase-handler/src/test/queries/positive/hbase_bulk.m in this patch, but you do not remove the corresponding hbase-handler/src/test/results/positive/hbase_bulk.m.out file. Could you add that removal as well?
I'm +1 on it otherwise though, and will commit once we have a test run.
Thanks for having a look, sushanth; good catch. Here's an updated patch that takes care of hbase_bulk.m.out.
Overall: -1 no tests executed
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12645054/HIVE-6473.2.patch
Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/205/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/205/console
Messages:
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
This message is automatically generated.
ATTACHMENT ID: 12645054
Overall: -1 at least one tests failed
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12645134/HIVE-6473.3.patch
ERROR: -1 due to 18 failed/errored test(s), 5452 tests executed
Failed tests:
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_java_method
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_reflect
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_math_funcs
org.apache.hadoop.hive.cli.TestHBaseMinimrCliDriver.testCliDriver_hbase_bulk
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_stats_counter
org.apache.hadoop.hive.common.metrics.TestMetrics.testScopeConcurrency
org.apache.hadoop.hive.metastore.TestRetryingHMSHandler.testRetryingHMSHandler
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input20
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input4
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input5
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getHadoopVersion
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getHiveVersion
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getPigVersion
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getStatus
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.invalidPath
Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/207/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/207/console
Messages:
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 18 tests failed
This message is automatically generated.
ATTACHMENT ID: 12645134
Most of those tests seem unrelated, except for org.apache.hadoop.hive.cli.TestHBaseMinimrCliDriver.testCliDriver_hbase_bulk
Nick, could you please look into that to see if the test needs updating?
sushanth This test was disabled before I came along; I'm not sure why. It passes for me locally and when I run with -Dtest.output.overwrite=true the output file is unchanged. Mind giving it a go yourself?
MAVEN_OPTS="-Xmx2g" mvn test -Dtest=TestHBaseCliDriver -Dqfile=hbase_snapshot.q -Dtest.output.overwrite=true -Phadoop-2,dist -Dhive.root.logger=DEBUG,console
Overall: -1 at least one tests failed
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12646293/HIVE-6473.4.patch
ERROR: -1 due to 7 failed/errored test(s), 5459 tests executed
Failed tests:
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_5
org.apache.hadoop.hive.cli.TestHBaseMinimrCliDriver.testCliDriver_hbase_bulk
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/268/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/268/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-268/
Messages:
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
This message is automatically generated.
ATTACHMENT ID: 12646293
Test still passes locally for me:
-------------------------------------------------------------------------------
Test set: org.apache.hadoop.hive.cli.TestHBaseMinimrCliDriver
-------------------------------------------------------------------------------
Tests run: 0, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.33 sec - in org.apache.hadoop.hive.cli.TestHBaseMinimrCliDriver
Re-attaching patch for build bot.
ndimiduk I'll try running the test locally as well and then post back.
ndimiduk I ran the test locally with the latest patch and was able to reproduce the failure.
Failed tests:
  TestHBaseMinimrCliDriver.testCliDriver_hbase_bulk:92->runTest:122 Unexpected exception
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 3:01.311s
[INFO] Finished at: Wed Jun 04 01:26:30 CDT 2014
[INFO] Final Memory: 27M/87M
[INFO] ------------------------------------------------------------------------
Also, as a side note, it seems the latest patch does not apply cleanly. Maybe it should be regenerated with the "--no-prefix" option.
You might need to rebase the patch with master to reproduce the failure locally.
Overall: -1 at least one tests failed
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12648248/HIVE-6473.5.patch
ERROR: -1 due to 17 failed/errored test(s), 5588 tests executed
Failed tests:
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_bulk
org.apache.hadoop.hive.cli.TestHBaseMinimrCliDriver.testCliDriver_hbase_bulk
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_metadata_only_queries
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_ptf
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_scriptfile1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_schema_evolution
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin6
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hadoop.hive.ql.exec.tez.TestTezTask.testSubmit
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY
Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/386/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/386/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-386/
Messages:
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 17 tests failed
This message is automatically generated.
ATTACHMENT ID: 12648248
Rebased onto trunk again. Removed enabling of hbase_bulk.m; it mostly passes but is flakey for me. Will address it in a follow-on ticket.
Overall: -1 at least one tests failed
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12648784/HIVE-6473.6.patch
ERROR: -1 due to 10 failed/errored test(s), 5587 tests executed
Failed tests:
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_scriptfile1
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY
org.apache.hive.hcatalog.templeton.tool.TestTempletonUtils.testPropertiesParsing
Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/405/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/405/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-405/
Messages:
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
This message is automatically generated.
ATTACHMENT ID: 12648784
Removed enabling of hbase_bulk.m; it mostly passes but is flakey for me. Will address it in a follow-on ticket.
Opened HIVE-7197.
I'm okay with this final change, and thanks for opening a new ticket for a follow up. I'll try to verify that on my end too.
+1, will go ahead and commit.
Committed. Thanks, Nick!
Could you please check/edit the Release note for this jira for accuracy?
This has been fixed in 0.14 release. Please open new jira if you see any issues.
This patch introduces a new configuration flag hive.hbase.generatehfiles. When it is enabled, the Storage Handler will use HiveHFileOutputFormat for writing new records.
Note that all existing limitations for HFile generation from Hive remain, notably: