Details
-
Bug
-
Status: Resolved
-
Minor
-
Resolution: Fixed
-
None
-
None
-
None
Description
Found lots of following errors in HS2 log:
hdfs.KeyProviderCache: Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !!
Similar to HDFS-7931
Attachments
Attachments
- HIVE-16047.1.patch
- 0.8 kB
- Rui Li
- HIVE-16047.2.patch
- 2 kB
- Rui Li
Issue Links
- is related to
-
HIVE-16490 Hive should not use private HDFS APIs for encryption
- Closed
- relates to
-
HDFS-11687 Add new public encryption APIs required by Hive
- Resolved
Activity
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12854851/HIVE-16047.1.patch
ERROR: -1 due to no test(s) being added or modified.
ERROR: -1 due to 3 failed/errored test(s), 10266 tests executed
Failed tests:
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=223) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[skewjoinopt4] (batchId=106) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_count_distinct] (batchId=106)
Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3809/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3809/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3809/
Messages:
Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed
This message is automatically generated.
ATTACHMENT ID: 12854851 - PreCommit-HIVE-Build
The change looks good to me. If compatibility is a concern, maybe we can use reflection to check if such a method is available before calling it, or check Hadoop version somehow. Thanks.
If compatibility is a concern, maybe we can use reflection to check if such a method is available before calling it, or check Hadoop version somehow
Agree.
Thanks guys for your suggestions.
Patch v2 uses reflection to check whether the method exists. If not, we'll do the check ourselves, in the same way as DFSClient.
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12855068/HIVE-16047.2.patch
ERROR: -1 due to no test(s) being added or modified.
ERROR: -1 due to 3 failed/errored test(s), 10274 tests executed
Failed tests:
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_table] (batchId=147) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=223) org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgressParallel (batchId=211)
Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3835/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3835/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3835/
Messages:
Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed
This message is automatically generated.
ATTACHMENT ID: 12855068 - PreCommit-HIVE-Build
Hi folks, DFSClient is a private API. After HADOOP-14333, it broke Hive compilation because we changed isHdfsEncryptionEnabled to throw an exception.
We'd be pleased to add public APIs at HDFS-11687 for whatever Hive needs wrt HDFS encryption (or even HDFS more broadly). I'd be nice to revert this for now (since it's a pretty minor fix) and revisit once HDFS-11687 is resolved. I already filed and linked HIVE-16490 to help track this work.
Andrew Wang thanks for the information. Is there any public API to achieve the same function?
Users have found the log quite confusing and annoying. And the "breaking" change is in unreleased Hadoop. Xuefu Zhang, Ferdinand Xu, Sergey Shelukhin do you think we should revert the patch? Or maybe we can revert in master and keep it in branch-2.x?
Honestly, I think the best thing to here is to revert. The fallback to config detection is also brittle, since the point of the Hadoop change that broke this was to query the KMS URI from the NameNode rather than config. Hive has limited ability here to fix what is ultimately an Hadoop-level problem.
We'll work to quash this log message in Hadoop, I don't know why it's logged at ERROR, or at all really. HDFS-7931 was an incomplete fix.
Particularly if 2.2 is about to go out, I'd kindly request that we revert this JIRA so it doesn't get released, since we're actively working on a better fix.
Thanks Andrew Wang for the suggestion. I prefer to revert it in 2.2 branch considering retrieving KMS URI from Namenode is much better than configuration.
The 2.2 branch is still building with Hadoop-2.7.2. My concern is if we revert the patch in 2.2 and when the new Hadoop (containing the breaking change and all the improvements) is released, we may find Hive-2.2 doesn't work with it anyway due to other compatibility issues. Then users of Hive-2.2 will have to live with the annoying logs. That's why I prefer to revert in master and keep it in 2.2 (or 2.x).
Are the Hive+Hadoop version expectations documented somewhere? If Hive users expect to be able to swap out hadoop jars/versions, then we can run into problems. This area of the DFSClient code is going through some changes with plans to backport to maintenance releases (e.g. 2.7.4), which will continue to break the Hive shims.
Hey guys, Rui Li, Ferdinand Soethe,
I think we should try to revert this from Hive if the method is expected to be changed on maintenance Hadoop releases. Sometimes, we'd like to bump our Hadoop supported version in maintenance releases as well, but if we do so, then we might need to add more shims interfaces just for the isHDFSEncryptionEnabled method in a Hive maintenance release.
Rui Li You mentioned is ok to revert it on master, and work with the hdfs team to provide some public API for this. Can we continue with this? Also, the patch is not on branch-2.2 (next stable hive version), and branch-2.3 might be a little unstable due to many changes going on this branch. Can we try to revert it from branch-2 and branch-2.3?
Glad that Sergio Peña also copied the information from mailing list here to the authors. I am OK with both the solutions, i.e., reverting the patch or keeping the patch. Please discuss and make an agreement. Thanks.
Hi Sergio Peña, I'm not aware the patch is not in branch-2.2. I'll revert the changes in master and branch-2.3. Meanwhile, it'd be great that HDFS team can quash the log and make sure the future Hadoop does work with Hive-2.3. Otherwise we'll be reverting for no good.
Thank you so much Rui Li. I noticed the patch wasn't reverted on branch-2 so I just reverted as well. For info about branch-2.2, the community decided to based branch-2.2 from branch-2.1 (not master), that's why it was not included into the branch.
Andrew Wang When do you think we would have this new API on HDFS?
Sergio Peña, HDFS-11687 introduces a HdfsAdmin#getKeyProvider() API for hive. It returns null if the encryption is not enabled. It is close to be committed (by EOD or tomorrow).
Sergio Peña thanks for letting me know. Since we're rolling RCs for Hive-2.3, I don't think 2.3 can use the new API. HDFS should quash the log so after user upgrades Hadoop, the issue goes away right?
The change is trivial. But I'm not sure if there's compatibility concern, because the isHDFSEncryptionEnabled method is only available since Hadoop-2.7.1.
Ferdinand Xu, Xuefu Zhang any ideas?