Details
- Bug
- Status: Closed
- Major
- Resolution: Fixed
- Reviewed
Attachments
- ASF.LICENSE.NOT.GRANTED--HIVE-2804.D3057.1.patch
- 11 kB
- Phabricator
- HIVE-2804.1.patch.txt
- 11 kB
- Zhenxiao Luo
- ASF.LICENSE.NOT.GRANTED--HIVE-2804.D3057.2.patch
- 11 kB
- Phabricator
- HIVE-2804.2.patch.txt
- 13 kB
- Zhenxiao Luo
- ASF.LICENSE.NOT.GRANTED--HIVE-2804.D3057.3.patch
- 14 kB
- Phabricator
- HIVE-2804.3.patch.txt
- 14 kB
- Zhenxiao Luo
- ASF.LICENSE.NOT.GRANTED--HIVE-2804.D3219.1.patch
- 15 kB
- Phabricator
- HIVE-2804.4.patch.txt
- 15 kB
- Zhenxiao Luo
- ASF.LICENSE.NOT.GRANTED--HIVE-2804.D3219.2.patch
- 18 kB
- Phabricator
- HIVE-2804.5.patch.txt
- 18 kB
- Zhenxiao Luo
- HIVE-2804.5.patch.txt
- 18 kB
- Zhenxiao Luo
- HIVE-2804.6.patch.txt
- 19 kB
- Zhenxiao Luo
- HIVE-2804.7.patch.txt
- 20 kB
- Zhenxiao Luo
- HIVE-2804.8.patch.txt
- 15 kB
- Zhenxiao Luo
- HIVE-2804.9.patch.txt
- 20 kB
- Zhenxiao Luo
Issue Links
- is blocked by: MAPREDUCE-4218 Killed MapReduce Job does not generate log in MR2 cluster (Open)
- is related to: HIVE-3301 Fix quote printing bug in mapreduce_stack_trace.q testcase failure when running hive on hadoop23 (Closed)
Activity
Also relates to the following:
http://search-hadoop.com/m/LR0vP1Y5227&subj=Hive+Hadoop+Log+Retrieval+Problem
It seems like you will have to make a shim or variable that can be used to generate a version dependent URL.
By creating a ThrowNullPointer UDF and running it in MiniMRCluster, I could reproduce a failing MapReduce job.
When running on Apache trunk (which uses MR1), I get the following error message:
[junit] Begin query: throw_npe.q
[junit] Error during job, obtaining debugging information...
[junit] Examining task ID: task_20120426165633901_0001_m_000002 (and more) from job job_20120426165633901_0001
[junit]
[junit] Task with the most failures(4):
[junit] -----
[junit] Task ID:
[junit] task_20120426165633901_0001_m_000000
[junit]
[junit] URL:
[junit] http://localhost:50030/taskdetails.jsp?jobid=job_20120426165633901_0001&tipid=task_20120426165633901_0001_m_000000
[junit] -----
[junit]
[junit] Exception: Client Execution failed with error code = 9
And in the output file:
PREHOOK: Output: hdfs://rotor:35306/home/cloudera/Code/hive/build/ql/scratchdir/hive_2012-04-26_16-57-11_129_4964809859660696177/-mr-10000
Ended Job = job_20120426165633901_0001 with errors
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
So, on Apache trunk, TaskLog retrieval is working OK.
Since our CDH4 branch cannot run MiniMRCluster yet, I will try on an MR2 cluster, and also try running a local cluster in pseudo-distributed mode.
When running on the CDH4 branch (which uses an MR2 cluster) in a real non-secure cluster, I always get the following exception:
Exception in thread "Thread-31" java.lang.RuntimeException: Error while reading from task log url
at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:130)
at org.apache.hadoop.hive.ql.exec.JobDebugger.showJobFailDebugInfo(JobDebugger.java:222)
at org.apache.hadoop.hive.ql.exec.JobDebugger.run(JobDebugger.java:82)
at java.lang.Thread.run(Thread.java:636)
Caused by: java.io.IOException: Server returned HTTP response code: 400 for URL: http://c0409.hal.cloudera.com:8080/tasklog?attemptid=attempt_1335202931724_0600_m_000000_0
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1403)
at java.net.URL.openStream(URL.java:1029)
at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:120)
... 3 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 2 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
After working on a patch that uses HostUtil.getTaskLogUrl() to retrieve the log, the same exception still happens.
I found that, for this intentionally failed testcase, no log is generated at all.
The bug can be reproduced in a non-secure MR2 cluster. In a non-secure MR1 cluster, everything works OK.
It is not related to secure clusters at all.
The reason is that, for a MapReduce job killed by a NullPointerException in an MR2 cluster, no log is generated.
Filed MAPREDUCE-4218 to track the problem.
On the Hive side, when using MR1, Hive generates the TaskLog URL manually, since no MapReduce API is available for it yet.
When using MR2, Hive can use the existing MapReduce API to generate TaskLog URLs.
Shims should be created to fix the Hive-side problem.
@Edward: Yes. Shims are to be created to fix the bug. I will submit a patch soon.
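As a rough illustration of the shim idea discussed above, the sketch below hides the version-dependent URL construction behind one interface. Everything in it is hypothetical (TaskLogUrlShim, ManualTaskLogUrlShim, TaskLogUrlShims and the boolean switch are not names from the patch); only the "/tasklog?attemptid=" form is copied from the failing URL in the stack trace above.

```java
/** Hypothetical shim interface; one implementation per Hadoop version. */
interface TaskLogUrlShim {
    String getTaskLogUrl(String taskTrackerHttpAddress, String taskAttemptId);
}

/** MR1-style shim: builds the URL by hand because no MapReduce API is assumed. */
class ManualTaskLogUrlShim implements TaskLogUrlShim {
    @Override
    public String getTaskLogUrl(String taskTrackerHttpAddress, String taskAttemptId) {
        // Path and query parameter mirror the URL quoted in this thread.
        return taskTrackerHttpAddress + "/tasklog?attemptid=" + taskAttemptId;
    }
}

/** Picks a shim; the boolean stands in for real Hadoop version detection. */
final class TaskLogUrlShims {
    private TaskLogUrlShims() {}

    static TaskLogUrlShim select(boolean hasMapReduceUrlApi, TaskLogUrlShim apiBackedShim) {
        // On versions with a MapReduce URL API, delegate to a shim wrapping it;
        // otherwise fall back to manual construction.
        return hasMapReduceUrlApi ? apiBackedShim : new ManualTaskLogUrlShim();
    }
}
```

Callers would then depend only on the interface, so the per-version difference stays inside the shim classes.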
zhenxiao requested code review of "HIVE-2804 [jira] Task log retrieval fails on secure cluster".
Reviewers: JIRA
HIVE-2804 Task log retrieval fails on secure cluster
TEST PLAN
EMPTY
REVISION DETAIL
https://reviews.facebook.net/D3057
AFFECTED FILES
build-common.xml
ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java
ql/src/java/org/apache/hadoop/hive/ql/exec/HadoopJobExecHelper.java
ql/src/java/org/apache/hadoop/hive/ql/exec/JobDebugger.java
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFNPE.java
ql/src/test/queries/clientnegative/cluster_npe_tasklog.q
ql/src/test/results/clientnegative/cluster_npe_tasklog.q.out
shims/src/0.20/java/org/apache/hadoop/hive/shims/Hadoop20Shims.java
shims/src/0.20S/java/org/apache/hadoop/hive/shims/Hadoop20SShims.java
shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java
shims/src/common/java/org/apache/hadoop/hive/shims/HadoopShims.java
njain has commented on the revision "HIVE-2804 [jira] Task log retrieval fails on secure cluster".
INLINE COMMENTS
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFNPE.java:1 Can this file be added in ql/src/test instead ?
You can add the relevant function in the test.
REVISION DETAIL
https://reviews.facebook.net/D3057
@Zhenxiao: please address the review comments and then resubmit. Thanks!
cwsteinbach has requested changes to the revision "HIVE-2804 [jira] Task log retrieval fails on secure cluster".
INLINE COMMENTS
ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java:255 This doesn't belong here. The UDF is for testing purposes only. Users should not see it listed in the output of 'SHOW FUNCTIONS'.
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFNPE.java:1 @Namit: Good point.
@Zhenxiao: Please put this in ql/src/test/org/apache/hadoop/hive/ql/udf/generic, and then take a look at ql/src/test/queries/clientpositive/create_genericudf.q for an example of how to register a temporary UDF.
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFNPE.java:37 Might be good to change the name to "evaluate_npe" (and update the other comments accordingly) just to make it clear that the NPE is thrown in evaluate() as opposed to initialize().
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFNPE.java:49 I'm curious if this if() block is really necessary. Does the Java compiler complain without it?
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFNPE.java:42 Is it possible to write a GenericUDF that takes no input parameters (e.g. like UDFPI)? If so then I think we should do that here since we ignore the input anyway. If that isn't possible, then please change this to take a string as input since that will work better with the src table.
ql/src/test/queries/clientnegative/cluster_npe_tasklog.q:3 Referencing src_thrift may give people the impression that this test is somehow related to Thrift. Let's use the src table instead.
ql/src/test/queries/clientnegative/cluster_npe_tasklog.q:1 Please change the name to "cluster_tasklog_retrieval.q".
shims/src/0.20/java/org/apache/hadoop/hive/shims/Hadoop20Shims.java:529 Can we call TaskLogServlet.getTaskLogUrl() here instead of manually constructing the URL? If the answer is no then please add a comment explaining why. Thanks.
shims/src/0.20S/java/org/apache/hadoop/hive/shims/Hadoop20SShims.java:33 Same question as above.
shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java:38 Please use the getHost() and getPort() methods that are provided by java.net.URL.
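A minimal sketch of the last suggestion, assuming the TaskTracker address arrives as a full HTTP URL string. The class and method names (TrackerAddress, hostAndPort) are made up for illustration and are not from the patch; getHost(), getPort() and getDefaultPort() are standard java.net.URL methods.

```java
import java.net.MalformedURLException;
import java.net.URL;

class TrackerAddress {
    /** Returns "host:port" extracted from a TaskTracker HTTP address. */
    static String hostAndPort(String taskTrackerHttpAddress) throws MalformedURLException {
        URL url = new URL(taskTrackerHttpAddress);
        // getPort() returns -1 when the address carries no explicit port,
        // so fall back to the protocol's default port in that case.
        int port = url.getPort() != -1 ? url.getPort() : url.getDefaultPort();
        return url.getHost() + ":" + port;
    }
}
```

Letting java.net.URL do the parsing avoids hand-written string splitting on ':' and '/'.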
REVISION DETAIL
https://reviews.facebook.net/D3057
BRANCH
HIVE-2804
zhenxiao updated the revision "HIVE-2804 [jira] Task log retrieval fails on secure cluster".
Reviewers: JIRA, cwsteinbach
HIVE-2804 Task log retrieval fails on secure cluster
REVISION DETAIL
https://reviews.facebook.net/D3057
AFFECTED FILES
build-common.xml
ql/src/java/org/apache/hadoop/hive/ql/exec/HadoopJobExecHelper.java
ql/src/java/org/apache/hadoop/hive/ql/exec/JobDebugger.java
ql/src/test/org/apache/hadoop/hive/ql/udf/generic/GenericUDFEvaluateNPE.java
ql/src/test/queries/clientnegative/cluster_tasklog_retrieval.q
ql/src/test/results/clientnegative/cluster_tasklog_retrieval.q.out
shims/src/0.20/java/org/apache/hadoop/hive/shims/Hadoop20Shims.java
shims/src/0.20S/java/org/apache/hadoop/hive/shims/Hadoop20SShims.java
shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java
shims/src/common/java/org/apache/hadoop/hive/shims/HadoopShims.java
zhenxiao has commented on the revision "HIVE-2804 [jira] Task log retrieval fails on secure cluster".
@Carl and Namit: Thanks a lot. My updated patch is attached.
1. Without the "if (true)" statement, the compiler complains that the return is an unreachable statement, and compilation fails. I kept the "if (true)".
2. GenericUDFEvaluateNPE takes a string as its parameter.
3. TaskLogServlet is not available as a Hadoop library; HostUtil is the corresponding one. However, HostUtil.java is not available until Hadoop 0.23, so in Hadoop 0.20 the URL still needs to be constructed manually. I left comments there.
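Point 1 can be reproduced outside Hive. The sketch below is not the actual GenericUDFEvaluateNPE source; it is a stand-alone method (EvaluateNpeSketch is a made-up name) showing why javac accepts the return when the throw is wrapped in "if (true)".

```java
class EvaluateNpeSketch {
    /** Always throws; the return type mimics a UDF-style evaluate method. */
    static Object evaluate(Object[] arguments) {
        // A bare "throw" followed by "return null;" is rejected by javac as an
        // unreachable statement. Java's reachability rules treat the statement
        // after an if as reachable even when the condition is the constant
        // true, so this version compiles.
        if (true) {
            throw new NullPointerException("evaluate null pointer exception");
        }
        return null;
    }
}
```

An alternative that should also compile is to drop the trailing return entirely, since a method body that always throws cannot complete normally.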
Your comments and suggestions are appreciated.
Thanks,
Zhenxiao
REVISION DETAIL
https://reviews.facebook.net/D3057
cwsteinbach has requested changes to the revision "HIVE-2804 [jira] Task log retrieval fails on secure cluster".
Looks good overall. Please make the changes I requested and I will test and commit. Thanks.
INLINE COMMENTS
shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java:46 If this exception is thrown and we ignore it, then the next line is going to throw an NPE since taskTrackerHttpURL will be null. We should make the caller handle this exception instead of squelching it.
ql/src/java/org/apache/hadoop/hive/ql/exec/JobDebugger.java:108 Modify this to also throw MalformedURLException.
ql/src/java/org/apache/hadoop/hive/ql/exec/JobDebugger.java:103 Modify this to catch Exception instead of IOException. We should also dump the exception stack trace at this point, but that involves modifying LogHelper. We'll save that for another ticket.
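The difference between squelching the exception and letting the caller handle it can be sketched as follows. The class and method names (UrlShimSketch, CallerSketch, squelching, propagating, describe) are hypothetical and only illustrate the pattern the review asks for; they are not the code under review.

```java
import java.net.MalformedURLException;
import java.net.URL;

class UrlShimSketch {
    /** Pattern the review objects to: the exception is swallowed. */
    static String squelching(String address) {
        URL url = null;
        try {
            url = new URL(address);
        } catch (MalformedURLException e) {
            // ignored, so url stays null
        }
        return url.getHost(); // throws NullPointerException for a malformed address
    }

    /** Requested pattern: declare the exception and let the caller decide. */
    static String propagating(String address) throws MalformedURLException {
        return new URL(address).getHost();
    }
}

class CallerSketch {
    /** Catches Exception broadly, as suggested for the calling code. */
    static String describe(String address) {
        try {
            return UrlShimSketch.propagating(address);
        } catch (Exception e) {
            return "could not build task log URL: " + e.getClass().getSimpleName();
        }
    }
}
```

With the propagating variant, a malformed address surfaces as a descriptive MalformedURLException at the call site instead of an unrelated NullPointerException one line later.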
REVISION DETAIL
https://reviews.facebook.net/D3057
BRANCH
HIVE-2804
zhenxiao updated the revision "HIVE-2804 [jira] Task log retrieval fails on secure cluster".
Reviewers: JIRA, cwsteinbach
HIVE-2804 Handling MalformedUrlException in the caller
REVISION DETAIL
https://reviews.facebook.net/D3057
AFFECTED FILES
build-common.xml
ql/src/java/org/apache/hadoop/hive/ql/exec/HadoopJobExecHelper.java
ql/src/java/org/apache/hadoop/hive/ql/exec/JobDebugger.java
ql/src/test/org/apache/hadoop/hive/ql/udf/generic/GenericUDFEvaluateNPE.java
ql/src/test/queries/clientnegative/cluster_tasklog_retrieval.q
ql/src/test/results/clientnegative/cluster_tasklog_retrieval.q.out
shims/src/0.20/java/org/apache/hadoop/hive/shims/Hadoop20Shims.java
shims/src/0.20S/java/org/apache/hadoop/hive/shims/Hadoop20SShims.java
shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java
shims/src/common/java/org/apache/hadoop/hive/shims/HadoopShims.java
zhenxiao requested code review of "HIVE-2804 [jira] Task log retrieval fails on Hadoop 0.23".
Reviewers: JIRA
HIVE-2804 Task log retrieval fails on Hadoop 0.23
TEST PLAN
EMPTY
REVISION DETAIL
https://reviews.facebook.net/D3219
AFFECTED FILES
build-common.xml
ql/src/java/org/apache/hadoop/hive/ql/exec/HadoopJobExecHelper.java
ql/src/java/org/apache/hadoop/hive/ql/exec/JobDebugger.java
ql/src/test/org/apache/hadoop/hive/ql/udf/generic/GenericUDFEvaluateNPE.java
ql/src/test/queries/clientnegative/cluster_tasklog_retrieval.q
ql/src/test/results/clientnegative/cluster_tasklog_retrieval.q.out
shims/src/0.20/java/org/apache/hadoop/hive/shims/Hadoop20Shims.java
shims/src/0.20S/java/org/apache/hadoop/hive/shims/Hadoop20SShims.java
shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java
shims/src/common/java/org/apache/hadoop/hive/shims/HadoopShims.java
To: JIRA, zhenxiao
zhenxiao updated the revision "HIVE-2804 [jira] Task log retrieval fails on Hadoop 0.23".
Reviewers: JIRA
Updated patch for HIVE-2804
REVISION DETAIL
https://reviews.facebook.net/D3219
AFFECTED FILES
build-common.xml
ql/src/java/org/apache/hadoop/hive/ql/exec/HadoopJobExecHelper.java
ql/src/java/org/apache/hadoop/hive/ql/exec/JobDebugger.java
ql/src/java/org/apache/hadoop/hive/ql/exec/errors/TaskLogProcessor.java
ql/src/test/org/apache/hadoop/hive/ql/udf/generic/GenericUDFEvaluateNPE.java
ql/src/test/queries/clientnegative/cluster_tasklog_retrieval.q
ql/src/test/results/clientnegative/cluster_tasklog_retrieval.q.out
shims/src/0.20/java/org/apache/hadoop/hive/shims/Hadoop20Shims.java
shims/src/0.20S/java/org/apache/hadoop/hive/shims/Hadoop20SShims.java
shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java
shims/src/common/java/org/apache/hadoop/hive/shims/HadoopShims.java
To: JIRA, zhenxiao
cwsteinbach has accepted the revision "HIVE-2804 [jira] Task log retrieval fails on Hadoop 0.23".
+1. Will commit if tests pass.
REVISION DETAIL
https://reviews.facebook.net/D3219
BRANCH
HIVE-2804
To: JIRA, cwsteinbach, zhenxiao
I also encountered the same problem with Hive UDAF queries such as PERCENTILE and ASSERT_TRUE. I applied the given patch and ran these 2 queries. The patch fails, throwing the following exception.
Error during job, obtaining debugging information...
Examining task ID: task_1340014683209_0037_m_000000 (and more) from job job_1340014683209_0037
Exception in thread "Thread-26" java.lang.RuntimeException: Bad task log url
at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getStackTraces(TaskLogProcessor.java:193)
at org.apache.hadoop.hive.ql.exec.JobDebugger.showJobFailDebugInfo(JobDebugger.java:227)
at org.apache.hadoop.hive.ql.exec.JobDebugger.run(JobDebugger.java:94)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.MalformedURLException
at java.net.URL.<init>(URL.java:601)
at java.net.URL.<init>(URL.java:464)
at java.net.URL.<init>(URL.java:413)
at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getStackTraces(TaskLogProcessor.java:191)
... 3 more
Correct me if I am wrong.
As I understand it, the problem is in adding URIs to 'taskLogUrls'. I feel we need a null check when adding a URI to 'taskLogUrls', because Hadoop23Shims returns null if the cluster is MR2. The URL construction fails because null is passed as the argument when TaskLogProcessor.getErrors() or TaskLogProcessor.getStackTraces() is called, which throws MalformedURLException.
In addition to the above:
Currently, once the job has failed, debug information is displayed along with a URL. The 'taskUrl' displayed is specific to an MR1 cluster. The 'taskUrl' construction should depend on the type of MR cluster.
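The null check proposed in this comment might look roughly like the sketch below. TaskLogUrlCollector and its methods are hypothetical names chosen for illustration, and the real 'taskLogUrls' bookkeeping in Hive may be organized differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class TaskLogUrlCollector {
    private final List<String> taskLogUrls = new ArrayList<>();

    /** Records a task log URL, skipping null (a shim may return null when it has no URL). */
    void addTaskAttemptLogUrl(String url) {
        if (url != null) {
            taskLogUrls.add(url);
        }
    }

    /** Read-only view, so later URL construction never sees a null entry. */
    List<String> getTaskLogUrls() {
        return Collections.unmodifiableList(taskLogUrls);
    }
}
```

Filtering at insertion time keeps the later URL-parsing code free of null handling.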
@rohithsharma:
Thanks a lot for the comment. Are you using MR1 or MR2 to reproduce the exception? I manually tested HIVE-2804.6.patch.txt on a real cluster for both MR1 and MR2, and it ran OK. However, returning null could be a problem. I will attach an updated HIVE-2804.7.patch.txt to try to fix the problem.
Thanks,
Zhenxiao
@rohithsharma:
New patch attached: HIVE-2804.7.patch.txt
Added a null check for taskLogUrl construction.
Would you please try this patch? If the exception still occurs, send me the stack trace and I will investigate further.
Thanks,
Zhenxiao
@zhenxiao
>>> Are you using MR1 or MR2 to reproduce the exception?
I am using MR2 cluster.
The latest patch solved the problem. Thank you.
I think adding "GenericUDFEvaluateNPE.java" only for the test framework is not required. The existing UDAF queries below will catch the above problem.
>> SELECT percentile(key,5) FROM src;
>> SELECT assert_true(key >= 100) FROM src;
We can add these queries into tests.
In TaskLogProcessor.java, there are 2 methods which construct a URL. One is getErrors() and the other is getStackTraces(). TaskLogProcessor.getStackTraces() is called only if 'hive.exec.job.debug.capture.stacktraces' is set to true. I think in your case this was not set false.
@rohithsharma:
Thank you so much for the comment. I will update the patch, and submit for review soon.
Updated patch submitted for review at:
https://reviews.facebook.net/D4551
Integrated in Hive-trunk-h0.21 #1606 (See https://builds.apache.org/job/Hive-trunk-h0.21/1606/)
HIVE-2804. Task log retrieval fails on Hadoop 0.23 (Zhenxiao Luo via cws) (Revision 1373145)
Result = FAILURE
cws : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1373145
Files :
- /hive/trunk/build-common.xml
- /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/HadoopJobExecHelper.java
- /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/JobDebugger.java
- /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/errors/TaskLogProcessor.java
- /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/udf/generic/GenericUDFEvaluateNPE.java
- /hive/trunk/ql/src/test/queries/clientnegative/cluster_tasklog_retrieval.q
- /hive/trunk/ql/src/test/results/clientnegative/cluster_tasklog_retrieval.q.out
- /hive/trunk/shims/src/0.20/java/org/apache/hadoop/hive/shims/Hadoop20Shims.java
- /hive/trunk/shims/src/0.20S/java/org/apache/hadoop/hive/shims/Hadoop20SShims.java
- /hive/trunk/shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java
- /hive/trunk/shims/src/common/java/org/apache/hadoop/hive/shims/HadoopShims.java
zhenxiao has abandoned the revision "HIVE-2804 [jira] Task log retrieval fails on Hadoop 0.23".
REVISION DETAIL
https://reviews.facebook.net/D3567
To: JIRA, zhenxiao
zhenxiao has abandoned the revision "HIVE-2804 [jira] Task log retrieval fails on secure cluster".
REVISION DETAIL
https://reviews.facebook.net/D3057
To: JIRA, cwsteinbach, zhenxiao
Cc: njain
Integrated in Hive-trunk-hadoop2 #54 (See https://builds.apache.org/job/Hive-trunk-hadoop2/54/)
HIVE-2804. Task log retrieval fails on Hadoop 0.23 (Zhenxiao Luo via cws) (Revision 1373145)
Result = ABORTED
cws : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1373145
Files :
- /hive/trunk/build-common.xml
- /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/HadoopJobExecHelper.java
- /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/JobDebugger.java
- /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/errors/TaskLogProcessor.java
- /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/udf/generic/GenericUDFEvaluateNPE.java
- /hive/trunk/ql/src/test/queries/clientnegative/cluster_tasklog_retrieval.q
- /hive/trunk/ql/src/test/results/clientnegative/cluster_tasklog_retrieval.q.out
- /hive/trunk/shims/src/0.20/java/org/apache/hadoop/hive/shims/Hadoop20Shims.java
- /hive/trunk/shims/src/0.20S/java/org/apache/hadoop/hive/shims/Hadoop20SShims.java
- /hive/trunk/shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java
- /hive/trunk/shims/src/common/java/org/apache/hadoop/hive/shims/HadoopShims.java
This issue is fixed and released as part of 0.10.0 release. If you find an issue which seems to be related to this one, please create a new jira and link this one with new jira.
On a secure Hadoop cluster TaskLogProcessor.getErrors() fails: