Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
When running Hive on hadoop0.23, mapreduce_stack_trace.q fails because of a quote-printing bug:
the double-quote character is printed as the HTML entity &quot; (ampersand, then "quot", then a semicolon) instead of the expected literal " character.
Attachments
Issue Links
relates to
- HIVE-2804 Task log retrieval fails on Hadoop 0.23 (Closed)
- HIVE-3240 Fix non-deterministic results in newline.q and timestamp_lazy.q (Closed)
- HIVE-3242 Fix cascade_dbdrop.q when building hive on hadoop0.23 (Closed)
- HIVE-3249 Upgrade guava to 11.0.2 (Closed)
- HIVE-3257 Fix avro_joins.q testcase failure when building hive on hadoop0.23 (Closed)
- HIVE-3273 Add avro jars into hive execution classpath (Closed)
- HIVE-3275 Fix autolocal1.q testcase failure when building hive on hadoop0.23 MR2 (Closed)
- HIVE-3303 Fix error code inconsistency bug in mapreduce_stack_trace.q and mapreduce_stack_trace_turnoff.q when running hive on hadoop23 (Closed)
The problem is:
In hadoop23, TaskLogServlet.java is using a new utility HtmlQuoting.java to print Task Log.
In TaskLogServlet.java, printTaskLog() function:
result = taskLogReader.read(b);
if (result > 0) {
  if (plainText) {
    out.write(b, 0, result);
  } else {
    HtmlQuoting.quoteHtmlChars(out, b, 0, result);
  }
} else {
  break;
}
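To make the effect of the non-plain-text branch concrete, here is a minimal sketch of what HTML quoting does to a log line. It is a hand-written stand-in, not Hadoop's HtmlQuoting source: the class name QuoteSketch is hypothetical, and the set of five escaped characters is an assumption about the quoting behavior.

```java
// Hypothetical stand-in for the HTML-quoting step; NOT Hadoop's HtmlQuoting code.
// Assumption: the characters & < > ' " are replaced by their HTML entities.
final class QuoteSketch {
    static String quoteHtmlChars(String raw) {
        StringBuilder sb = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '\'': sb.append("&apos;"); break;
                case '"':  sb.append("&quot;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

With this sketch, a log line containing a double quote comes back with &quot; in its place, which is the symptom described above.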
In hadoop20, by contrast, TaskLogServlet.java uses its own utility to print the task log (there is no HtmlQuoting.java at all).
In TaskLogServlet.java, printTaskLog() function:
result = taskLogReader.read(b);
if (result > 0) {
  if (plainText) {
    out.write(b, 0, result);
  } else {
    quotedWrite(out, b, 0, result);
  }
} else {
  break;
}
And in Hive, TaskLogProcessor.java generates the stack trace by reading the raw taskAttemptLog.
In ql/src/java/org/apache/hadoop/hive/ql/exec/errors/TaskLogProcessor.java, getStackTraces() function:
List<String> stackTrace = null;
// Patterns that match the middle/end of stack traces
Pattern stackTracePattern = Pattern.compile("^\tat .*", Pattern.CASE_INSENSITIVE);
Pattern endStackTracePattern =
Pattern.compile("^\t... [0-9]+ more.*", Pattern.CASE_INSENSITIVE);
while ((inputLine = in.readLine()) != null) {
if (stackTracePattern.matcher(inputLine).matches() ||
endStackTracePattern.matcher(inputLine).matches()) {
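The excerpt above stops mid-loop. As a self-contained illustration of how the two patterns classify lines, here is a small sketch that reuses the same regular expressions. The class and method names (StackTraceLineSketch, isStackTraceLine, stackTraceLines) are hypothetical and are not part of TaskLogProcessor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper that reuses the two patterns quoted from TaskLogProcessor.
final class StackTraceLineSketch {
    // Matches the middle of a stack trace: a tab followed by "at ...".
    private static final Pattern STACK_TRACE =
        Pattern.compile("^\tat .*", Pattern.CASE_INSENSITIVE);
    // Matches the end of a stack trace: a tab followed by "... N more".
    private static final Pattern END_STACK_TRACE =
        Pattern.compile("^\t... [0-9]+ more.*", Pattern.CASE_INSENSITIVE);

    static boolean isStackTraceLine(String line) {
        return STACK_TRACE.matcher(line).matches()
            || END_STACK_TRACE.matcher(line).matches();
    }

    // Keeps only the lines that look like part of a stack trace.
    static List<String> stackTraceLines(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            if (isStackTraceLine(line)) {
                result.add(line);
            }
        }
        return result;
    }
}
```

Because the lines are read raw, any HTML entities present in the task log are carried into the collected output unchanged.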
For Hive to work on both hadoop20 and hadoop23, TaskLogProcessor should use a different mechanism for each version when parsing the TaskAttemptLog.
My plan is to create a shim that has different implementations for hadoop20 and hadoop23.
In the hadoop23 implementation, HtmlQuoting.unquoteHtmlChars() is used when parsing the TaskAttemptLog.
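A rough sketch of what such a shim could look like follows. All names here (TaskLogLineShim, Hadoop20TaskLogShim, Hadoop23TaskLogShim, TaskLogShimDemo) are hypothetical and are not taken from the actual patch. To keep the example runnable without Hadoop on the classpath, the hadoop23 variant uses a hand-written unquote as a stand-in where the real implementation would call HtmlQuoting.unquoteHtmlChars(). The hadoop20 variant passing lines through unchanged is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shim interface: one decoding step per Hadoop version.
interface TaskLogLineShim {
    String decodeLine(String line);
}

// Assumption: on hadoop20 the log line is used as-is.
final class Hadoop20TaskLogShim implements TaskLogLineShim {
    @Override
    public String decodeLine(String line) {
        return line;
    }
}

// On hadoop23 the line is unquoted before further processing.
// Stand-in for HtmlQuoting.unquoteHtmlChars(line), written by hand for this sketch.
final class Hadoop23TaskLogShim implements TaskLogLineShim {
    @Override
    public String decodeLine(String line) {
        // "&amp;" is replaced last so that "&amp;quot;" decodes to "&quot;", not to a quote.
        return line.replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&apos;", "'")
                   .replace("&amp;", "&");
    }
}

final class TaskLogShimDemo {
    // Applies the version-specific decoding to every line of a task attempt log.
    static List<String> decodeAll(TaskLogLineShim shim, List<String> lines) {
        List<String> decoded = new ArrayList<>();
        for (String line : lines) {
            decoded.add(shim.decodeLine(line));
        }
        return decoded;
    }
}
```

The caller would pick the shim that matches the Hadoop version in use, so the stack-trace parsing code itself stays the same for both versions.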