HIVE-3301

Fix quote printing bug in mapreduce_stack_trace.q testcase failure when running hive on hadoop23

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.10.0
    • Component/s: None
    • Labels: None

      Description

      When running Hive on hadoop0.23, mapreduce_stack_trace.q fails due to a quote printing bug:

      the quote is printed as the HTML entity '&quot;' instead of the literal " character.

      Since the bug is hard to state clearly in HTML, spelled out:

      the quote is printed as ampersand + 'quot' + semicolon,
      not as the expected double-quote sign.

      1. HIVE-3301.1.patch.txt
        4 kB
        Zhenxiao Luo
      2. HIVE-3301.2.patch.txt
        4 kB
        Zhenxiao Luo
      3. HIVE-3301.3.patch.txt
        4 kB
        Zhenxiao Luo

          Activity

          Ashutosh Chauhan added a comment -

          This issue is fixed and was released as part of the 0.10.0 release. If you find an issue that seems to be related to this one, please create a new jira and link this one to it.

          Hudson added a comment -

          Integrated in Hive-trunk-hadoop2 #54 (See https://builds.apache.org/job/Hive-trunk-hadoop2/54/)
          HIVE-3301 : Fix quote printing bug in mapreduce_stack_trace.q testcase failure when running hive on hadoop23 (Zhenxiao Luo via Ashutosh Chauhan) (Revision 1366233)

          Result = ABORTED
          hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1366233
          Files :

          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/errors/TaskLogProcessor.java
          • /hive/trunk/shims/src/0.20/java/org/apache/hadoop/hive/shims/Hadoop20Shims.java
          • /hive/trunk/shims/src/common-secure/java/org/apache/hadoop/hive/shims/HadoopShimsSecure.java
          • /hive/trunk/shims/src/common/java/org/apache/hadoop/hive/shims/HadoopShims.java
          Hudson added a comment -

          Integrated in Hive-trunk-h0.21 #1570 (See https://builds.apache.org/job/Hive-trunk-h0.21/1570/)
          HIVE-3301 : Fix quote printing bug in mapreduce_stack_trace.q testcase failure when running hive on hadoop23 (Zhenxiao Luo via Ashutosh Chauhan) (Revision 1366233)

          Result = SUCCESS
          hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1366233
          Files :

          • /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/errors/TaskLogProcessor.java
          • /hive/trunk/shims/src/0.20/java/org/apache/hadoop/hive/shims/Hadoop20Shims.java
          • /hive/trunk/shims/src/common-secure/java/org/apache/hadoop/hive/shims/HadoopShimsSecure.java
          • /hive/trunk/shims/src/common/java/org/apache/hadoop/hive/shims/HadoopShims.java
          Ashutosh Chauhan added a comment -

          Committed to trunk. Thanks, Zhenxiao!

          Ashutosh Chauhan added a comment -

          +1 Running tests.

          Zhenxiao Luo added a comment -

          Updated patch without the prefix.
          It applies cleanly.

          Zhenxiao Luo added a comment -

          patch has prefix in it

          Zhenxiao Luo added a comment -

          @ashutosh: Thanks a lot for the comments.

          I made updates and resubmitted review request at:
          https://reviews.facebook.net/D4353

          Zhenxiao Luo added a comment -

          @Edward: related HIVE tickets are linked. I will add more whenever new bugs are filed. Do we need a separate upper-level JIRA to track all the hadoop23 integration bugs?

          Zhenxiao Luo added a comment -

          Link with hadoop23 integration related bugs

          Ashutosh Chauhan added a comment -

          Zhenxiao, I left comments on Phabricator.

          Edward Capriolo added a comment -

          I am not saying we need to shim layer all the fixes, but having a reasonably exhaustive list of the problems linked together in jira would make me more confident that we are taking the right plan of action.

          Zhenxiao Luo added a comment -

          @Edward: oh yes. As far as I know, HIVE-3301, HIVE-3275, HIVE-3273, HIVE-3242, HIVE-3240, HIVE-3257, HIVE-3249 and HIVE-2804 are all hadoop 23 bugs. I am fixing these one by one. Thanks for your advice. I will try to put them into a larger shim layer.

          I just found an Error Code retrieval inconsistency between hadoop20 and hadoop23. Will file another one soon.

          Thanks,
          Zhenxiao

          Zhenxiao Luo added a comment -

          review request submitted at:
          https://reviews.facebook.net/D4353

          Edward Capriolo added a comment -

          You know, these hadoop 23 jiras are like death by a thousand paper cuts. If I had known we were going to face so many issues, I would have proposed making a larger shim layer. Can we come up with a definitive list of all the 23 problems?

          Zhenxiao Luo added a comment -

          The problem is:

          In hadoop23, TaskLogServlet.java uses a new utility, HtmlQuoting.java, to print the task log.

          In TaskLogServlet.java, the printTaskLog() function:

          result = taskLogReader.read(b);
          if (result > 0) {
            if (plainText) {
              out.write(b, 0, result);
            } else {
              HtmlQuoting.quoteHtmlChars(out, b, 0, result);
            }
          } else {
            break;
          }
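
To make the effect of that quoting step concrete, here is a minimal sketch. It is not Hadoop's HtmlQuoting class: QuoteSketch and its quote() method are hypothetical stand-ins, written on the assumption that the five characters & < > ' " are replaced by HTML entities. The log line in main() is made up for illustration.

```java
// Hypothetical stand-in for the HTML quoting applied to the task log in hadoop23.
// Assumption: & < > ' " are each replaced by the corresponding HTML entity.
final class QuoteSketch {
    static String quote(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '\'': sb.append("&apos;"); break;
                case '"':  sb.append("&quot;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Illustrative log line only, not taken from a real task log.
        String logLine = "error while processing row {\"key\":\"238\"}";
        // prints: error while processing row {&quot;key&quot;:&quot;238&quot;}
        System.out.println(quote(logLine));
    }
}
```

A reader that consumes this output as raw text sees &quot; wherever the task originally logged a double quote, which is the symptom reported in this issue.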


          In hadoop20, by contrast, TaskLogServlet.java uses its own utility (HtmlQuoting.java does not exist there at all) to print the task log:

          In TaskLogServlet.java, the printTaskLog() function:

          result = taskLogReader.read(b);
          if (result > 0) {
            if (plainText) {
              out.write(b, 0, result);
            } else {
              quotedWrite(out, b, 0, result);
            }
          } else {
            break;
          }

          And in Hive, TaskLogProcessor.java generates the stack trace by reading the raw taskAttemptLog.

          In ql/src/java/org/apache/hadoop/hive/ql/exec/errors/TaskLogProcessor.java, the getStackTraces() function:

          List<String> stackTrace = null;

          // Patterns that match the middle/end of stack traces
          Pattern stackTracePattern = Pattern.compile("^\tat .*", Pattern.CASE_INSENSITIVE);
          Pattern endStackTracePattern =
              Pattern.compile("^\t... [0-9]+ more.*", Pattern.CASE_INSENSITIVE);

          while ((inputLine = in.readLine()) != null) {

            if (stackTracePattern.matcher(inputLine).matches() ||
                endStackTracePattern.matcher(inputLine).matches()) {
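
As a rough illustration of what the two patterns select, the following self-contained sketch applies them to a list of lines. The class StackTraceLineFilter, its stackLines() method, and the sample lines are hypothetical and not part of Hive; only the two regular expressions are copied from the excerpt above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: keeps only lines that look like the middle or end of a stack trace.
final class StackTraceLineFilter {
    // The same two patterns as in the excerpt above.
    static final Pattern STACK_TRACE = Pattern.compile("^\tat .*", Pattern.CASE_INSENSITIVE);
    static final Pattern END_STACK_TRACE =
        Pattern.compile("^\t... [0-9]+ more.*", Pattern.CASE_INSENSITIVE);

    static List<String> stackLines(List<String> lines) {
        List<String> matched = new ArrayList<>();
        for (String line : lines) {
            if (STACK_TRACE.matcher(line).matches() || END_STACK_TRACE.matcher(line).matches()) {
                matched.add(line); // the line is kept exactly as read, entities included
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        // Made-up input lines for illustration.
        List<String> log = Arrays.asList(
            "java.lang.RuntimeException: boom",
            "\tat org.example.Mapper.map(Mapper.java:42)",
            "\t... 9 more");
        System.out.println(stackLines(log).size()); // prints 2
    }
}
```

The matching itself does not alter the text, so any HTML entities present in the log are carried through unchanged unless they are unquoted separately.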

          To have Hive work with both hadoop20 and hadoop23, the Hive TaskLogProcessor should use a different mechanism for each version when parsing the TaskAttemptLog.

          My plan is to create a shim that has different implementations for hadoop20 and hadoop23.

          In the hadoop23 implementation, HtmlQuoting.unquoteHtmlChars() is used to parse the TaskAttemptLog.
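
The shim idea could look roughly like the sketch below. All names here (LogShim, Hadoop20LogShim, Hadoop23LogShim) are hypothetical and are not the classes touched by the patch. The hadoop23 variant unquotes entities by hand so that the example runs without Hadoop on the classpath; a real implementation would presumably delegate to HtmlQuoting.unquoteHtmlChars() as described above.

```java
// Hypothetical shim interface: one method, one implementation per Hadoop version.
interface LogShim {
    String unquoteHtmlChars(String line);
}

// hadoop20 sketch: assumed to need no unquoting, so the line is returned as-is.
final class Hadoop20LogShim implements LogShim {
    @Override
    public String unquoteHtmlChars(String line) {
        return line;
    }
}

// hadoop23 sketch: reverses the entity quoting.
// Hand-written stand-in for HtmlQuoting.unquoteHtmlChars(), for illustration only.
final class Hadoop23LogShim implements LogShim {
    @Override
    public String unquoteHtmlChars(String line) {
        return line
            .replace("&lt;", "<")
            .replace("&gt;", ">")
            .replace("&apos;", "'")
            .replace("&quot;", "\"")
            .replace("&amp;", "&"); // last, so "&amp;quot;" becomes "&quot;" and not a quote
    }
}

final class LogShimDemo {
    public static void main(String[] args) {
        LogShim shim = new Hadoop23LogShim();
        // prints: {"key":"238"}
        System.out.println(shim.unquoteHtmlChars("{&quot;key&quot;:&quot;238&quot;}"));
    }
}
```

With this shape, the log processor only calls the interface, and the choice of implementation stays in the version-specific shim code.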


            People

            • Assignee: Zhenxiao Luo
            • Reporter: Zhenxiao Luo
            • Votes: 0
            • Watchers: 5
