Chukwa
CHUKWA-132

Handle multiline output in Job History file

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Data Processors
    • Labels:
      None
    • Environment:

      Redhat EL 5.1, Java 6

      Description

      When there is multi-line output in the Job History file, the parser fails with an exception like this:

      MapAttempt TASK_TYPE="MAP" TASKID="task_200904060626_2141_m_000108" TASK_ATTEMPT_ID="attempt_200904060626_2141_m_000108_1" START_TIME="1239190934835" TRACKER_NAME="tracker_kry50024\.inktomisearch\.com:localhost/127\.0\.0\.1:39507" HTTP_PORT="50060" .
      MapAttempt TASK_TYPE="MAP" TASKID="task_200904060626_2141_m_000108" TASK_ATTEMPT_ID="attempt_200904060626_2141_m_000108_1" TASK_STATUS="FAILED" FINISH_TIME="1239190949062" HOSTNAME="kry50024\.inktomisearch\.com" ERROR="java\.io\.IOException: MROutput/MRErrThread failed:java\.lang\.ArrayIndexOutOfBoundsException: -1
      at org\.apache\.hadoop\.mapred\.lib\.KeyFieldBasedPartitioner\.hashCode(KeyFieldBasedPartitioner\.java:95)
      at org\.apache\.hadoop\.mapred\.lib\.KeyFieldBasedPartitioner\.getPartition(KeyFieldBasedPartitioner\.java:87)
      at org\.apache\.hadoop\.mapred\.MapTask$MapOutputBuffer\.collect(MapTask\.java:801)
      at org\.apache\.hadoop\.streaming\.PipeMapRed$MROutputThread\.run(PipeMapRed\.java:378)

      at org\.apache\.hadoop\.streaming\.PipeMapper\.map(PipeMapper\.java:87)
      at org\.apache\.hadoop\.mapred\.MapRunner\.run(MapRunner\.java:50)
      at org\.apache\.hadoop\.streaming\.PipeMapRunner\.run(PipeMapRunner\.java:36)
      at org\.apache\.hadoop\.mapred\.MapTask\.runOldMapper(MapTask\.java:356)
      at org\.apache\.hadoop\.mapred\.MapTask\.run(MapTask\.java:305)
      at org\.apache\.hadoop\.mapred\.Child\.main(Child\.java:156)
      " .
      MapAttempt TASK_TYPE="CLEANUP" TASKID="task_200904060626_2141_m_000197" TASK_ATTEMPT_ID="attempt_200904060626_2141_m_000197_0" START_TIME="1239190961843" TRACKER_NAME="tracker_kry3083\.inktomisearch\.com:localhost/127\.0\.0\.1:60970" HTTP_PORT="50060" .
      MapAttempt TASK_TYPE="CLEANUP" TASKID="task_200904060626_2141_m_000197" TASK_ATTEMPT_ID="attempt_200904060626_2141_m_000197_0" TASK_STATUS="SUCCESS" FINISH_TIME="1239190963602" HOSTNAME="/74\.6\.135\.128/kry3083\.inktomisearch\.com" STATE_STRING="cleanup" COUNTERS="

      {(org\.apache\.hadoop\.mapred\.Task$Counter)(Map-Reduce Framework)[(SPILLED_RECORDS)(Spilled Records)(0)]}" .
      Task TASKID="task_200904060626_2141_m_000197" TASK_TYPE="CLEANUP" TASK_STATUS="SUCCESS" FINISH_TIME="1239190963509" COUNTERS="{(org.apache.hadoop.mapred.Task$Counter)(Map-Reduce Framework)[(SPILLED_RECORDS)(Spilled Records)(0)]}

      " .
      Job JOBID="job_200904060626_2141" FINISH_TIME="1239190963510" JOB_STATUS="FAILED" FINISHED_MAPS="0" FINISHED_REDUCES="0" .

      [cchunkException] :java.lang.StringIndexOutOfBoundsException: String index out of range: -1
      at java.lang.String.substring(String.java:1938)
      at org.apache.hadoop.chukwa.extraction.demux.processor.mapper.JobLog$JobLogLine.<init>(JobLog.java:114)
      at org.apache.hadoop.chukwa.extraction.demux.processor.mapper.JobLog.parse(JobLog.java:39)
      at org.apache.hadoop.chukwa.extraction.demux.processor.mapper.AbstractProcessor.process(AbstractProcessor.java:90)
      at org.apache.hadoop.chukwa.extraction.demux.Demux$MapClass.map(Demux.java:94)
      at org.apache.hadoop.chukwa.extraction.demux.Demux$MapClass.map(Demux.java:60)
      at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
      at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2210)

      [csource] :host.example.com
      [ctags] :cluster="demo"

        Activity

        Mac Yang added a comment -

        This is a blocker for 0.1.2.

        Cheng added a comment -

        Job logs can be split across multiple lines. The new code monitors input lines: if the input recordEntry doesn't end with '"' or '" .', it saves the partial log and waits for the next line; otherwise it processes the full log.
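The buffering policy described above can be sketched as follows. This is a minimal illustration, not the actual patch; the class and method names (`JobHistoryLineBuffer`, `feed`) are hypothetical, and only the record terminators '"' and '" .' come from the comment above.

```java
// Hedged sketch of the line-buffering approach described in the comment:
// a job history record is complete only when it ends with '"' or '" .';
// otherwise the partial line is held until the next line arrives.
public class JobHistoryLineBuffer {
    private final StringBuilder pending = new StringBuilder();

    /** Feed one raw line; returns the complete record, or null if more lines are needed. */
    public String feed(String line) {
        if (pending.length() > 0) {
            pending.append('\n'); // rejoin a record that was split across lines
        }
        pending.append(line);
        String candidate = pending.toString();
        // A complete record ends with '" .' or '"'
        if (candidate.endsWith("\" .") || candidate.endsWith("\"")) {
            pending.setLength(0);
            return candidate;
        }
        return null; // record continues on the next line
    }
}
```

With this shape, the multi-line ERROR="..." value from the description is accumulated until the closing '" .' arrives, and only then handed to the parser.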

        Jerome Boulon added a comment -

        Neither saving the log nor processing the full log (logs can be more than 100 MB) is a valid option.
        Hadoop contains a parser for JobHistory; what does it do? Similar code could be applied on the Chukwa side, at the adaptor level.

        Cheng added a comment -

        The current Hadoop job history parser uses the same algorithm to read multi-line logs. For details, please refer to JobHistory.parseHistoryFromFS.

        Ari Rabkin added a comment -

        It should be feasible to do this in a custom adaptor class; we already have a small coterie of subclasses of FileTailingAdaptor, precisely to allow different policies about where to break chunks.
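The adaptor-level alternative suggested here amounts to a chunk-boundary policy: never cut a chunk in the middle of a record. A minimal standalone sketch of such a policy is below; it does not use the real FileTailingAdaptor API, and the class and method names are hypothetical. It assumes a record ends with '" .' followed by a newline, as in the log excerpt above.

```java
// Hedged sketch of a chunk-boundary policy: scan backward through the
// buffered bytes and break the chunk just after the last complete record,
// so a multi-line record is never split across two chunks.
public class RecordBoundaryPolicy {
    /**
     * Returns the offset just past the last complete record in buf[0..len),
     * where a record ends with `" .` followed by a newline; 0 if no
     * complete record has been read yet.
     */
    public static int lastRecordBoundary(byte[] buf, int len) {
        for (int i = len - 1; i >= 3; i--) {
            if (buf[i] == '\n' && buf[i - 1] == '.' && buf[i - 2] == ' '
                    && buf[i - 3] == '"') {
                return i + 1; // break the chunk just after this record
            }
        }
        return 0; // no complete record yet; wait for more data
    }
}
```

A tailing adaptor using this policy would emit only buf[0..boundary) as a chunk and carry the remaining partial record over to the next read, which keeps the downstream demux parser per-record without buffering whole logs.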

        Eric Yang added a comment -

        I just committed this, thanks Cheng.


          People

          • Assignee:
            Cheng
          • Reporter:
            Eric Yang
          • Votes:
            0
          • Watchers:
            0
