[MAPREDUCE-3619] Change streaming code to use new mapreduce api. - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 0.23.1
Fix Version/s: None
Component/s: contrib/streaming, mrv2
Labels: None

Description

If we run a streaming job with the following Python script as the mapper or reducer, the job throws a NullPointerException.

#!/usr/bin/python
import sys

class MyTask:
  def __init__(self, file=sys.stdin):
    self.file = file
    # Counter updates are written at startup, before any input is read.
    print >>sys.stderr, "reporter:counter:spam,disp_flag_record,0"
    print >>sys.stderr, "reporter:counter:spam,spam_record,0"

  def process(self):
    # Echo every input line to stdout until EOF.
    while True:
      line = self.file.readline()
      if not line:
        break
      print line

if __name__ == "__main__":
  task = MyTask()
  task.process()
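
For reference, the two stderr lines in the script follow the streaming counter format reporter:counter:<group>,<counter>,<amount>. The sketch below shows how such a line breaks down into its three fields. It is only an illustration: parse_counter_line and COUNTER_PREFIX are hypothetical names, not part of Hadoop, and the real parsing happens in Java inside PipeMapRed.

```python
# Hypothetical helper, not part of Hadoop: splits a streaming counter line
# of the form "reporter:counter:<group>,<counter>,<amount>".
COUNTER_PREFIX = "reporter:counter:"

def parse_counter_line(line):
    """Return (group, counter, amount), or None if the line is not a counter update."""
    line = line.strip()
    if not line.startswith(COUNTER_PREFIX):
        return None
    fields = line[len(COUNTER_PREFIX):].split(",")
    if len(fields) != 3:
        return None
    group, counter, amount = fields
    try:
        return group, counter, int(amount)
    except ValueError:
        # The amount field was not an integer.
        return None
```

Both lines emitted by the script parse into the group "spam" with an increment of 0, so the failure is not caused by the counter values themselves.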

Here is the NPE-related log:
2011-12-22 14:14:06,310 WARN org.apache.hadoop.streaming.PipeMapRed: java.lang.NullPointerException
at org.apache.hadoop.streaming.PipeMapRed$MRErrorThread.incrCounter(PipeMapRed.java:502)
at org.apache.hadoop.streaming.PipeMapRed$MRErrorThread.run(PipeMapRed.java:444)

This happens because the script's "print >>sys.stderr" statements cause reporter.incrCounter() to be invoked while PipeMapper|PipeReducer.configure() is still running, and the reporter is not available in configure().
To fix this problem, we should change the streaming code to use the new API. Then we can call context.getCounter() in the Mapper|Reducer.setup() function.
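
The difference between the two lifecycles can be illustrated with a toy model. This is a rough sketch in Python, not Hadoop code: Counters, OldApiTask and NewApiTask are hypothetical names, and Python's AttributeError on None stands in for the Java NullPointerException. The only assumption taken from the description above is that the old API has no reporter during configure(), while the new API passes a context to setup().

```python
# Toy model of the two task lifecycles; all class names are hypothetical.

class Counters(object):
    """Stands in for the reporter (old API) or the context (new API)."""
    def __init__(self):
        self.values = {}

    def incr(self, group, name, amount):
        key = (group, name)
        self.values[key] = self.values.get(key, 0) + amount


class OldApiTask(object):
    """Old API model: no reporter is available during configure()."""
    def __init__(self):
        self.reporter = None  # would only be set once map()/reduce() is called

    def configure(self, early_updates):
        # The script writes its counter lines at startup, so the updates
        # arrive while the reporter is still None.
        for group, name, amount in early_updates:
            self.reporter.incr(group, name, amount)  # fails: reporter is None


class NewApiTask(object):
    """New API model: setup() already receives a usable context."""
    def setup(self, context, early_updates):
        for group, name, amount in early_updates:
            context.incr(group, name, amount)


# The two counter updates emitted by the script in the description.
updates = [("spam", "disp_flag_record", 0), ("spam", "spam_record", 0)]

try:
    OldApiTask().configure(updates)
    old_api_failed = False
except AttributeError:
    old_api_failed = True

context = Counters()
NewApiTask().setup(context, updates)
```

In this model the old-style task fails on the first early update, while the new-style task records both counters because the context exists before any update is processed.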

Issue Links

is related to: MAPREDUCE-1122 streaming with custom input format does not support the new API (Open)

People

Assignee: Unassigned

Reporter: Liyin Liang

Votes: 0

Watchers: 4

Dates

Created: 05/Jan/12 12:24

Updated: 09/Jan/12 14:43

Resolved: 09/Jan/12 14:43