Hadoop Common / HADOOP-4362

Hadoop Streaming failed with large number of input files


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Duplicate
    • Affects Version/s: 0.18.1
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      A simple streaming job fails with "java.lang.ArrayIndexOutOfBoundsException" when the mapper is /bin/cat and the number of input files is large (16365 input paths in the run below).

      $ hadoop jar $HADOOP_HOME/hadoop-streaming.jar -input in_data -output op_data -mapper /bin/cat -reducer NONE
      additionalConfSpec_:null
      null=@@@userJobConfProps_.get(stream.shipped.hadoopstreaming
      packageJobJar: [/tmp/hadoop-unjar49637/] []
      /tmp/streamjob49638.jar tmpDir=/tmp
      08/10/07 07:03:09 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
      08/10/07 07:03:11 INFO mapred.FileInputFormat: Total input paths to process : 16365
      08/10/07 07:03:12 INFO mapred.FileInputFormat: Total input paths to process : 16365
      08/10/07 07:03:15 ERROR streaming.StreamJob: Error Launching job : java.io.IOException: java.lang.ArrayIndexOutOfBoundsException

      Streaming Job Failed!

      With a smaller number of input files (16 input paths in the run below), the same job does not fail:

      $ hadoop jar $HADOOP_HOME/hadoop-streaming.jar -input inp_data1 -output op_data1 -mapper /bin/cat -reducer NONE
      additionalConfSpec_:null
      null=@@@userJobConfProps_.get(stream.shipped.hadoopstreaming
      packageJobJar: [/tmp/hadoop-unjar3725/] []
      /tmp/streamjob3726.jar tmpDir=/tmp
      08/10/07 07:06:37 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
      08/10/07 07:06:39 INFO mapred.FileInputFormat: Total input paths to process : 16
      08/10/07 07:06:39 INFO mapred.FileInputFormat: Total input paths to process : 16
      08/10/07 07:06:42 INFO streaming.StreamJob: getLocalDirs():
      [/var/mapred/local]
      08/10/07 07:06:42 INFO streaming.StreamJob: Running job: job_200810070645_0006
      08/10/07 07:06:42 INFO streaming.StreamJob: To kill this job, run:
      08/10/07 07:06:42 INFO streaming.StreamJob: hadoop job -Dmapred.job.tracker=login1:51981 -kill job_200810070645_0006
      08/10/07 07:06:42 INFO streaming.StreamJob: Tracking URL: http://login1:52941/jobdetails.jsp?jobid=job_200810070645_0006
      08/10/07 07:06:43 INFO streaming.StreamJob: map 0% reduce 0%
      08/10/07 07:06:46 INFO streaming.StreamJob: map 44% reduce 0%
      08/10/07 07:06:47 INFO streaming.StreamJob: map 75% reduce 0%
      08/10/07 07:06:48 INFO streaming.StreamJob: map 88% reduce 0%
      08/10/07 07:06:49 INFO streaming.StreamJob: map 100% reduce 100%
      08/10/07 07:06:49 INFO streaming.StreamJob: Job complete: job_200810070645_0006
      08/10/07 07:06:49 INFO streaming.StreamJob: Output: op_data1
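      The two runs above differ only in how many files sit under the input path. The sketch below shows one way to build a local input directory with a chosen number of small files so the failing and passing cases can be compared. It is an assumption, not part of the original report: the script, the `make_inputs` function, the `./repro_inputs` directory, and the file names and contents are all hypothetical. The default count of 16365 matches the failing run.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): create a directory
# holding COUNT tiny text files, one line each, to use as streaming input.
# Usage: sh make_inputs.sh [DIR] [COUNT]
set -eu

make_inputs() {
  dir=$1
  count=$2
  mkdir -p "$dir"
  i=1
  while [ "$i" -le "$count" ]; do
    # Each file holds a single line, so every map input stays trivially small
    # and only the number of input paths varies between runs.
    echo "line from file $i" > "$dir/part-$i.txt"
    i=$((i + 1))
  done
}

# Defaults reproduce the failing case (16365 input paths); pass 16 as the
# second argument for the case that succeeds.
make_inputs "${1:-./repro_inputs}" "${2:-16365}"
```

      The generated directory can then be copied into HDFS (for example with `hadoop fs -put ./repro_inputs in_data`) and the streaming command from this report run against it.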


            People

    • Assignee: Robert Chansler (chansler)
    • Reporter: Peeyush Bishnoi (peeyushb)
