Hadoop Map/Reduce / MAPREDUCE-3319

multifilewc from hadoop examples seems to be broken in 0.20.205.0

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.20.205.0
    • Fix Version/s: 1.0.0
    • Component/s: examples
    Description

      /usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop/hadoop-examples-0.20.205.0.22.jar multifilewc  examples/text examples-output/multifilewc
      11/10/31 16:50:26 INFO mapred.FileInputFormat: Total input paths to process : 2
      11/10/31 16:50:26 INFO mapred.JobClient: Running job: job_201110311350_0220
      11/10/31 16:50:27 INFO mapred.JobClient:  map 0% reduce 0%
      11/10/31 16:50:42 INFO mapred.JobClient: Task Id : attempt_201110311350_0220_m_000000_0, Status : FAILED
      java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.hadoop.io.LongWritable
      	at org.apache.hadoop.mapred.lib.LongSumReducer.reduce(LongSumReducer.java:44)
      	at org.apache.hadoop.mapred.Task$OldCombinerRunner.combine(Task.java:1431)
      	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1436)
      	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1298)
      	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:437)
      	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
      	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
      	at java.security.AccessController.doPrivileged(Native Method)
      	at javax.security.auth.Subject.doAs(Subject.java:396)
      	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
      	at org.apache.hadoop.mapred.Child.main(Child.java:249)
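
The stack trace points at a type mismatch: the example configures LongSumReducer (which casts each value to LongWritable) as the combiner, while the map side emits IntWritable values. As a minimal plain-Java illustration of the failure mode (no Hadoop dependencies; the class and method names here are hypothetical, not from the example):

```java
import java.util.List;

public class CastDemo {
    // Mimics LongSumReducer.reduce: sums values it assumes are Longs.
    static long sum(Iterable<?> values) {
        long total = 0;
        for (Object v : values) {
            total += (Long) v; // throws ClassCastException when v is an Integer
        }
        return total;
    }

    public static void main(String[] args) {
        // The mapper emitted Integer (think IntWritable) values...
        List<?> mapOutput = List.of(1, 1, 1);
        try {
            sum(mapOutput); // ...but the combiner expects Longs
        } catch (ClassCastException e) {
            System.out.println("ClassCastException, as in the job log");
        }
    }
}
```

The cast only fails at combine time, which is why the job submits cleanly and then dies during the map-side spill.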
      

      Attachments

        1. MAPREDUCE-3319.patch
          2 kB
          Subroto Sanyal

        Activity

          rvs Roman Shaposhnik added a comment -

          I really think we should fix this before releasing any version that has a potential of becoming Hadoop 1.0. I'll try to see if I can cook up a patch to fix it. If not, I'd say it is better to disable this example than to ship a broken one.

          Agreed?
          mattf Matthew Foley added a comment -

          Moved requested target versions from Fix Version to Target Version field.
          Absent a viable patch in time for 1.0.0, this will have to go in 1.1.0.


          rvs Roman Shaposhnik added a comment -

          Matt, what's the cut-off date for providing a patch? I'd really like to see this fixed in 1.0.

          rvs Roman Shaposhnik added a comment -

          Ok, so this appears to be fixed in branch-1.0 rev 1207581.

          Is this the branch from which 1.0.0 will be cut?
          subrotosanyal Subroto Sanyal added a comment -

          The patch makes the code compatible with LongWritable as the final output value class.
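
The attached patch itself is not reproduced in this issue, but the change it describes, making the example's value types consistent with LongWritable end to end, can be sketched in plain Java as a sum that widens any Number to long. Names here are illustrative only, not taken from the patch:

```java
import java.util.List;

public class CompatibleSum {
    // A combiner-safe sum: widens Integer or Long alike to long,
    // analogous to keeping the value class LongWritable throughout
    // the map, combine, and reduce stages.
    static long sum(Iterable<? extends Number> values) {
        long total = 0;
        for (Number v : values) {
            total += v.longValue(); // no cast, so mixed widths cannot throw
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(List.of(1, 2, 3L))); // prints 6
    }
}
```

The design point is that the fix removes the implicit narrowing assumption rather than working around it at the call site.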
          hadoopqa Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12506241/MAPREDUCE-3319.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1399//console

          This message is automatically generated.

          subrotosanyal Subroto Sanyal added a comment -

          The patch is prepared on branch:
          http://svn.apache.org/repos/asf/hadoop/common/branches/branch-1

          The patch resolves an issue in hadoop-examples, so no tests are included.
          mattf Matthew Foley added a comment -

          Subroto, thanks for providing a patch. Unfortunately you also set the "Fix Version" to 1.0.0, which implies it is already fixed in 1.0.0, which it apparently isn't, according to Roman's email of today. (Although Roman's comment of 28/Nov/11 21:49 also seems to imply it was fixed in 1.0.0. Perhaps that was user error.)

          Please do not set "Fix Version" until after the patch is committed to a given version. The intent or desire to have a fix in a given version should be expressed in "Target Version", not "Fix Version".

          +1 for code review, lgtm. Committed to branch-1 for future release 1.1.0.
          Thanks Subroto!

          mattf Matthew Foley added a comment -

          It appears this change is also applicable to trunk, file
          hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/MultiFileWordCount.java

          Subroto, please port your patch to trunk and post another patch.
          Then I can resolve this jira. Thank you.

          subrotosanyal Subroto Sanyal added a comment -

          Hi Matt,

          I ran the multifilewc in trunk (mrv2) and it runs fine.
          Output of the run:

          2011-12-12 10:35:16,569 INFO  mapreduce.Job (Job.java:monitorAndPrintJob(1227))
          -  map 100% reduce 100%
          2011-12-12 10:35:16,574 INFO  mapreduce.Job (Job.java:monitorAndPrintJob(1238))
          - Job job_1322715021135_0002 completed successfully
          2011-12-12 10:35:16,743 INFO  mapreduce.Job (Job.java:monitorAndPrintJob(1245))
          - Counters: 43
                  File System Counters
                          FILE: BYTES_READ=97054
                          FILE: BYTES_WRITTEN=190175
                          FILE: READ_OPS=0
                          FILE: LARGE_READ_OPS=0
                          FILE: WRITE_OPS=0
                          HDFS: BYTES_READ=1165096553
                          HDFS: BYTES_WRITTEN=4394
                          HDFS: READ_OPS=32
                          HDFS: LARGE_READ_OPS=0
                          HDFS: WRITE_OPS=4
                  org.apache.hadoop.mapreduce.JobCounter
                          TOTAL_LAUNCHED_MAPS=1
                          TOTAL_LAUNCHED_REDUCES=1
                          OTHER_LOCAL_MAPS=1
                          SLOTS_MILLIS_MAPS=67860
                          SLOTS_MILLIS_REDUCES=65602
                  org.apache.hadoop.mapreduce.TaskCounter
                          MAP_INPUT_RECORDS=265824
                          MAP_OUTPUT_RECORDS=531648
                          MAP_OUTPUT_BYTES=1166967360
                          MAP_OUTPUT_MATERIALIZED_BYTES=4402
                          SPLIT_RAW_BYTES=989
                          COMBINE_INPUT_RECORDS=531678
                          COMBINE_OUTPUT_RECORDS=32
                          REDUCE_INPUT_GROUPS=2
                          REDUCE_SHUFFLE_BYTES=4402
                          REDUCE_INPUT_RECORDS=2
                          REDUCE_OUTPUT_RECORDS=2
                          SPILLED_RECORDS=46
                          SHUFFLED_MAPS=1
                          FAILED_SHUFFLE=0
                          MERGED_MAP_OUTPUTS=1
                          GC_TIME_MILLIS=2566
                          CPU_MILLISECONDS=69150
                          PHYSICAL_MEMORY_BYTES=200130560
                          VIRTUAL_MEMORY_BYTES=756015104
                          COMMITTED_HEAP_BYTES=137433088
                  Shuffle Errors
                          BAD_ID=0
                          CONNECTION=0
                          IO_ERROR=0
                          WRONG_LENGTH=0
                          WRONG_MAP=0
                          WRONG_REDUCE=0
                  org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
                          BYTES_READ=0
                  org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter
                          BYTES_WRITTEN=4394
          

          I think the problem is resolved in trunk already.

          mattf Matthew Foley added a comment -

          Okay, Subroto, thanks.
          Release 1.0.0 RC had to be re-spun, so I merged this fix as promised.
          subrotosanyal Subroto Sanyal added a comment -

          Thanks Matt...

          mattf Matthew Foley added a comment -

          Closed upon release of version 1.0.0.


          People

            Assignee: subrotosanyal Subroto Sanyal
            Reporter: rvs Roman Shaposhnik
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved: