Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.15.3
    • Component/s: documentation
    • Labels:
      None

      Description

      Sam Pullara sends me:

      Phu was going through the WordCount example... lines 52 and 53 should have args[0] and args[1]:
      
      http://lucene.apache.org/hadoop/docs/current/mapred_tutorial.html
      
      The javac and jar commands are also wrong; they don't include the directories for the packages. They should be:
      
      $ javac -classpath ${HADOOP_HOME}/hadoop-${HADOOP_VERSION}-core.jar -d classes WordCount.java 
      $ jar -cvf /usr/joe/wordcount.jar WordCount.class -C classes .
      
      
      1. HADOOP-2574_0_20080110.patch
        23 kB
        Arun C Murthy
      2. HADOOP-2574_1_20080114.patch
        28 kB
        Arun C Murthy
      3. mapred_tutorial.html
        111 kB
        Arun C Murthy
      4. mapred_tutorial.html
        111 kB
        Arun C Murthy

        Activity

        Hudson added a comment - Integrated in Hadoop-Nightly #366 (See http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/366/ )
        Phu Hoang added a comment -

        That is perfect. Sorry I didn't catch that.

        Phu

        On Jan 14, 2008, at 8:45 PM, "Arun C Murthy (JIRA)"

        Arun C Murthy added a comment -

        I've clarified in the tutorial that WordCount v1 works with local, pseudo-distributed and fully-distributed modes while v2 needs HDFS to be up and running (pseudo-distributed or fully-distributed) - primarily due to the usage of the DistributedCache. Works?

        Phu Hoang added a comment -

        Arun,

        the tutorial addresses all the bugs that I encountered. Great job!!

        I was not able to tell whether you resolved the smaller confusion (not a
        show stopper) of whether these examples work in stand-alone mode or not.

        I was able to run WordCount v1.0 using local input and output
        directories (not HDFS). I was NOT able to run WordCount v2.0 using
        local input and output directories, only HDFS input and output
        directories work there. It would be good to set the reader's
        expectation.

        I could see people trying the examples out without doing HDFS first,
        and may run into issues with WordCount v2.0.

        Phu

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12373082/HADOOP-2574_1_20080114.patch
        against trunk revision r611760.

        @author +1. The patch does not contain any @author tags.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac +1. The applied patch does not generate any new compiler warnings.

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        core tests +1. The patch passed core unit tests.

        contrib tests -1. The patch failed contrib unit tests.

        Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1584/testReport/
        Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1584/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1584/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1584/console

        This message is automatically generated.

        Arun C Murthy added a comment -

        I just committed this.

        Arun C Murthy added a comment - Phu - does this patch ( http://issues.apache.org/jira/secure/attachment/12373083/mapred_tutorial.html ) address your concerns?
        Amar Kamat added a comment -

        +1

        Arun C Murthy added a comment -

        Updated to incorporate Phu's original ask and Amar's feedback... again, I've attached the generated mapred_tutorial.html for folks to review it without having to figure out Forrest.

        Amar Kamat added a comment -

        +1. I tried running the wordcount-v2 example line by line and now it works fine. Just a small suggestion: in the last example of wordcount v2 (case sensitive = true/false), could you make the distinction evident by having Hello and hello in the input file?

        Arun C Murthy added a comment -

        Uh, I missed:

        The quickstart tutorial does not make it clear which examples work under which scenarios (Stand alone, Pseudo-Distributed, or Fully-Distributed).

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12373013/HADOOP-2574_0_20080110.patch
        against trunk revision r611385.

        @author +1. The patch does not contain any @author tags.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac +1. The applied patch does not generate any new compiler warnings.

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        core tests +1. The patch passed core unit tests.

        contrib tests +1. The patch passed contrib unit tests.

        Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1558/testReport/
        Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1558/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1558/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1558/console

        This message is automatically generated.

        Arun C Murthy added a comment -

        Here is how the tutorial looks with this patch...

        Arun C Murthy added a comment -

        Here is patch which addresses most of Phu's concerns...

        Nigel Daley added a comment -

        Is this a blocker for 0.15.3? I'd say no. Devaraj or Arun, can you look at these issues?

        Phu Hoang added a comment -

        On WordCount v1.0 above, there is also another bug:
        line 3 should be import java.io.*; instead of import java.io.Exception;

        On WordCount v2.0, where local cache files are used, there are also bugs:

        1. line 108 and 109 should be:
        conf.setInputPath(new Path(other_args.get(0)));
        conf.setOutputPath(new Path(other_args.get(1)));
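
A plain-Java sketch of the option handling implied above (no Hadoop on the classpath; the class and method names here are made up for illustration): -skip style options are consumed first, and everything left over lands in other_args, which is why the input/output paths must come from other_args.get(0)/get(1) rather than args[0]/args[1].

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the argument loop in WordCount v2.0's run():
// options are peeled off, positional arguments collect in "other".
public class ArgParseSketch {
    static List<String> otherArgs(String[] args) {
        List<String> other = new ArrayList<>();
        List<String> skipFiles = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if ("-skip".equals(args[i])) {
                skipFiles.add(args[++i]);   // pattern file to skip
            } else {
                other.add(args[i]);         // positional: input, output
            }
        }
        return other;
    }

    public static void main(String[] a) {
        List<String> other =
            otherArgs(new String[] {"-skip", "patterns.txt", "in", "out"});
        // other.get(0) is the input dir, other.get(1) the output dir
        System.out.println(other.get(0) + " " + other.get(1));
    }
}
```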

        2. If I run the program without using the -skip argument, as in:
        ~/Hadoop/bin/hadoop jar ~phu/Hadoop/Examples/WordCount2/wordcount.jar org.myorg.WordCount -Dwordcount.case.sensitive=false WordCount2/input WordCount2/output, I get the following error message:

        java.lang.NullPointerException
        at org.apache.hadoop.util.StringUtils.stringToPath(StringUtils.java:197)
        at org.apache.hadoop.filecache.DistributedCache.getLocalCacheFiles(DistributedCache.java:470)
        at org.myorg.WordCount$MapClass.configure(WordCount.java:33)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
        at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:36)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:188)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)

        Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:831)
        at org.myorg.WordCount.run(WordCount.java:110)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.myorg.WordCount.main(WordCount.java:115)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:155)

        Looking at DistributedCache.java, line 470, we see:
        return StringUtils.stringToPath(conf.getStrings("mapred.cache.localFiles"));

        conf.getStrings is returning NULL. So somehow we have to initialize this so that it does not throw an exception when the -skip argument is not used.
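
A minimal sketch of the kind of guard that would avoid the NPE. Plain Java, with a Map standing in for JobConf (Hadoop is not assumed on the classpath), so the names here are illustrative only:

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the DistributedCache.getLocalCacheFiles behavior: a missing
// "mapred.cache.localFiles" entry yields an empty array instead of null,
// so a caller like MapClass.configure() can iterate safely.
public class NullGuardSketch {
    static String[] localCacheFiles(Map<String, String[]> conf) {
        String[] files = conf.get("mapred.cache.localFiles");
        return files != null ? files : new String[0]; // guard against null
    }

    public static void main(String[] args) {
        Map<String, String[]> conf = new HashMap<>(); // job run without -skip
        System.out.println(localCacheFiles(conf).length);
    }
}
```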

        When I do put in the -skip patterns.txt file, the program works.

        Lastly,
        WordCount v1.0 works even if I do not use DFS, and just access local input and output files. WordCount v2.0 does not work if I do not use DFS. The quickstart tutorial does not make it clear which examples work under which scenarios (Stand-alone, Pseudo-Distributed, or Fully-Distributed). One could be misled into thinking that all examples work under all scenarios.

        Phu


          People

          • Assignee:
            Arun C Murthy
            Reporter:
            Doug Cutting
          • Votes:
            0
            Watchers:
            1

            Dates

            • Created:
              Updated:
              Resolved:
