  1. Mahout
  2. MAHOUT-476

Bug when running org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorDriver on Hadoop


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 0.3
    • Fix Version/s: 0.3
    • Component/s: None
    • Labels: None
    • Environment: hadoop 0.20.2
      mahout-0.3
      ubuntu

    Description

      I am following the wiki instructions at https://cwiki.apache.org/MAHOUT/wikipedia-bayes-example.html
      (by the way, the Bayes example document in the wiki needs to be updated for 0.3)
      and the problem occurs at step 5:
      Create the countries based Split of wikipedia dataset.

      I use the following command:
      $HADOOP_HOME/bin/hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-0.3.job org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorDriver -i $MAHOUT_HOME/examples/work/wikipedia/chunks -o $MAHOUT_HOME/examples/work/wikipediainput -c $MAHOUT_HOME/examples/src/test/resources/country.txt

      The job fails on Hadoop.
      The Hadoop log shows:
      Error: org.apache.lucene.wikipedia.analysis.WikipediaTokenizer.addAttribute(Ljava/lang/Class;)Lorg/apache/lucene/util/Attribute
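
The `(Ljava/lang/Class;)Lorg/apache/lucene/util/Attribute` part of that message is a JVM method descriptor, which typically appears in linkage errors such as `NoSuchMethodError`. The issue does not state the cause, so the following is only a diagnostic sketch under the assumption that an unexpected version of a class is being picked up on the job's classpath. It prints where a class was loaded from and which public methods of a given name it actually has. `ClassOrigin`, `locationOf`, and `methodsNamed` are hypothetical names introduced for illustration; they are not part of Mahout, Lucene, or Hadoop.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Mahout, Lucene, or Hadoop.
public class ClassOrigin {

    /** Returns the jar or directory a class was loaded from, or "(bootstrap)" for JDK classes. */
    static String locationOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        return (source == null || source.getLocation() == null)
                ? "(bootstrap)"
                : source.getLocation().toString();
    }

    /** Returns the signatures of all public methods with the given name. */
    static List<String> methodsNamed(Class<?> cls, String name) {
        List<String> found = new ArrayList<>();
        for (Method m : cls.getMethods()) {
            if (m.getName().equals(name)) {
                found.add(m.toString());
            }
        }
        return found;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Defaults are the class and method named in the log line above.
        String className = args.length > 0 ? args[0]
                : "org.apache.lucene.wikipedia.analysis.WikipediaTokenizer";
        String methodName = args.length > 1 ? args[1] : "addAttribute";

        Class<?> cls = Class.forName(className);
        System.out.println("loaded from: " + locationOf(cls));
        for (String signature : methodsNamed(cls, methodName)) {
            System.out.println("  " + signature);
        }
    }
}
```

The output is only meaningful when the program runs with the same classpath as the failing job. If the reported location is not the jar you expected, or no listed signature matches the descriptor in the log, a version mismatch becomes a plausible explanation worth checking.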

          People

            Assignee: Unassigned
            Reporter: leon lee (leonlee)
            Votes: 0
            Watchers: 0
