Mahout / MAHOUT-932

RandomForest quits with ArrayIndexOutOfBoundsException while running sample


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 0.6
    • Fix Version/s: 0.10.0
    • Component/s: Classification
    • Environment:

      Mac OS X, current Mac OS shipped Java version, latest checkout from 17.12.2011

      Dual Core MacBook Pro 2009, 8 Gb, SSD

      Description

      Hello,

      when running the example at https://cwiki.apache.org/MAHOUT/partial-implementation.html with the recommended data sets, several issues occur.
      First: ARFF files no longer seem to be supported. I used the UCI format as recommended at https://cwiki.apache.org/MAHOUT/breiman-example.html. With ARFF files, Mahout quits while creating the description file (wrong number of attributes in the string); with the UCI format it works.

      The main error happens during the BuildForest step (I could not test TestForest, since no forest was built).
      Running:
      $MAHOUT_HOME/bin/mahout org.apache.mahout.classifier.df.mapreduce.BuildForest -Dmapred.max.split.size=1874231 -d convertedData/data.data -ds KDDTrain+.info -sl 5 -p -t 100 -o nsl-forest

      I tested different split.size values: 1874231, 187423, and 18742 all give the following error; 1874 does not finish on my machine.
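      For context on why the split size matters: with Hadoop's FileInputFormat, the number of map tasks grows as the split size shrinks, roughly ceil(inputBytes / maxSplitSize). A minimal sketch of that relation (plain Java, not Mahout code; the input size is taken from the "Bytes Read" counter in the log below — note the log shows 1,000 map tasks, so other factors evidently influenced the actual split count, and this sketch only illustrates the general ceiling relation):

      ```java
      public class SplitCount {
          // Rough estimate of map tasks for a given max split size,
          // mirroring FileInputFormat's one-split-per-chunk behaviour.
          static long estimateMaps(long inputBytes, long maxSplitSize) {
              return (inputBytes + maxSplitSize - 1) / maxSplitSize; // ceiling division
          }

          public static void main(String[] args) {
              long inputBytes = 20_478_569L; // "Bytes Read" from the job counters
              for (long split : new long[] {1_874_231L, 187_423L, 18_742L, 1_874L}) {
                  System.out.println(split + " -> ~" + estimateMaps(inputBytes, split) + " maps");
              }
          }
      }
      ```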

      It quits after a while (map is almost done) with the following message:
      11/12/17 16:23:24 INFO mapred.Task: Task 'attempt_local_0001_m_000998_0' done.
      11/12/17 16:23:24 INFO mapred.Task: Task:attempt_local_0001_m_000999_0 is done. And is in the process of commiting
      11/12/17 16:23:24 INFO mapred.LocalJobRunner:
      11/12/17 16:23:24 INFO mapred.Task: Task attempt_local_0001_m_000999_0 is allowed to commit now
      11/12/17 16:23:24 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_m_000999_0' to file:/Users/martin/Documents/Studium/Master/LargeScaleProcessing/Repository/mahout_algorithms_evaluation/testingRandomForests/nsl-forest
      11/12/17 16:23:27 INFO mapred.LocalJobRunner:
      11/12/17 16:23:27 INFO mapred.Task: Task 'attempt_local_0001_m_000999_0' done.
      11/12/17 16:23:28 INFO mapred.JobClient: map 100% reduce 0%
      11/12/17 16:23:28 INFO mapred.JobClient: Job complete: job_local_0001
      11/12/17 16:23:28 INFO mapred.JobClient: Counters: 8
      11/12/17 16:23:28 INFO mapred.JobClient: File Output Format Counters
      11/12/17 16:23:28 INFO mapred.JobClient: Bytes Written=41869032
      11/12/17 16:23:28 INFO mapred.JobClient: FileSystemCounters
      11/12/17 16:23:28 INFO mapred.JobClient: FILE_BYTES_READ=37443033225
      11/12/17 16:23:28 INFO mapred.JobClient: FILE_BYTES_WRITTEN=44946910704
      11/12/17 16:23:28 INFO mapred.JobClient: File Input Format Counters
      11/12/17 16:23:28 INFO mapred.JobClient: Bytes Read=20478569
      11/12/17 16:23:28 INFO mapred.JobClient: Map-Reduce Framework
      11/12/17 16:23:28 INFO mapred.JobClient: Map input records=125973
      11/12/17 16:23:28 INFO mapred.JobClient: Spilled Records=0
      11/12/17 16:23:28 INFO mapred.JobClient: Map output records=100000
      11/12/17 16:23:28 INFO mapred.JobClient: SPLIT_RAW_BYTES=215000
      Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 100
      at org.apache.mahout.classifier.df.mapreduce.partial.PartialBuilder.processOutput(PartialBuilder.java:126)
      at org.apache.mahout.classifier.df.mapreduce.partial.PartialBuilder.parseOutput(PartialBuilder.java:89)
      at org.apache.mahout.classifier.df.mapreduce.Builder.build(Builder.java:303)
      at org.apache.mahout.classifier.df.mapreduce.BuildForest.buildForest(BuildForest.java:201)
      at org.apache.mahout.classifier.df.mapreduce.BuildForest.run(BuildForest.java:163)
      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
      at org.apache.mahout.classifier.df.mapreduce.BuildForest.main(BuildForest.java:225)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
      at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
      at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

      PS: Compared to the wiki instructions, I adjusted the class name to the .classifier.df. package and removed -oop.
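      A plausible (unconfirmed) reading of the trace: processOutput copies each map task's trees into an array sized by the -t argument, and with far more map tasks than requested trees the running index overruns that array, which would explain an ArrayIndexOutOfBoundsException at index 100 for -t 100. A self-contained sketch of that failure pattern (the names and logic are hypothetical, not Mahout's actual code):

      ```java
      public class ProcessOutputSketch {
          // Hypothetical mirror of the indexing in PartialBuilder.processOutput:
          // copy each mapper's trees into a fixed array of nbTrees slots.
          static int fillTrees(int nbTrees, int[] treesPerMapper) {
              Object[] trees = new Object[nbTrees];
              int index = 0;
              for (int perMapper : treesPerMapper) {
                  for (int t = 0; t < perMapper; t++) {
                      trees[index++] = new Object(); // overruns once index reaches nbTrees
                  }
              }
              return index;
          }

          public static void main(String[] args) {
              int[] oneTreeEach = new int[1000];   // 1,000 map tasks, as in the log above
              java.util.Arrays.fill(oneTreeEach, 1);
              try {
                  fillTrees(100, oneTreeEach);     // -t 100
              } catch (ArrayIndexOutOfBoundsException e) {
                  System.out.println("caught ArrayIndexOutOfBoundsException");
              }
          }
      }
      ```

      If this reading is right, a workaround would be keeping the number of map tasks at or below the number of trees (a larger split size or more trees).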

              People

              • Assignee:
                Unassigned
                Reporter:
                berttenfall Berttenfall M.
              • Votes: 0
                Watchers: 4
