Details
- Type: Bug
- Status: Closed
- Priority: Minor
- Resolution: Fixed
- Version: 0.12.0
Description
Running cluster-syntheticcontrol.sh on 0.12.0 resulted in this error:
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://apex156:54310/user/achu/testdata
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
    at org.apache.mahout.clustering.conversion.InputDriver.runJob(InputDriver.java:108)
    at org.apache.mahout.clustering.syntheticcontrol.fuzzykmeans.Job.run(Job.java:133)
    at org.apache.mahout.clustering.syntheticcontrol.fuzzykmeans.Job.main(Job.java:62)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
    at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
It appears cluster-syntheticcontrol.sh breaks under 0.12.0 due to the patch in commit 23267a0bef064f3351fd879274724bcb02333c4a.
One change in question:
    - $DFS -mkdir testdata
    + $DFS -mkdir ${WORK_DIR}/testdata
now requires that the -p option be specified to -mkdir. This fix is simple.
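As a rough illustration of why -p matters here, the sketch below uses the local mkdir as a stand-in for "$DFS -mkdir" so it can run without a cluster. This is an assumption made for demonstration only: on Hadoop 2.x, -mkdir without -p likewise fails when the parent directory is missing. The WORK_DIR value is hypothetical.

```shell
#!/bin/sh
# Local mkdir stands in for "$DFS -mkdir" (assumption, for illustration only).
BASE=$(mktemp -d)
WORK_DIR="${BASE}/mahout-work"   # hypothetical path; it does not exist yet

# Without -p: fails because the parent ${WORK_DIR} is missing.
if mkdir "${WORK_DIR}/testdata" 2>/dev/null; then
  plain_result=created
else
  plain_result=failed
fi

# With -p: creates ${WORK_DIR} and testdata in one call.
mkdir -p "${WORK_DIR}/testdata"

echo "without -p: ${plain_result}"
if [ -d "${WORK_DIR}/testdata" ]; then
  echo "with -p: created"
fi
```

In the real script the corresponding line would presumably read $DFS -mkdir -p ${WORK_DIR}/testdata.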
Another change:
    - $DFS -put ${WORK_DIR}/synthetic_control.data testdata
    + $DFS -put ${WORK_DIR}/synthetic_control.data ${WORK_DIR}/testdata
appears to break the example because in:
    examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/fuzzykmeans/Job.java
    examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/kmeans/Job.java
the input path is hard coded into the example as just 'testdata'. ${WORK_DIR}/testdata needs to be passed in as an option.
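The mismatch can be sketched as follows. HDFS resolves a path with no leading slash against the user's home directory, /user/<name>, which is why the exception above mentions /user/achu/testdata. The helper resolve_hdfs_path is hypothetical and only mimics that rule locally, and the WORK_DIR value is an assumed example, not taken from the script.

```shell
#!/bin/sh
# Hypothetical helper: mimics how HDFS resolves a path without a leading
# slash against the user's home directory, /user/<name>.
resolve_hdfs_path() {
  case "$1" in
    /*) printf '%s\n' "$1" ;;               # absolute path: used as-is
    *)  printf '/user/%s/%s\n' "$2" "$1" ;; # relative path: under /user/<name>
  esac
}

WORK_DIR=/tmp/mahout-work-achu   # assumed value, for illustration only

# Where the patched script uploads the data.
uploaded=$(resolve_hdfs_path "${WORK_DIR}/testdata" achu)
# Where the hard-coded 'testdata' in Job.java ends up pointing.
expected=$(resolve_hdfs_path "testdata" achu)

echo "script uploads to: ${uploaded}"
echo "Job.java reads:    ${expected}"
```

The two locations differ, so the job does not find the data the script uploaded.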
Reverting the lines listed above fixes the problem. However, reverting them presumably reintroduces the original problem reported in MAHOUT-1773.
I originally attempted to fix this by simply passing the option "--input ${WORK_DIR}/testdata" to the command in the script. However, once any option is specified, a number of other options become required.
I considered modifying the above Job.java files to take a minimal number of arguments and set the rest to defaults, but that would also have required changes to DefaultOptionCreator.java to make currently required options optional. I didn't want to go down the path of determining which options the other examples do and do not require.
So I just passed every required option in cluster-syntheticcontrol.sh to fix this, using whatever defaults were hard coded into the Job.java files above.
I'm sure there's a better way to do this, and I'm happy to supply a patch, but thought I'd start with this.
GitHub pull request to be sent shortly.