Details

    • Improvement
    • Status: Closed
    • Minor
    • Resolution: Fixed
    • 0.4
    • 0.4
    • None

    Description

      Can a "JobName" parameter be added to org.apache.mahout.cf.taste.hadoop.item.RecommenderJob?
      When many RecommenderJobs are running, it is hard to tell them apart.

      Also, RecommenderJob has four sub-jobs (or phases); can a sub-job name be added to each phase?

      Because RecommenderJob does not call setNumReduceTasks, performance in the reduce phase seems poor.

      Attachments

        Activity

          akm Andrew Musselman made changes -
          Component/s Collaborative Filtering [ 12312503 ]
          srowen Sean R. Owen made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          srowen Sean R. Owen made changes -
          Fix Version/s 0.4 [ 12314396 ]
          Assignee Sean Owen [ srowen ]
          Resolution Fixed [ 1 ]
          Status Open [ 1 ] Resolved [ 5 ]
          srowen Sean R. Owen added a comment -

          OK, I think all the stuff in this issue has been resolved.

          srowen Sean R. Owen added a comment -

          Good one Hui, I'll make note of that in examples.

          huiwenhan Han Hui Wen added a comment -

          It works now.

          I ran the following command:

          hadoop org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -Dmapred.job.name=RECOMMENDATION_tap_tag -Dmapred.reduce.tasks=2 -Dmapred.input.dir=/steer/item/in -Dmapred.output.dir=/steer/item/out -Dmapred.output.compress=false --tempDir /steer/item/temp --numRecommendations 10

          All -D options must be placed before the main arguments.

          Mixing them with the main arguments, as in the following command, causes a parsing error:

          hadoop org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -Dmapred.job.name=RECOMMENDATION_tap_tag -Dmapred.reduce.tasks=2 -Dmapred.input.dir=/steer/item/in --tempDir /steer/item/temp -Dmapred.output.dir=/steer/item/out --numRecommendations 10 -Dmapred.output.compress=false
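The ordering requirement reported above can be illustrated with a minimal sketch. It assumes that generic-option handling consumes -Dkey=value tokens only until it meets the first argument it does not recognise, and hands everything after that point to the job's own parser. `GenericArgsSketch` is a hypothetical stand-in written for this illustration; it is not Hadoop's GenericOptionsParser and handles only the joined `-Dkey=value` form.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for generic-option handling.
// It is NOT Hadoop's GenericOptionsParser; it only models "stop at the first non -D token".
final class GenericArgsSketch {
    // Properties collected from leading -Dkey=value tokens.
    final Map<String, String> properties = new LinkedHashMap<>();
    // Everything from the first non -D token onward, passed on untouched.
    final List<String> remaining = new ArrayList<>();

    static GenericArgsSketch parse(String[] args) {
        GenericArgsSketch result = new GenericArgsSketch();
        int i = 0;
        // Consume only the leading run of -Dkey=value tokens (key must be non-empty).
        while (i < args.length && args[i].startsWith("-D") && args[i].indexOf('=') > 2) {
            int eq = args[i].indexOf('=');
            result.properties.put(args[i].substring(2, eq), args[i].substring(eq + 1));
            i++;
        }
        // Later -D tokens are NOT interpreted; they end up in the job's own argument list.
        for (; i < args.length; i++) {
            result.remaining.add(args[i]);
        }
        return result;
    }

    public static void main(String[] argv) {
        GenericArgsSketch leading = parse(new String[] {
            "-Dmapred.job.name=RECOMMENDATION_tap_tag", "-Dmapred.reduce.tasks=2",
            "--tempDir", "/steer/item/temp", "--numRecommendations", "10"});
        System.out.println(leading.properties + " | " + leading.remaining);

        GenericArgsSketch mixed = parse(new String[] {
            "-Dmapred.reduce.tasks=2", "--tempDir", "/steer/item/temp",
            "-Dmapred.output.compress=false"});
        System.out.println(mixed.properties + " | " + mixed.remaining);
    }
}
```

In the second call the trailing -Dmapred.output.compress=false is left in the remaining arguments, so a job-level parser that does not know that option would reject it. Under the stated assumption, this is consistent with the parsing error described in the comment.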

          srowen Sean R. Owen added a comment -

          I might well have broken your classes in the recent change. Or maybe the arg parsing was borked in some way from the start. Take a look and/or point me at issues you see next time you dig in.

          jake.mannix Jake Mannix added a comment -

          I suppose I hadn't wanted to be presumptuous and do that without some support, but sounds like this has some consensus.

          Yep, I've been subclassing AbstractJob all over in the decomposer / DistributedRowMatrix stuff already.

          I've had some odd issues with pseudo-distributed vs. really distributed usage and command line options parsing with optional options with it though. I'll try to post the exact problem, but I think I'm not properly using the mixture of commons.cli2 together with the GenericOptionsParser when I subclass AbstractJob though...

          srowen Sean R. Owen added a comment -

          Yes it should move to common. I suppose I hadn't wanted to be presumptuous and do that without some support, but sounds like this has some consensus.

          I also committed a naming change. If a custom name is set (mapred.job.name), it will add to that "-MapperClass-ReducerClass". The custom name uniquely identifies the job, really, and then the suffix identifies what part of that job is happening.
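The naming scheme described here can be sketched as follows. This is an illustration only, not the committed AbstractJob code: `SubJobNameSketch` and `subJobName` are hypothetical names, the nested mapper and reducer classes are empty stand-ins (`ItemIDIndexReducer` in particular is assumed for the example), and the fallback used when no custom name is set is an assumption, since the comment only covers the case where mapred.job.name is given.

```java
// Hypothetical helper illustrating the "-MapperClass-ReducerClass" suffix.
// Not the actual Mahout implementation.
final class SubJobNameSketch {

    // Empty stand-ins so the example compiles without Hadoop or Mahout on the classpath.
    static final class ItemIDIndexMapper {}
    static final class ItemIDIndexReducer {}

    // Builds a per-phase name from the user-supplied job name and the phase's classes.
    static String subJobName(String customName, Class<?> mapper, Class<?> reducer) {
        String suffix = mapper.getSimpleName() + "-" + reducer.getSimpleName();
        if (customName == null || customName.isEmpty()) {
            // Assumption: with no custom name, use the suffix alone.
            return suffix;
        }
        return customName + "-" + suffix;
    }

    public static void main(String[] args) {
        // Prints: RECOMMENDATION_tap_tag-ItemIDIndexMapper-ItemIDIndexReducer
        System.out.println(subJobName("RECOMMENDATION_tap_tag",
                ItemIDIndexMapper.class, ItemIDIndexReducer.class));
    }
}
```

With a scheme like this, the user-supplied prefix distinguishes one RecommenderJob run from another, while the suffix shows which phase is currently executing.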

          drew.farris Drew Farris added a comment -

          (Incidentally, now might be a good time to plug 'AbstractJob'. I think Robin was also keen to rally around refactoring Hadoop jobs to use this class, so we can capture all this good knowledge and practice in one place, and approach job handling consistently. It will no doubt need modification to really serve all needs.)

          I was thinking about this last night when looking at the code – taking the CollocDriver for example and adapting it to use AbstractJob. I was wondering should I depend on it as-is, or should we consider moving it to org.apache.mahout.common?

          OK, try my latest commit. --input and --output are now gone in favor of the standard -Dmapred.input.dir and -Dmapred.output.dir

          Ahh, that's a great idea Sean. The less command-line parsing code anyone has to write the better.

          What do you think can be done about job naming for sub-jobs?

          srowen Sean R. Owen added a comment -

          OK, try my latest commit. --input and --output are now gone in favor of the standard -Dmapred.input.dir and -Dmapred.output.dir. I made changes mentioned in other issues too.

          It's a little aggressive to commit this but it would solve several issues, is a step forward, and Hui is in a position to really test this.

          srowen Sean R. Owen added a comment -

          Thanks Drew, I see the parsing now that you prompted me to look a second time. It sounds like the -D options work. So, we don't need to do things like parse and set the number of reducers with our own argument.

          I like your version of setJarByClass() and will commit that. That also gets rid of the --jarFile option I had put in earlier.

          So, perhaps I should also get rid of custom --input and --output arguments?

          (Incidentally, now might be a good time to plug 'AbstractJob'. I think Robin was also keen to rally around refactoring Hadoop jobs to use this class, so we can capture all this good knowledge and practice in one place, and approach job handling consistently. It will no doubt need modification to really serve all needs.)

          Hui I also like your proposal to take a name suffix argument to prepareJob(). I will work on that and commit soon.

          huiwenhan Han Hui Wen added a comment -

          The parameters -Dmapred.job.name=HADOOP_REC_tap_tag -Dmapred.reduce.tasks=200 work now.

          The problem is:
          AbstractJob has four sub-jobs, and they all display the same name.

          Proposal: take the current value of mapred.job.name and append the sub-job's mapper and reducer class names to form the sub-job's name.

          drew.farris Drew Farris added a comment -

          Hmm, I don't understand that. AbstractJob.class and ItemIDIndexMapper.class are in the same .jar file. There is only one .jar, with both, so both ought to end up selecting the same .jar file. I'd rather specify the Mapper class instead just because maybe someday someone subclasses AbstractJob, puts the implementation in a different jar, and then this line won't work.

          How about jobConf.setJarByClass(getClass()) in AbstractJob's prepareJobConf? This will always set the jar based on the AbstractJob implementation being executed.
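The difference between passing a fixed class literal and passing getClass() comes down to plain Java semantics, which the following sketch shows without any Hadoop dependency. `BaseJobSketch` and `CustomJobSketch` are hypothetical stand-ins for AbstractJob and a subclass of it; the methods only return the class that would be handed to setJarByClass.

```java
// Hypothetical stand-in for AbstractJob; it only shows which class each approach selects.
abstract class BaseJobSketch {

    // getClass() is resolved at runtime, so it yields the concrete subclass being executed.
    Class<?> jarClassFromGetClass() {
        return getClass();
    }

    // A class literal is fixed at compile time and always names the base class.
    Class<?> jarClassFromLiteral() {
        return BaseJobSketch.class;
    }
}

// Hypothetical subclass, standing in for a job that might be packaged in a different jar.
final class CustomJobSketch extends BaseJobSketch {

    public static void main(String[] args) {
        BaseJobSketch job = new CustomJobSketch();
        System.out.println(job.jarClassFromGetClass().getSimpleName()); // CustomJobSketch
        System.out.println(job.jarClassFromLiteral().getSimpleName());  // BaseJobSketch
    }
}
```

If a subclass were ever packaged separately from the base class, the runtime class would point at the subclass's jar, which is the case the suggestion above is meant to cover.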

          I was looking to see how Hadoop accepts arguments like "-Dmapred...." on the command line. I can't find it parsing these anywhere. So I don't know this exists.

          ToolRunner.run(..) runs the args through GenericOptionsParser, which adds the results to the conf, which then gets set back on the object being run that implements the Configured interface. These are pulled in by AbstractJob when it creates the jobConf using the getConf() argument.

          AbstractJob has four sub-jobs, and they all display the same name. It is hard to know which phase a job is running in.

          In this vein, it would be handy if AbstractJob's prepareJobConf method could take a string argument for the subjob name and allow the name to be specified by the class calling it – requiring that the name only be specified on the command-line forces all jobs run under the umbrella of the command to have the same name. Maybe take any current value of mapred.job.name (if specified) and append the string to it?

          drew.farris Drew Farris added a comment -

          Not sure if this is helpful Sean, but GenericOptionsParser should be picking up the -Dproperty=value arguments; IIRC this is handled somewhere in ToolRunner. The run method's args would be any args left over after GenericOptionsParser has done its thing. As long as the JobConf is created using the right configuration object, the job should be picking them up. It looks like AbstractJob should be doing this.

          GenericOptionsParser handles a bunch of other handy arguments, the javadoc covers it. It should work without having to do a bunch of argument parsing. If you'd like I can take a closer look this evening.

          srowen Sean R. Owen added a comment -

          Hmm, I don't understand that. AbstractJob.class and ItemIDIndexMapper.class are in the same .jar file. There is only one .jar, with both, so both ought to end up selecting the same .jar file. I'd rather specify the Mapper class instead just because maybe someday someone subclasses AbstractJob, puts the implementation in a different jar, and then this line won't work.

          I can set it to AbstractJob.class for now... but still think there's something wrong here.

          About the other arguments, yes, I agree there should be an option.
          I was looking to see how Hadoop accepts arguments like "-Dmapred...." on the command line. I can't find it parsing these anywhere. So I don't know this exists.

          So I can now start adding custom parameters for things like this.

          srowen Sean R. Owen added a comment -

          I'll have to look at the Hadoop source code. I thought it would set the job .jar from the classpath, given the new way I set up the JobConf like in their sample code.

          huiwenhan Han Hui Wen added a comment -

          It works now.
          It needs jobConf.setJarByClass(AbstractJob.class); in AbstractJob.java.

          I run the command like this:
          hadoop jar mahout-core-0.4-SNAPSHOT.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -Dmapred.job.name=HADOOP_REC_tap_tag -Dmapred.reduce.tasks=200 --input /steer/item/in --tempDir /steer/item/temp --output /steer/item/out --numRecommendations 10

          But I still have a question:

          AbstractJob has four sub-jobs, and they all display the same name.
          It is hard to know which phase a job is running in.

          huiwenhan Han Hui Wen added a comment - - edited

          I get the following error:

          10/03/31 10:34:12 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
          10/03/31 10:34:12 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
          10/03/31 10:34:12 INFO mapred.FileInputFormat: Total input paths to process : 1
          10/03/31 10:34:13 INFO mapred.JobClient: Running job: job_201003221228_0349
          10/03/31 10:34:14 INFO mapred.JobClient: map 0% reduce 0%
          10/03/31 10:34:25 INFO mapred.JobClient: Task Id : attempt_201003221228_0349_m_000000_0, Status : FAILED
          java.lang.RuntimeException: Error in configuring object
          at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
          at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
          at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
          at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:354)
          at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
          at org.apache.hadoop.mapred.Child.main(Child.java:170)
          Caused by: java.lang.reflect.InvocationTargetException
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          at java.lang.reflect.Method.invoke(Method.java:597)
          at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
          ... 5 more
          Caused by: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper
          at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:840)
          at org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:771)
          at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
          ... 10 more
          Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper
          at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:808)
          at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:832)
          ... 12 more
          Caused by: java.lang.ClassNotFoundException: org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper
          at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
          at java.security.AccessController.doPrivileged(Native Method)
          at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
          at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
          at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
          at java.lang.Class.forName0(Native Method)
          at java.lang.Class.forName(Class.java:247)
          at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:761)
          at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:806)
          ... 13 more

          The following may be needed in AbstractJob:

          jobConf.setJarByClass(AbstractJob.class);

          huiwenhan Han Hui Wen added a comment -

          Thanks very much, I will try it later.

          srowen Sean R. Owen added a comment -

          If you don't mind, could you try this again? I changed AbstractJob to properly use the Configured class, which may be the key to getting Hadoop to properly parse these environment args for you. I also removed the --jarFile argument, since you should no longer need it with the change I just made.

          I do not know if it works, but would be grateful if you can try it.

          huiwenhan Han Hui Wen added a comment -

          I ran the job using the following command:

          hadoop org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -Dmapred.job.name=HADOOP_REC_tap_tag\ -Dmapred.reduce.tasks=20 --input /steer/item/in --tempDir /steer/item/temp --output /steer/item/out --jarFile mahout-0.4-SNAPSHOT.jar --numRecommendations 10 --usersFile /steer/item/usersFile

          -Dmapred.reduce.tasks=20 does not seem to work,
          and there seems to be no option related to the job name.

          http://hadoop.apache.org/common/docs/current/mapred-default.html
          http://hadoop.apache.org/common/docs/current/core-default.html

          srowen Sean R. Owen added a comment -

          Ah good call. You can set job name, it seems, with 'mapreduce.job.name'? Hui can you try that?

          jake.mannix Jake Mannix added a comment -

          Don't the jobs which implement Tool allow for hadoop options to be passed in, so -Dmapred.reduce.tasks=10 should work?

          srowen Sean R. Owen added a comment -

          I can add options for both of those, yes. I am not sure why Hadoop defaults to one reducer when even the docs suggest a better default size. The number of mappers is chosen intelligently.

          While I'm at this, I'd like to take some time to reorganize how AbstractJob works to refactor some of this code. I think we can reduce duplication even as I add more options here. For example I'm not sure if we still need DefaultOptionBuilder? I'll see what needs to stay there.

          huiwenhan Han Hui Wen made changes -
          Attachment screenshot-1.jpg [ 12440072 ]
          huiwenhan Han Hui Wen made changes -
          Field: Description
          Original Value:
          Can add one "JobName" parameter to org.apache.mahout.cf.taste.hadoop.item.RecommenderJob?
          if there's a lot of RecommenderJob,it's hard to distinguish those jobs.

          also RecommenderJob has four sub jobs (or phase ) ,can add sub-job name to those phase ?
          New Value:
          Can add one "JobName" parameter to org.apache.mahout.cf.taste.hadoop.item.RecommenderJob?
          if there's a lot of RecommenderJob,it's hard to distinguish those jobs.

          also RecommenderJob has four sub jobs (or phase ) ,can add sub-job name to those phase ?

          Because RecommenderJob has not setNumReduceTasks ,it seems that the performance is not good in reduce phase.
          Field: Summary
          Original Value: add one "JobName" parameter to org.apache.mahout.cf.taste.hadoop.item.RecommenderJob
          New Value: add one "JobName" and reduceNumber parameter to org.apache.mahout.cf.taste.hadoop.item.RecommenderJob
          huiwenhan Han Hui Wen created issue -

          People

            srowen Sean R. Owen
            huiwenhan Han Hui Wen