Description
See https://github.com/apache/nutch/blob/trunk/src/java/org/apache/nutch/crawl/Generator.java#L542
'mapred.job.tracker' is deprecated and has been replaced by 'mapreduce.jobtracker.address'. However, when running Nutch on EMR, 'mapreduce.jobtracker.address' has the value 'local'. As a result we generate a single partition, so a single map task does all the fetching later on, which defeats the purpose of having a distributed crawler.
We should probably detect whether we are running on YARN instead; see http://stackoverflow.com/questions/29680155/why-there-is-a-mapreduce-jobtracker-address-configuration-on-yarn
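A rough sketch of what such a check could look like, assuming we consult Hadoop's 'mapreduce.framework.name' property (which takes the values 'local', 'classic' or 'yarn') before falling back to the job tracker address. The class RunModeCheck and the method isLocalMode are hypothetical names, and a plain Map stands in for the Hadoop Configuration object so the example is self-contained; this is not the actual Generator code.

```java
import java.util.HashMap;
import java.util.Map;

public class RunModeCheck {

    // Hypothetical helper: decides whether the job runs in local mode.
    // A Map is used here in place of Hadoop's Configuration (assumption).
    static boolean isLocalMode(Map<String, String> conf) {
        // Prefer the framework name: on YARN it is "yarn" even when the
        // job tracker address is still left at "local".
        String framework = conf.get("mapreduce.framework.name");
        if (framework != null) {
            return "local".equals(framework);
        }
        // Fall back to the job tracker address, new key first, then the
        // deprecated one. Treating a missing value as local is an assumption.
        String tracker = conf.get("mapreduce.jobtracker.address");
        if (tracker == null) {
            tracker = conf.get("mapred.job.tracker");
        }
        return tracker == null || "local".equals(tracker);
    }

    public static void main(String[] args) {
        // Settings as described above for EMR, plus the framework name
        // assumed to be "yarn" there.
        Map<String, String> emr = new HashMap<>();
        emr.put("mapreduce.framework.name", "yarn");
        emr.put("mapreduce.jobtracker.address", "local");
        System.out.println(isLocalMode(emr)); // prints "false"
    }
}
```

With a check along these lines the number of partitions would only be forced to 1 when the job really runs locally, not whenever the job tracker address happens to be 'local'.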