Description
When a crawl is done using the 'bin/nutch crawl' command, no filtering is done in Generator even if 'crawl.generate.filter' is set to true in the configuration file.
The problem is that the Generator's generate method unconditionally sets the job's filter value to whatever is passed to it:
job.setBoolean(CRAWL_GENERATE_FILTER, filter);
Crawl.java always passes false for this argument, so the value of 'crawl.generate.filter' from the configuration file is overwritten.
This has been fixed by exposing an overloaded generate method which takes only the 5 arguments that Crawl needs to set. This overloaded method reads the configuration and sets the filter value appropriately.
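The shape of the fix can be sketched as follows. This is a minimal, self-contained illustration and not the actual patch: GeneratorSketch is a hypothetical stand-in for Generator, a plain Map stands in for the job configuration, the five parameter names are assumptions, and the default of true when the property is unset is also an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Nutch's Generator; not the real class.
class GeneratorSketch {
    static final String CRAWL_GENERATE_FILTER = "crawl.generate.filter";

    // A plain map stands in for the job configuration (assumption).
    private final Map<String, String> conf;

    GeneratorSketch(Map<String, String> conf) {
        this.conf = conf;
    }

    // Overload for callers such as Crawl that should not decide on filtering.
    // It reads the filter value from the configuration instead of forcing one.
    // The five parameter names are illustrative assumptions.
    boolean generate(String dbDir, String segmentsDir, int numLists,
                     long topN, long curTime) {
        // Defaulting to "true" when the property is absent is an assumption.
        boolean filter = Boolean.parseBoolean(
                conf.getOrDefault(CRAWL_GENERATE_FILTER, "true"));
        return generate(dbDir, segmentsDir, numLists, topN, curTime, filter);
    }

    // Full form: the explicit argument always wins, mirroring
    // job.setBoolean(CRAWL_GENERATE_FILTER, filter) in the description.
    // Returns the effective filter value so the behaviour can be inspected.
    boolean generate(String dbDir, String segmentsDir, int numLists,
                     long topN, long curTime, boolean filter) {
        Map<String, String> job = new HashMap<>(conf);
        job.put(CRAWL_GENERATE_FILTER, Boolean.toString(filter));
        return Boolean.parseBoolean(job.get(CRAWL_GENERATE_FILTER));
    }
}
```

With this arrangement, a caller that uses the five-argument form gets whatever the configuration file says, while a caller that passes the flag explicitly still overrides it.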